Monte Rosa is history – the future is a universal computing platform

The Monte Rosa supercomputer has been supporting users at CSCS in their scientific research since 2009. There have been hundreds of publications based on its computing power. Now this supercomputer is being removed from the network. Monte Rosa is being replaced by a high-performance platform that marks the start of a new era for CSCS and its users.

The computing power of the Cray XE6, which came online in May 2009, made CSCS one of the world’s leading computing centres. Now, after about six years of service, the computer christened Monte Rosa is leaving the network. The latest flagship supercomputer, Piz Daint, began a “gradual” takeover from Monte Rosa as early as December 2012. Piz Daint is paving the way for a new platform. According to CSCS Director Thomas Schulthess, the core business of computing centres will continue to evolve. The focus will no longer be on the computing infrastructure itself, but on providing a comprehensive service platform that can support users in all aspects of scientific computing.

Last shutdown command for Monte Rosa is being issued.

One machine that can do everything: that is the dream of both industry and computing centres. The latter would like a supercomputer that can not only carry out computations but also process data, analysing, structuring, visualising and storing it – all in one. Until now, CSCS has operated separate systems to meet these different requirements. However, with the introduction of Piz Dora – an extension to its Piz Daint flagship supercomputer – in the second half of 2014, the centre laid the foundations for exactly this kind of platform, with one universal supercomputer. The main purpose of the platform is now to provide services.

Succeeded by a heterogeneous all-rounder

In Piz Daint, a Cray XC30, the new platform has one of the world’s most energy-efficient petaflop computers at its disposal. Thanks to its hybrid system based on graphics processors (GPUs) and conventional CPUs, and with the help of special software, it can visualise the results of computations even while simulations are still running. With the addition of Piz Dora – a Cray XC40 with 1,256 compute nodes consisting solely of CPUs – the platform can now not only carry out conventional calculations and visualisations but also analyse and structure data. This enables it, for example, to filter out what is important from a vast volume of data – a vital function in this age of Big Data. Piz Daint and Piz Dora are connected by shared temporary storage that is used while a computation is running (scratch space) and by what is currently one of the most powerful networks in the world.

Piz Daint (Cray XC30) and its extension Piz Dora (Cray XC40).

Piz Dora is now assuming the tasks performed by the former CSCS flagship supercomputer, Monte Rosa, and by the University of Zurich’s Schrödinger cluster. The University of Zurich has taken a stake in Piz Dora, bringing with it other research institutions which, for reasons of either space or efficiency, are unable or unwilling to maintain their own computing centre. Piz Dora therefore also accommodates the cluster resources of the Paul Scherrer Institute and the National Centre of Competence in Research (NCCR) MARVEL (Materials’ Revolution: Computational Design and Discovery of Novel Materials). In addition, scientists from ETH Zurich use Piz Dora as a tool for data analysis. Individual computers belonging to other institutions that are operated at CSCS are also to be integrated into the new platform in the medium term.

User-optimised infrastructure

The new infrastructure will offer more software and applications to ensure that the computing system is used as efficiently as possible. This is because computer-assisted research is increasingly attracting attention not only from traditional users in fields such as physics, chemistry, materials research and earth and climate sciences, but also from newer areas of research where users sometimes have little experience of working with supercomputers. These users are to be supported by the extended range of services in the areas of software and applications.

The aim is to give users the flexibility they need to solve their problems smoothly on the most suitable computing system. That is why, when it comes to software, the crucial point is to keep expanding the platform continuously, insists Schulthess. The CSCS Director believes that this heterogeneous platform offering comprehensive services will pave the way for closer cooperation with the providers of publicly available cloud computing solutions. After all, scientists should ideally be able to move their work as required between their local computer, for example a laptop, and publicly available cloud services or the CSCS supercomputers.

The future lies in services

CSCS believes that the future of high-performance computing lies in the consolidation of computing infrastructure and services, a process it has already begun. Interdisciplinary projects to improve software and applications as well as hardware have been underway since the launch of the High-Performance Computing and Networking (HPCN) initiative in 2009. These have, for example, helped to enhance computing algorithms, develop codes and optimise modern computer architectures, and they continue to do so. Last but by no means least, the further successful expansion of the platform also depends on industry, which in some cases is still refining the required technologies. That is why researchers from the user side and CSCS are working closely with hardware manufacturers to improve hardware and software. It is thanks to this cooperation that the new platform at CSCS now boasts such an extraordinarily efficient network, which enables the compute nodes to communicate with one another and plays a vital role in the speed and efficiency of the computer.