LINUX FOUNDATION REPORT 20 Years of Top500 Supercomputer Data links Linux with Advances in Computing Performance
20 Years of Top500.org Supercomputer Data Links Linux With Advances in Computing Performance
Libby Clark & Brian Warner
The Linux Foundation
July 2013
www.linuxfoundation.org
Introduction

This year marks the 20th anniversary of the Top500 list of the world’s fastest supercomputers. The Top500 (www.top500.org), compiled and released each June and November, has become the common yardstick by which researchers in the field of high-performance computing measure the best of the best. And so, twice a year, the Linux community takes great pride in tallying the number of Linux machines on the Top500 list. At last count, all of the top 10 computers and 476 of the 500 systems on the list ran the Linux operating system. After first appearing on the list in 1998, Linux has consistently dominated the top 10 over the past decade and has made up more than 90 percent of the list since June 2010.

This alone is a great accomplishment, but it’s not the whole story. A review of the past 20 years of Top500 data also reveals that Linux is the driving force behind the breakthroughs in computing power that have fueled research and technological innovation. In other words, Linux is dominant in supercomputing, at least in part, because it is what is helping researchers push the limits on computing power.

A few notable observations from the data:

• Advances in supercomputing over the past decade have largely taken place on Linux machines built using exotic architectures and novel techniques that required the immense flexibility of open source.
• Total RMax, the measure by which computing power is ranked on the Top500, has grown steadily since the list debuted in 1993, but the proportion of machines running Linux took off in the early 2000s. By graphing total RMax over time (Graph 2), it’s easy to see that Linux machines have been the sole drivers of total computing capacity on the list.
• In 2004, six years after Linux debuted on the Top500, Linux machines already accounted for half of total RMax (Graph 3).
• In just 10 years, the dominant operating system by performance share on the Top500 list underwent a complete inversion, from 96 percent Unix to 96 percent Linux.
RMax Grows with Linux and Hardware Advances

RMax, a measure of how quickly a machine can complete the Linpack benchmark calculations, has outpaced Moore’s law, doubling roughly every 14 months as breakthroughs in computing have occurred. The RMax of the fastest supercomputer on the Top500 list has increased by a factor of more than 500,000, from the CM-5’s 59.7 gigaflop/s in 1993 to the Tianhe-2’s 33.86 petaflop/s in 2013.

Unix was the dominant operating system in the early days of supercomputing, when advances were made by steadily piling up the number of processors per system to produce gigaflop speeds. It maintained its hold as parallel computing became the norm in the 1990s, when the Top500 list was founded, and machines with thousands of processors consistently reached the teraflop level. But system architecture became a whole lot more complex after 1996, when Intel’s ASCI Red machine first broke the teraflop barrier.
[Figure: entries on the top500.org list by operating system. Y-axis: number of entries, 0 to 500. X-axis: list year, 1993 to 2013. Series: Windows, UNIX, Other, Mixed, Mainframe, MAC OS, BSD-Based, Linux.]
Graph 1: Entries on the top500.org list by operating system.
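The growth figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the two RMax values given in this report (the CM-5 in 1993 and the Tianhe-2 in 2013). It assumes the two lists are 240 months apart and that growth between them was purely exponential; the function and constant names are illustrative, not part of any Top500 tooling.

```python
import math

# RMax of the No. 1 system, as quoted in this report, in gigaflop/s.
CM5_RMAX_GFLOPS = 59.7          # CM-5, 1993
TIANHE2_RMAX_GFLOPS = 33.86e6   # Tianhe-2, 2013 (33.86 petaflop/s)

# Assumption: 20 years between the two lists.
MONTHS_ELAPSED = 20 * 12


def growth_factor(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def doubling_time_months(start: float, end: float, months: float) -> float:
    """Return the doubling period implied by exponential growth
    from `start` to `end` over `months` months."""
    doublings = math.log2(end / start)
    return months / doublings


factor = growth_factor(CM5_RMAX_GFLOPS, TIANHE2_RMAX_GFLOPS)
period = doubling_time_months(CM5_RMAX_GFLOPS, TIANHE2_RMAX_GFLOPS, MONTHS_ELAPSED)

print(f"Growth factor: {factor:,.0f}x")
print(f"Implied doubling time: {period:.1f} months")
```

Under these assumptions the top-ranked system alone grew by a factor of roughly 567,000, which implies a doubling time of about 12.6 months. That is in the same range as the roughly 14-month figure cited above, which may be based on a different series (for example, total RMax across the list) rather than the two endpoints used here.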
As TFlop/s performance became more commonplace in the 2000s, systems evolved from homogeneous nodes containing piles of the same processors to heterogeneous nodes built for massively parallel processing in custom-designed machines. It was during this same post-teraflop period that Linux began its rise from representing 50 percent of the top-performing machines to more than 97 percent today. This happened for two reasons:

1. Most of the world’s top supercomputers are superscalar research machines built for specialized tasks. In essence, each is a standalone research project with unique characteristics and optimization requirements. It wasn’t economical for any single commercial vendor to develop a custom operating system that would only be used once. The research teams building each system, however, could modify and optimize Linux for the exotic, one-off, groundbreaking designs that characterize the modern generation of supercomputers.
2. The licensing cost of a custom, self-supported Linux distribution is the same whether you’re using 20 nodes or 20,000,000 nodes. And by tapping into the vast open source Linux community, projects had access to free support and developer resources that helped keep development costs on par with, or below, those of other operating systems. With a vendor-supported OS, the licensing fees at these scales would quickly become prohibitive.
[Figure: sum of Rmax on the top500.org list by operating system. Y-axis: sum of Rmax (M), 0 to 250. X-axis: list year, 1993 to 2013. Series: Windows, UNIX, Other, Mixed, Mainframe, MAC OS, BSD-Based, Linux.]
Graph 2: Sum of Rmax on the top500.org list by operating system.
[Figure: share of Rmax on the top500.org list by operating system. Y-axis: percent of Rmax, 0% to 100%. X-axis: list year, 1993 to 2013. Series: Windows, UNIX, Other, Mixed, Mainframe, MAC OS, BSD-Based, Linux.]
Graph 3: Share of Rmax on the top500.org list by operating system.
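The per-operating-system sums and shares plotted in Graphs 2 and 3 come down to a simple aggregation over list entries. The sketch below shows one way this could be done in Python. It assumes each entry is available as a (list identifier, operating system family, Rmax) record; that record layout, the function names, and the sample values are hypothetical placeholders for illustration and are not actual Top500 data or the authors' method.

```python
from collections import defaultdict


def sum_rmax_by_os(entries):
    """Return {list_id: {os_family: total Rmax}} for
    (list_id, os_family, rmax) records."""
    totals = defaultdict(lambda: defaultdict(float))
    for list_id, os_family, rmax in entries:
        totals[list_id][os_family] += rmax
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {list_id: dict(by_os) for list_id, by_os in totals.items()}


def share_rmax_by_os(entries):
    """Return {list_id: {os_family: fraction of that list's total Rmax}}."""
    shares = {}
    for list_id, by_os in sum_rmax_by_os(entries).items():
        list_total = sum(by_os.values())
        shares[list_id] = {
            os_family: total / list_total for os_family, total in by_os.items()
        }
    return shares


# Made-up placeholder records (NOT real Top500 figures); Rmax in arbitrary units.
SAMPLE_ENTRIES = [
    ("list-A", "Unix", 50.0),
    ("list-A", "Unix", 25.0),
    ("list-A", "Linux", 25.0),
    ("list-B", "Linux", 300.0),
    ("list-B", "Unix", 100.0),
]

for list_id, by_os in share_rmax_by_os(SAMPLE_ENTRIES).items():
    for os_family, share in sorted(by_os.items()):
        print(f"{list_id} {os_family}: {share:.0%}")
```

Plotting the output of `sum_rmax_by_os` per list gives a stacked view like Graph 2, while `share_rmax_by_os` normalizes each list to 100 percent, as in Graph 3.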
Linux Drives Supercomputing, Powers Performance

By offering a free, flexible and open source operating system, Linux made it cost-effective to design and deliver custom hardware and system architecture designs for the world’s top-performing supercomputers. As a result, the proportion of computers running Linux on the Top500 list saw a meteoric rise starting in the early 2000s to reach more than 95 percent of the machines on the list today. And RMax continued its steady climb upward, contributing to scientific breakthroughs that were unimaginable even a few years earlier. It’s worth reviewing a few examples to understand the real-world impact this is having on research and development.
IBM’s Linux-based Roadrunner Machine: Modeling the Human Brain, Simulating the Big Bang

Less than one week after IBM’s Linux-based Roadrunner machine broke the petaflop/s barrier in 2008, scientists at Los Alamos National Laboratory were using it to model the more than 1 billion neurons dedicated to vision in the human brain. Before being retired in 2013, Roadrunner also helped map HIV’s genetic evolution and simulated the Big Bang. And as the first system to combine different processors (IBM PowerXCell 8i coprocessors with standard dual-core x86 CPUs), it not only ushered in a new era of scientific discovery but also inspired a new generation of hybrid supercomputers.
Cray’s Titan Supercomputer: Averting Ecological Disaster

Now the next generation of hybrid supercomputers, combining CPUs and GPUs, promises even more innovation and scientific discovery. Cray’s Titan supercomputer, a hybrid of CPUs and GPUs running Cray Linux, took the No. 1 spot in the Top500 in 2012 with a performance of 17 petaflop/s. Funded by the U.S. Department of Energy and the National Oceanic and Atmospheric Administration, Titan is running climate change scenarios, among many other projects, in an effort to help avert what could become the biggest ecological disaster in the history of humankind.

Other Linux machines of the Top500, such as IBM’s Sequoia, continue to push the boundaries of supercomputing, recently going beyond 1 million cores to compute the complex physics of noise. Many more contribute to groundbreaking research in cancer, designer drugs, natural disasters, materials science, oil and gas exploration, industrial modeling, and other fields.
The Bottom Line

By isolating RMax by operating system using the past 20 years of Top500 data, it’s clear that Linux is not only responsible for supporting the majority of supercomputers today, but is also a driving force behind the disproportionate growth in supercomputing capacity over the past decade. In continuing to drive progress and innovation in computing, Linux is also helping to explore the mysteries of the universe and solve our toughest problems.
About The Authors

Brian Warner is a Senior Client Services Manager at The Linux Foundation. Libby Clark is a Digital Content Editor at The Linux Foundation.
The Linux Foundation promotes, protects and standardizes Linux by providing unified resources and services needed for open source to successfully compete with closed platforms. To learn more about The Linux Foundation or our other initiatives, please visit us at www.linuxfoundation.org.