National Supercomputing Centre and Sunway TaihuLight

Wuxi is home to China's National Supercomputing Centre and the Sunway TaihuLight supercomputer. This system came to global attention in June 2016, when it topped the Top500 list of the world's most powerful supercomputers, far surpassing the second-placed system (also Chinese, the Tianhe-2), let alone the third (Titan, from the United States). Not only that, Sunway TaihuLight held the top position for two years in succession, until the US system Summit, hosted at Oak Ridge National Laboratory, took the top position in the June 2018 list. Nevertheless, almost ten years later, Sunway TaihuLight is not only still a Top500 supercomputer but remains in the top 25 systems (placed #24 in November 2025) without any change to the original configuration in all that time, a blunt indication of how advanced it was when it debuted.

The Sunway TaihuLight has 10,649,600 cores, an Rmax of 93.01 petaflops, and an Rpeak of 125.44 petaflops. With various import restrictions in place (those who preach free trade don't like practising it with a competitor), the processors are a home-grown variety, the Sunway SW26010 260C, a 64-bit RISC design running at 1.45 GHz. The clock rate might seem low, but it matches the requirements of a manycore processor and, as the manufacturer code suggests, the SW26010 has a truly impressive 260 cores per processor. These are arranged as four clusters, each with 64 Compute-Processing Elements (CPEs) in an eight-by-eight array. The CPEs support SIMD instructions, making the chip (at a very high level) seem like something between a traditional CPU and a GPU architecture. Each CPE cluster also has a more conventional general-purpose core, the Management Processing Element (MPE), that provides supervisory functions, which accounts for the remaining four cores. Each node holds one 260-core processor, 256 nodes form a "supernode", and each cabinet holds four supernodes. There are 40 cabinets in total, providing over 10 million cores. Sunway has its own interconnect, with a five-level integrated hierarchy: (i) computing node, (ii) computing board, (iii) supernode, (iv) cabinet, and (v) complete system, with a network link bandwidth of 16 GB/s and total I/O bandwidth of 288 GB/s.
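The figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python, using only numbers from the text; the constant and function names are my own, and it assumes one MPE per CPE cluster and one processor per node, which is what the 260-core total implies.

```python
# Figures quoted in the text; all names here are illustrative.
CPE_CLUSTERS_PER_PROCESSOR = 4
CPES_PER_CLUSTER = 8 * 8        # eight-by-eight array of CPEs
MPES_PER_CLUSTER = 1            # assumption: one MPE per cluster (260 - 256 = 4)
NODES_PER_SUPERNODE = 256
SUPERNODES_PER_CABINET = 4
CABINETS = 40
RMAX_PFLOPS = 93.01
RPEAK_PFLOPS = 125.44


def cores_per_processor() -> int:
    """Cores on one SW26010: CPEs plus MPEs across all clusters."""
    return CPE_CLUSTERS_PER_PROCESSOR * (CPES_PER_CLUSTER + MPES_PER_CLUSTER)


def total_nodes() -> int:
    """Nodes in the full system (assuming one processor per node)."""
    return NODES_PER_SUPERNODE * SUPERNODES_PER_CABINET * CABINETS


def total_cores() -> int:
    """Cores in the full system."""
    return cores_per_processor() * total_nodes()


def linpack_efficiency() -> float:
    """Fraction of theoretical peak achieved on the benchmark (Rmax / Rpeak)."""
    return RMAX_PFLOPS / RPEAK_PFLOPS


if __name__ == "__main__":
    print(cores_per_processor())            # 260
    print(total_nodes())                    # 40960
    print(total_cores())                    # 10649600
    print(f"{linpack_efficiency():.1%}")    # 74.1%
```

The product reproduces the published core count exactly, and the ratio of Rmax to Rpeak shows the system sustaining roughly three-quarters of its theoretical peak on the benchmark.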

The operating system is also custom-built: Sunway Raise OS, based on Linux. Common compilers (C, C++, Fortran), an automatic vectorisation tool, basic math libraries, and a customised version of OpenACC are available. The software build system is also specialised, targeting the Sunway processor. Whilst ports have aimed for minimal modification of existing code, applications designed for GPUs have been "significantly more challenging". Nevertheless, dozens of applications have been written which can, in theory, scale to use the entire system, with early tests of atmospheric simulations scaling effectively to eight million cores. Other early simulations of note include atomistic simulations of silicon nanowires and ultra-high-resolution global ocean surface wave numerical simulations. Parallelism between nodes generally uses MPI. Across the four core groups (CGs) within the same processor, software can use either MPI or OpenMP, but within each CG, Sunway OpenACC is used. Sunway OpenACC uses the OpenACC 2.0 syntax but targets the CPE clusters and includes parallel task management, heterogeneous code extraction, and data transfer descriptions. Extensions beyond the original OpenACC 2.0 standard include finer control over multi-dimensional array buffering, and packing distributed variables for data transfer.
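To make the three levels of parallelism concrete, here is a small illustration in plain Python of how a one-dimensional loop might be divided: first between nodes (the MPI level), then between the four CGs of a processor (the MPI or OpenMP level), and finally between the 64 CPEs of a CG (the Sunway OpenACC level). This is only a sketch of the idea, not the Sunway toolchain; the function names and the simple block decomposition are my own assumptions.

```python
CGS_PER_PROCESSOR = 4   # core groups per SW26010, from the text
CPES_PER_CG = 64        # CPEs per core group, from the text


def block_range(start: int, stop: int, parts: int, index: int) -> range:
    """Return block `index` of `parts` near-equal contiguous blocks of [start, stop)."""
    size, extra = divmod(stop - start, parts)
    # The first `extra` blocks each take one additional item.
    lo = start + index * size + min(index, extra)
    hi = lo + size + (1 if index < extra else 0)
    return range(lo, hi)


def cpe_share(n_items: int, n_nodes: int, node: int, cg: int, cpe: int) -> range:
    """Loop indices handled by one CPE under a hypothetical three-level split."""
    # Level 1: between nodes (MPI in the text).
    per_node = block_range(0, n_items, n_nodes, node)
    # Level 2: between the four CGs of one processor (MPI or OpenMP in the text).
    per_cg = block_range(per_node.start, per_node.stop, CGS_PER_PROCESSOR, cg)
    # Level 3: between the CPEs of one CG (Sunway OpenACC in the text).
    return block_range(per_cg.start, per_cg.stop, CPES_PER_CG, cpe)


if __name__ == "__main__":
    # 5,120 iterations over 2 nodes: each of the 512 CPEs gets 10.
    print(cpe_share(5120, 2, node=0, cg=0, cpe=0))    # range(0, 10)
    print(cpe_share(5120, 2, node=1, cg=3, cpe=63))   # range(5110, 5120)
```

On the real system the innermost level would be expressed with OpenACC directives rather than explicit index arithmetic, but the nesting is the same: each level subdivides the work handed down by the level above.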

The Sunway TaihuLight will be remembered alongside the other "giants" of computing history, such as ENIAC, UNIVAC, CDC 6600, Cray-1, Beowulf, and Roadrunner, for its performance, architecture, lasting impact, and contributions to science. A system as powerful, innovative, and novel as this comes along perhaps once a decade, and after 10 years of operation, Sunway TaihuLight has earned its place in computing history. All the engineers and administrators who have built, operated, and maintained this system deserve respect for their contributions. What is especially remarkable is that the political climate imposed an additional requirement on this system: a novel, home-grown design. However, I have been informed by a trustworthy source to "watch this space"; there are plans for a system ten times as powerful in the very near future.