Liquid cooling plays a part in China's TOP500 supercomputer surge
In 2017, the TOP500 project, which compiles and maintains a ranking of the world's most powerful computer systems, released its latest list of the 500 fastest supercomputers. Strikingly, China overtook the United States in the total number of systems on the list for the first time, ranking first with 202. China has also been home to the world's fastest supercomputer since 2013.
At international supercomputing shows in recent years, Chinese companies have drawn industry attention and recognition for their results. Beyond their achievements in computing speed, they have also excelled in the energy efficiency and environmental impact of commercial supercomputers.
One of these commercial supercomputers is the high-profile "Earth Data Simulation Device" from Sugon DataEnergy (Beijing) Co., Ltd., which simulates changes in land, ocean, and atmospheric systems across the globe. It is also a deep learning machine that supports further development of artificial intelligence. The system uses the TC4600E-LP, the first liquid-cooled blade server to be successfully commercialized in China, and delivers significantly higher speed at lower energy consumption, a significant step toward more eco-friendly supercomputing.
Keeping cool with new technology
Rapid advances in computing capability open up nearly endless possibilities for a wide variety of applications, including artificial intelligence, weather modeling, and cryptocurrency mining, to name just a few. Large numbers of computing chips must be deployed at high density, raising the heat output of a single server cabinet to tens of thousands of watts. At these densities, traditional air cooling can no longer keep pace with the cooling requirements.
Each central processing unit (CPU) in a server cabinet consumes significant power, nearly all of which must be removed as heat. Consequently, CPU-level liquid cooling has become the primary focus of research at Sugon DataEnergy. Liquid cooling is also a way to improve energy conservation in data centers. Compared with air cooling, it achieves a lower power usage effectiveness (PUE, the ratio of total facility energy to the energy used by IT equipment), reduces fan noise and vibration, and significantly cuts energy consumption.
Although liquid cooling has been around for some time, technical maturity, cost, and other factors have inhibited its large-scale use in high-performance computing (HPC) systems. For users accustomed to running air-cooled server rooms, the transition from air cooling to liquid cooling is full of challenges. Beyond differences in server architecture, the maintenance methods and procedures of server rooms must also change accordingly.
Starting with HPC user needs
Starting from these user challenges, Sugon DataEnergy worked with its server business units as well as upstream and downstream manufacturers, dedicating itself to developing highly cost-effective products with broad user acceptance. The result of this research and development is the TC4600E-LP, a liquid-cooled blade server. It cools major server components with a hybrid model that combines air cooling and liquid cooling, and it can raise the share of cooling handled by liquid to more than 90%, reducing overall energy consumption.
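To make the hybrid model concrete: if a fraction f of a cabinet's heat is carried away by the liquid loop, only the remaining (1 − f) has to be handled by fans and room air conditioning. The sketch below is a back-of-the-envelope illustration, not a Sugon DataEnergy tool; the function name `split_heat_load` and the 25 kW input are hypothetical, and it assumes the cabinet's heat output roughly equals the electrical power drawn by its IT equipment.

```python
def split_heat_load(cabinet_heat_kw: float, liquid_fraction: float) -> tuple[float, float]:
    """Split a cabinet's heat load between the liquid loop and room air.

    liquid_fraction is the share of heat captured by the cold plates (0 to 1).
    Returns (heat removed by liquid, heat left for air cooling), both in kW.
    """
    if not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("liquid_fraction must be between 0 and 1")
    liquid_kw = cabinet_heat_kw * liquid_fraction
    air_kw = cabinet_heat_kw - liquid_kw  # whatever the liquid loop does not capture
    return liquid_kw, air_kw


# Hypothetical example: a 25 kW cabinet with 90% of its heat removed by liquid.
liquid_kw, air_kw = split_heat_load(25.0, 0.90)
print(f"liquid: {liquid_kw:.1f} kW, air: {air_kw:.1f} kW")  # liquid: 22.5 kW, air: 2.5 kW
```

At a 90% liquid share, the air-side cooling load shrinks to a tenth of the cabinet's heat, which is why raising that percentage reduces fan and air-conditioning energy.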
The TC4600E-LP adopts an integrated management module design, with fixed water-cooled plates mounted on the CPUs and memory chips of each liquid-cooled blade in the data center cabinet. The refrigerant circulates through inlet and outlet fluid lines that connect to each blade with quick-disconnect couplings.
Since its introduction two years ago, the TC4600E-LP has provided more than 2,000 nodes of computing power for the "Earth Data Simulation Device Prototype System" of the Institute of Atmospheric Physics of the Chinese Academy of Sciences and for the supercomputing system at the simulation center of the China Electric Power Research Institute of State Grid. In these deployments, the power density of each cabinet exceeds 25 kW, the energy consumption of the air-conditioning system is reduced by 70%, and the PUE is maintained at approximately 1.2.
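The reported figures can be related with a little arithmetic. PUE is total facility power divided by the power delivered to IT equipment, so a PUE of about 1.2 implies roughly 0.2 kW of cooling, power-distribution, and other overhead for every 1 kW of IT load. The sketch below applies that ratio to a single 25 kW cabinet; treating a facility-wide PUE as if it applied to one cabinet is a simplifying assumption, and the function names are hypothetical.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("it_kw must be positive")
    return total_facility_kw / it_kw


def overhead_kw(it_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power distribution, lighting) implied by a PUE."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_kw * (pue_value - 1.0)


# Assumption: a 25 kW cabinet in a facility running at PUE 1.2.
it_load = 25.0
extra = overhead_kw(it_load, 1.2)
print(f"overhead: {extra:.1f} kW, total: {it_load + extra:.1f} kW")  # overhead: 5.0 kW, total: 30.0 kW
print(f"check PUE: {pue(it_load + extra, it_load):.2f}")  # check PUE: 1.20
```

Because PUE can never fall below 1.0, every reduction in cooling overhead shows up directly as a smaller value of (PUE − 1).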
Blade servers are especially well suited to liquid cooling because they require fewer modifications to accommodate the technology. In a blade server configuration, liquid cooling replaces air-cooled heat sinks, and the refrigerant lines are routed out of the rack. Safe, convenient quick-disconnect couplers keep operation and maintenance similar to current user practices.
Collaboration is key to success
In cooperation with upstream and downstream research, development, and manufacturing partners, Sugon DataEnergy has developed and optimized various types of products for specific applications. Because the HPC market in China is not yet mature, Sugon DataEnergy values the capabilities of upstream manufacturers and engages in in-depth technical exchanges with component suppliers across a range of applications. The company believes that greater investment by more manufacturers in the research and development of raw materials and components will accelerate the development and deployment of liquid cooling, not only in China but worldwide.
Cui Xintao, director of research and development of liquid cooling systems at Sugon DataEnergy, emphasizes: "All components of the liquid cooling system are crucial. A problem with any component can have a devastating impact on a server. The liquid cooling system exists precisely because of the support of these key suppliers."
Small components with critical roles
Quick-disconnect couplers are an essential element in the design of a liquid cooling system. The spatial layout of server rooms, fluid line safety, and ease of maintenance are all critical considerations in selecting and designing them. Cui explains: "Fluid handling quick disconnects are core components in the liquid cooling module. They ensure that users can quickly connect and disconnect couplers during use and maintenance without leaking refrigerant."
CPC (Colder Products Company), the upstream manufacturer working with Sugon DataEnergy, researches, develops, and manufactures quick-disconnect couplers. Because non-spill couplers play a vital role in liquid cooling solutions, CPC established an official partnership with Sugon DataEnergy in 2013. The use of CPC's fluid handling products by internationally recognized companies was an important factor in Sugon DataEnergy's choice of a refrigerant fluid handling partner. Continued collaboration and ongoing communication between the two companies give each a deep understanding of the other's specialties, helping them develop leading technologies.
Ongoing product evolution
In the early stages of joint research and development, liquid cooling was still at an exploratory stage dominated by prototype testing. Sugon DataEnergy tested CPC's PMC12 polypropylene connector in a full-immersion, front-loaded validation prototype and conducted long-term tests of its pressure retention and safety.
The PMC12 connector performed well across various applications, and after one year of successful testing, Sugon DataEnergy turned to metal couplers to meet new user demands for quick-disconnect materials. As the technology advanced and the two organizations continued to collaborate, Sugon DataEnergy began testing CPC's LQ6 quick disconnects on blade servers.
Designed specifically for liquid cooling applications, the LQ6 quick-disconnect coupler is made from chrome-plated brass, polysulfone, and stainless steel to ensure reliable strength and chemical compatibility in large-scale deployments. Its cross-section and valve designs ensure that no spill occurs on disconnection, even after the coupler has been connected for extended periods.
The LQ6 requires no changes to how the system is used and no modifications to existing maintenance processes. The coupler also optimizes flow efficiency, has an ergonomic thumb latch, and is color-coded in blue and red to give operating and maintenance staff visual cues that prevent incorrect connections. Its pre-tightened thread design also reduces the risk of connections loosening from vibration while the equipment is running.
Based on customer feedback and blade server designs, Sugon DataEnergy proposed a blind mate design so that connectors still mate successfully when the backs of the server blades are slightly misaligned. CPC custom-designed the RP-LQ2 blind mate product specifically for the TC4600E-LP system. Its self-centering design tolerates a slight offset at the docking point between blade and rack, automatically correcting to center while keeping the electrical connections accurate and docking without damage. With the blind mate design, a liquid-cooled server is maintained the same way as an air-cooled server. Multi-layer leak detection and overflow technologies further improve data center safety and stability.
Future path for innovation
Cui assesses the collaboration with CPC as follows: "CPC has robust technical strengths and plenty of experience in the research, development, and manufacture of quick-disconnect couplers. Sugon DataEnergy is eager to cooperate with a technically mature manufacturer like CPC and has selected CPC as its primary supplier of liquid cooling module couplings."
For CPC, the technical exchanges with Sugon DataEnergy are essential to enhancing its research, development, and custom production capabilities. Pytheas Zhang, CPC engineering manager, says: "I believe that the cooperation between both organizations further improves liquid cooling technology and drives growth in the industry."
To promote ongoing development and innovation, Sugon DataEnergy continues to work proactively with its ecosystem partners. As one of Sugon DataEnergy's major component suppliers, CPC continues to improve its fluid handling technology along the way. In response to the growing adoption of liquid cooling, engineers from both organizations have moved liquid-cooled high-performance computing out of the laboratory and into production environments while collaborating on the innovative products of the future.