The European ExaNoDe project has built what it claims is a groundbreaking compute node prototype paving the way to exascale. It combines a 3DIC with multi-chip-module integration technologies, heterogeneous compute elements with Arm cores and FPGA acceleration, and the UNIMEM memory system, powered by a high-performance, high-productivity software stack.

The node uses state-of-the-art 7 nm Arm-core-based chiplets with a silicon interposer and HBM2, offering a modular, cost-effective and energy-efficient route to multi-teraflops heterogeneous compute nodes. The project coordinator, Denis Dutoit, a research engineer at CEA-Leti, said, “Affordability and power consumption are the main hurdles for an exascale-class compute node. In the ExaNoDe project, we have built a complete prototype that integrates multiple core technologies: a 3D active interposer with chiplets, Arm cores with FPGA acceleration, a global address space, high-performance and productive programming environment, which will enable European technology to satisfy the requirements of exascale HPC.”

The ExaNoDe high-performance compute node concept (Source: ExaNoDe / Euromicro DSD conference)

Achieving the new levels of compute density and power efficiency necessary to build an operational exascale machine will require a disruptive change in technology. The ExaNoDe prototype is one way to do it.

It uses an innovative interposer, developed by CEA, that allows multiple system-on-chip (SoC) chiplets to be combined into a three-dimensional integrated circuit (3DIC). This brings several advantages: smaller dies give higher fabrication yields, and shorter inter-chip communication distances improve energy efficiency. The technique also reduces the cost of customization, since the modular design allows cutting-edge technology to be combined with lower-cost, more established technology as required. Finally, it offers the flexibility to slot in different compute elements, such as CPUs and accelerators, in a single chip for different applications, resulting in greater performance at lower design cost.
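The yield argument can be made concrete with a rough sketch. The Python below uses the textbook Poisson yield model, Y = exp(-A × D0), which is an assumption made here for illustration and not a model published by the project. The die area and defect density are hypothetical, and the sketch ignores interposer cost and assembly yield; it assumes dies are tested before assembly so that only good dies are used.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


def silicon_per_good_system(total_area_cm2, n_dies, defects_per_cm2):
    """Expected silicon area fabricated per working system.

    The logic is split into n_dies equal dies. Each good die costs, on
    average, die_area / yield of fabricated silicon, and a system needs
    n_dies good dies.
    """
    die_area = total_area_cm2 / n_dies
    return total_area_cm2 / poisson_yield(die_area, defects_per_cm2)


if __name__ == "__main__":
    TOTAL_AREA = 4.0   # cm^2 of logic per node (hypothetical)
    D0 = 0.5           # defects per cm^2 (hypothetical)
    for n in (1, 2, 4, 8):
        area = silicon_per_good_system(TOTAL_AREA, n, D0)
        print(f"{n} die(s): {area:.2f} cm^2 of silicon per good system")
```

Under these assumed numbers, splitting the same logic into more, smaller dies steadily lowers the silicon fabricated per working system, which is the effect the chiplet approach relies on.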

The project builds on the results of other European projects. The UNIMEM memory system, created in the EUROSERVER project and now being brought to scale in the EuroEXA project, makes it possible to share memory among multiple compute nodes. The UNIMEM shared memory is accessible through a non-coherent global address space and is exposed to the programmer via a native UNIMEM API, standard MPI-3.0 and GPI-2. To increase the resilience and improve the manageability of the compute node, the software stack also includes virtualization, with checkpointing and virtualization of the UNIMEM capabilities.
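To give a feel for what a global address space across nodes means, here is a toy Python model. It is not the UNIMEM API, and the address layout (node identifier in the high bits, local offset in the low bits) is a hypothetical choice made only for this sketch. Every name in it (`to_global`, `from_global`, `ToyGlobalMemory`) is invented for illustration. The toy does no caching at all, so coherence never arises; in a real non-coherent system the software has to manage that itself.

```python
NODE_BITS = 8      # hypothetical: up to 256 nodes
OFFSET_BITS = 40   # hypothetical: up to 1 TiB of memory per node
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def to_global(node_id, offset):
    """Pack (node, local offset) into one global address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset out of range")
    return (node_id << OFFSET_BITS) | offset


def from_global(addr):
    """Split a global address back into (node, local offset)."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK


class ToyGlobalMemory:
    """Each node owns one memory region; any node may read or write it."""

    def __init__(self, n_nodes, bytes_per_node):
        self.nodes = [bytearray(bytes_per_node) for _ in range(n_nodes)]

    def put(self, addr, data):
        """One-sided write: the owning node takes no part in the transfer."""
        node, off = from_global(addr)
        region = self.nodes[node]
        if off + len(data) > len(region):
            raise IndexError("write past end of node memory")
        region[off:off + len(data)] = data

    def get(self, addr, length):
        """One-sided read from whichever node owns the address."""
        node, off = from_global(addr)
        return bytes(self.nodes[node][off:off + length])


if __name__ == "__main__":
    mem = ToyGlobalMemory(n_nodes=4, bytes_per_node=64)
    addr = to_global(node_id=2, offset=16)
    mem.put(addr, b"halo")
    print(from_global(addr), mem.get(addr, 4))
```

The one-sided `put`/`get` style loosely mirrors how remote memory access is commonly presented in models such as MPI-3.0 one-sided communication and GPI-2, though the real interfaces differ considerably from this toy.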

The ExaNoDe project’s research activities also extend to applications, with several application areas selected to ensure broad coverage, including materials science and engineering. Several ‘mini applications’, self-contained and based on real-life applications, have been developed and ported to the architecture. Initial work has been performed to accelerate the key kernels on the compute node’s FPGA logic, and this expertise will be carried into ongoing and future projects such as EuroEXA. ETH Zürich developed the open source ExaConv convolutional neural network accelerator to accelerate neural network training as a demonstration of heterogeneous integration.

A pan-European collaboration
Thirteen partners from six European countries were involved in the ExaNoDe project: CEA (France), Arm (UK), the University of Manchester (UK), ETH Zürich (Switzerland), CNRS (France), Kalray (France), FORTH (Greece), Virtual Open Systems (France), Fraunhofer ITWM (Germany), Barcelona Supercomputing Center (Spain), Forschungszentrum Jülich (Germany), Atos (France) and scapos (Germany).

Silicon-level power-management techniques and chiplet and nanotechnology design knowledge were provided by CEA, Arm, ETH Zürich and Kalray; CNRS supported CEA in the assembly and packaging of devices. FORTH contributed its expertise in device-to-PCB integration and in the implementation of the UNIMEM memory scheme, firmware and operating systems. Virtual Open Systems provided the virtualized checkpointing and UNIMEM virtualization. Fraunhofer ITWM, BSC and the University of Manchester enabled the programming environment. Forschungszentrum Jülich and CEA brought their expertise in mini-applications. Atos provided HPC end-user requirements, ensuring that all the technology is appropriate for deployment and integration. scapos ensured that the consortium worked together and successfully delivered the vision of the ExaNoDe project.

This research project has been supported by the European Commission under the Horizon 2020 Framework Program.