Tachyum Targets Petaflop/s
The startup is close to taping out its “universal” processor, branded Prodigy, which aims to deliver industry-leading performance on server, supercomputer, and AI workloads.
Linley Gwennap
European startup Tachyum has made considerable progress on its so-called universal processor, which is designed to offer industry-leading performance on server, supercomputer, and AI workloads. The company plans to tape out the processor soon and sample late this year. It has simulated SPEC and other benchmarks to validate its performance claims, showing a 3x advantage over leading Intel and Nvidia data-center chips. Using 16-bit floating-point data, the design can deliver more than 1 petaflop/s.
The processor, branded Prodigy, features 128 CPU cores that implement a custom VLIW-style instruction set. On the basis of simulations, Tachyum expects them to run at up to 5.7GHz, faster than any x86 or GPU core. The VLIW design enables high performance on general-purpose code without the overhead of complex instruction-reordering hardware. Each core also contains two 1,024-bit-wide vector units and a powerful matrix-multiply engine. In TSMC 5nm technology, the chip has an estimated 950W TDP, requiring liquid cooling. The company plans to sell lower-power versions as well.
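To give a feel for what two 1,024-bit vector units per core imply, the following sketch counts vector lanes per element width and scales the result across 128 cores at the simulated 5.7GHz peak. It assumes each vector unit completes one full-width operation per cycle, which Tachyum has not confirmed here; the function names are illustrative only.

```python
# Figures taken from the article.
CORES = 128
VECTOR_UNITS_PER_CORE = 2
VECTOR_WIDTH_BITS = 1024
PEAK_CLOCK_HZ = 5.7e9  # simulated peak, not measured silicon


def lanes_per_unit(element_bits):
    """Number of elements of the given width that fit in one vector register."""
    return VECTOR_WIDTH_BITS // element_bits


def peak_vector_elements_per_second(element_bits):
    """Upper bound on elements processed per second across the whole chip.

    ASSUMPTION: every vector unit completes one full-width operation
    per cycle and all cores hold the peak clock simultaneously.
    """
    lanes = lanes_per_unit(element_bits)
    return CORES * VECTOR_UNITS_PER_CORE * lanes * PEAK_CLOCK_HZ


for bits in (64, 32, 16, 8):
    rate = peak_vector_elements_per_second(bits)
    print(f"{bits:2d}-bit: {lanes_per_unit(bits):3d} lanes/unit, "
          f"{rate / 1e12:.1f} tera-elements/s chip-wide")
```

These are ceilings, not predictions: a 950W part may not sustain its top clock on all cores at once, and memory bandwidth will often limit vector code before lane count does.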
The custom VLIW instruction set packs one or two RISC-like instructions per 32-bit word and up to four words per bundle. Unlike most VLIW architectures, Prodigy implements some dynamic reordering. Even with this simple reordering, the CPU requires only 10 pipeline stages. To accelerate AI workloads, each core has a matrix engine that handles a wide range of AI data types, including TF32, BF16, FP8, INT8, and INT4. The matrix engine is similar to Nvidia’s tensor core but much larger; when multiplying two FP8 matrices, it can perform 8,192 operations per cycle.
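The bundle and matrix-engine figures can be turned into simple upper bounds. The sketch below is a back-of-the-envelope calculation that assumes the 8,192 operations per cycle is a per-core figure and that all 128 cores run their matrix engines at the simulated 5.7GHz peak on every cycle. The constant and function names are illustrative, not Tachyum's.

```python
# Figures taken from the article.
CORES = 128
PEAK_CLOCK_HZ = 5.7e9                # simulated peak clock
FP8_OPS_PER_CYCLE_PER_CORE = 8192    # matrix engine, FP8 operands
TDP_WATTS = 950                      # estimated TDP in TSMC 5nm

WORD_BITS = 32
MAX_INSTRUCTIONS_PER_WORD = 2
MAX_WORDS_PER_BUNDLE = 4


def max_bundle_bits():
    """Largest bundle size implied by four 32-bit words."""
    return WORD_BITS * MAX_WORDS_PER_BUNDLE


def max_instructions_per_bundle():
    """Largest instruction count per bundle if every word holds two."""
    return MAX_INSTRUCTIONS_PER_WORD * MAX_WORDS_PER_BUNDLE


def peak_fp8_ops_per_second():
    """Chip-wide FP8 ceiling.

    ASSUMPTION: every core's matrix engine is busy on every cycle
    at the peak clock, which real workloads will not achieve.
    """
    return CORES * FP8_OPS_PER_CYCLE_PER_CORE * PEAK_CLOCK_HZ


def peak_fp8_ops_per_watt():
    """Ceiling divided by the estimated TDP (a rough efficiency bound)."""
    return peak_fp8_ops_per_second() / TDP_WATTS


print(f"Bundle: up to {max_instructions_per_bundle()} instructions "
      f"in {max_bundle_bits()} bits")
print(f"Peak FP8: {peak_fp8_ops_per_second() / 1e15:.2f} peta-ops/s")
print(f"Peak FP8 per watt: {peak_fp8_ops_per_watt() / 1e12:.2f} tera-ops/s/W")
```

Under these assumptions a bundle holds at most eight instructions in 128 bits, and the FP8 ceiling works out to roughly 6 peta-operations per second. The article gives no per-cycle rate for 16-bit data, so the sketch does not attempt to reproduce the 1 petaflop/s figure.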
CEO Rado Danilak cofounded Tachyum to develop a chip that could outperform CPUs and GPUs across a variety of workloads, a concept he calls the universal processor. Based in Bratislava, Slovakia, and in Silicon Valley, Tachyum has raised $42 million from IPM Group and Slovakian investors.