Rethinking Computing in the GPU Era

By Aki Fujimura

We are in a new era of GPU-based computing that accepts "wasteful" computing.

The term “useful waste” may seem like an oxymoron. In computing, though, it describes an important, emerging class of approaches that take advantage of abundant computing power to enable programs that were previously impractical to write. And it may well be the future of computing in this new century, in which we have been scaling computing power by bit width instead of by clock speed.

When Moore’s Law still ruled the land, computer scientists said, “Computing will soon be free.” By the time we reached the new millennium, it felt as though this might even be true. Even ten years ago, some said, “Computers are already fast enough and cheap enough to do everything we need.” For the consumer segment, typified by the internet of things (IoT), that has turned out to be true to some extent.

Over the last 20 years, the semiconductor market has bifurcated. A large number of fabs continue to focus on optimizing the performance of older nodes such as 28nm and 22nm, while a much smaller number invest in extremely expensive leading-edge technologies, now at 5nm and moving toward 3nm, 2nm, and beyond.

So, while much of the consumer segment seems to have found a happy home in the established and predictable older process nodes, other segments, such as high-performance computing and computer gaming, still crave more computational power and speed. Although much research and development continues to focus on how to compute things most efficiently by avoiding wasteful computing, some efforts began to focus on how computing would be different with limitless computing power.

One example of this is the idea of pre-computing, which emerged around the end of the last century. Some early machine-learning techniques, for example, embraced a philosophy of “who cares how long it takes to pre-compute tables, parameters, or neural-network weights so long as the resulting computation at run-time is fast.”
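As a minimal sketch of the pre-computing idea, the Python snippet below spends time up front building a table of sine values so that each run-time query becomes a single lookup. The names `TABLE_SIZE`, `build_sine_table`, and `fast_sin` are hypothetical, chosen only for illustration, and the table resolution is an arbitrary assumption.

```python
import math

TABLE_SIZE = 4096  # assumed resolution; larger tables cost more to build but are more accurate


def build_sine_table(size=TABLE_SIZE):
    """Expensive step, done once: evaluate sin() at evenly spaced angles."""
    step = 2 * math.pi / size
    return [math.sin(i * step) for i in range(size)]


# Pay the pre-computation cost up front.
SINE_TABLE = build_sine_table()


def fast_sin(angle):
    """Cheap run-time step: return the nearest pre-computed table entry."""
    index = round(angle / (2 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    return SINE_TABLE[index]
```

The trade-off is the one described above: build time and memory are spent freely, and the run-time answer is approximate but immediate.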

The new century brought the advent of general-purpose graphics processing units (GPUs), an outgrowth of the eternally resource-hungry and extremely lucrative computer gaming and image processing market. This vast market was a source of R&D funding that enabled Jensen Huang to lead Nvidia in building a new “derivative” market: general-purpose GPU-accelerated computing in server-class machines for high-performance computing, which shared the majority of its core investments with graphics processing. General-purpose GPUs targeted scientific and other professional computing that used their single-instruction multiple-data (SIMD) processing architectures to simulate natural phenomena, from weather to the behavior of self-driving cars to semiconductor manufacturing processes.

The last ten years have seen the rapid rise of artificial intelligence (AI) solutions in general, and deep learning (DL) in particular. With DL, often running on GPU-based platforms, engineers have figured out how to get better results that were previously impractical. Unlike conventional programming, which seeks to teach a computer to “think” (e.g., if/then/else), DL trains the computer to “recognize,” so it is very well-suited to SIMD, GPU-based computing. DL training cycles run through hundreds of thousands, millions, or even hundreds of millions of data points to train a neural network to distinguish a dog from a cat (or a good photomask from one with a critical error), but once trained, the network can make that distinction in seconds.

The idea of useful waste is central to DL. Traditional approaches to programming and computation spend a lot of effort on conserving computational resources by pre-processing data to determine what is “important” and only sending “worthwhile computing” to any processor. When your main goal is to try to avoid wasting computational resources, you spend a lot of time determining what will be wasteful and working around that, using approximations (that can impact the quality of results) to keep runtime in check.

Useful waste, as embodied in DL, is a sort of brute-force computing that skips the pre-processing and just sends all the data to a GPU processor. The GPUs are so fast that even if some of the data isn’t that “important,” it’s faster to just process everything rather than to spend time pre-processing.
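The contrast between the two styles can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions: `score` and `looks_important` are hypothetical stand-ins, and plain Python on a CPU will not show the speed difference that a GPU would.

```python
def score(x):
    """The same arithmetic for every element, the shape of work SIMD hardware favors."""
    return x * x + 1.0


def looks_important(x):
    """Hypothetical pre-processing heuristic that decides what is worth computing."""
    return abs(x) > 0.5


def selective(samples):
    """Conventional style: test each element first and skip the 'unimportant' ones."""
    return [score(x) if looks_important(x) else None for x in samples]


def brute_force(samples):
    """Useful-waste style: no filtering and no branching, just compute everything."""
    return [score(x) for x in samples]
```

The brute-force version has no per-element decision to write, tune, or debug, and every element goes through identical operations, which is what makes it a natural fit for data-parallel hardware.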

This gain in computational speed alone makes the approach worth adopting. However, the high-level benefit of useful waste is that brute-force computing is much faster to program. Engineers are freed from spending the majority of their time pre-thinking for the computer; they can just let the computer compute. Instead of working mostly on logic whose intent is to avoid computing, programmers can spend their time and energy exploring higher-level architectural issues.

If you have the ability to “waste” computing power to “just do it,” new possibilities emerge. When programs are designed specifically for GPU-based computing, the programming freedom that comes from brute-force computing can lead to solutions to very complex problems that previously seemed out of reach. With useful waste, DL can regularly beat chess masters, a feat that the best computer scientists in the world could not previously write a program to accomplish.

We have entered a new era of GPU-based computing that accepts that what used to be seen as “wasteful” computing can yield astonishingly useful results. DL is an existence proof of the potential benefit of this useful-waste approach. This new GPU era challenges us to rethink computing, including what is “free” and what is “wasteful.”

This article was originally published on EE Times.

Aki Fujimura is chairman and CEO of D2S, a supplier of GPU-acceleration solutions for semiconductor manufacturing. He was a founding member of Tangent Systems in 1984, which was subsequently acquired by Cadence Design Systems in 1989. He had two tenures at Cadence Design Systems, including as CTO, having returned to Cadence for the second time through the acquisition of Simplex Solutions where he was president/COO. He was also a board member and VP at Pure Software, as well as on the boards of HLDS, RTime, Bristol, S7, and Coverity, Inc., all of which were successfully acquired. Fujimura received his BS and MS in electrical engineering from MIT.
