The ability to do AI inferencing closer to the end user is opening up a whole new world of markets and applications.
While AI was originally targeted at data centers and the cloud, it has been moving rapidly toward the edge of the network, where fast, critical decisions need to be made locally, closer to the end user. Sure, training can still be done in the cloud, but in applications such as autonomous driving, the time-sensitive decision making (spotting a car or pedestrian) must happen close to the end user (the driver). After all, edge systems can make decisions on images arriving at up to 60 frames per second, enabling quick action.
These systems are made possible by edge inference accelerators, which have emerged to replace CPUs, GPUs, and FPGAs at much higher throughput/$ and throughput/watt.
In fact, IDC recently reported that the market for AI software, hardware, and services is expected to break the $500 billion mark by 2024, with a five-year compound annual growth rate (CAGR) of 17.5% and total revenues reaching an impressive $554.3 billion.
This rapid growth is likely due to AI expanding from “just a high-end functionality” into products closer to consumers, essentially bringing AI capabilities to the masses. In addition, recently announced products have started to break the cost barriers typically associated with AI inference, enabling designers to incorporate AI into a wider range of affordable products.
While the autonomous driving example above is the one people usually think of first for edge AI inference, there are many other markets that exist today or are close to becoming reality. Below I will highlight just a few, but the possibilities are endless as the technology evolves and the market begins to benefit from volume shipments and manufacturing of AI accelerators.
When it comes to edge AI inference, there are four key requirements for customers not only in the markets mentioned above, but also in the many markets that will emerge to take advantage of these accelerators.
The first is low latency. In edge applications, latency comes first, which means batch size is almost always 1: batching frames together improves raw throughput, but it forces each frame to wait on the others, adding delay that a real-time system cannot afford.
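To make that concrete, here is a minimal sketch of how batch-1 latency is typically measured. It assumes PyTorch and uses a stand-in torchvision ResNet-50 on a roughly 2-megapixel frame; the warm-up and iteration counts are arbitrary, and your own model and runtime would go in their place.

```python
# Minimal batch-1 latency measurement (PyTorch; ResNet-50 as a stand-in model).
import time

import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()
frame = torch.randn(1, 3, 1080, 1920)  # batch size 1: a single ~2 MP frame

with torch.no_grad():
    for _ in range(5):          # warm-up so one-time setup cost is excluded
        model(frame)
    n = 20
    start = time.perf_counter()
    for _ in range(n):
        model(frame)
    latency_ms = (time.perf_counter() - start) / n * 1000

print(f"mean batch-1 latency: {latency_ms:.1f} ms "
      f"(~{1000 / latency_ms:.1f} frames/s at batch size 1)")
```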
The second is support for both BF16 and INT8. An inference accelerator that can run both floating point and INT8 lets customers start quickly with BF16 and shift seamlessly to INT8 once they are ready to make the investment in quantization.
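As a hedged illustration of what that quantization investment involves, the sketch below applies the textbook symmetric INT8 scheme (q = round(x / scale), with max-abs calibration) to a random BF16 tensor. It is a toy, not any particular accelerator's flow; production quantization uses real calibration data and usually per-channel scales.

```python
import torch

# Pretend these are trained weights delivered in BF16
weights = torch.randn(256, 256, dtype=torch.bfloat16)

# Symmetric INT8 quantization: q = round(x / scale), so that x ≈ q * scale.
# Max-abs calibration is the simplest possible scale choice.
scale = (weights.abs().max() / 127.0).item()
q = torch.clamp(torch.round(weights.float() / scale), -127, 127).to(torch.int8)

# Dequantize to see what the INT8 "investment" costs in accuracy
dequant = q.float() * scale
max_err = (weights.float() - dequant).abs().max().item()
print(f"scale = {scale:.6f}, max abs error = {max_err:.6f}")
```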
The third is high throughput. Almost every application needs to process megapixel images (1, 2, or 4 MP) at frame rates of 30 or even 60 frames/second. Customers already have models that work for their applications, and how a solution runs their model is all that matters to them: what counts is delivered throughput on their model, not meaningless benchmarks such as TOPS.
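Those targets translate into a simple pixel-rate budget. The quick arithmetic below just enumerates the image sizes and frame rates cited above:

```python
# Pixel-rate budget for the image sizes and frame rates cited above.
for megapixels in (1, 2, 4):
    for fps in (30, 60):
        print(f"{megapixels} MP @ {fps} fps -> {megapixels * fps} Mpixel/s")
```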
The fourth is efficiency: customers want more throughput at their image size per dollar and per watt, so that new applications become possible at the low end of the market, where volumes are exponentially larger. Solutions have emerged that deliver on this efficiency requirement.
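For illustration only, here is how that throughput/$ and throughput/watt comparison might be computed. The accelerator names and spec numbers below are invented placeholders, not real products:

```python
# Hedged illustration of the efficiency metrics above: frames/s per dollar
# and frames/s per watt. All values are made-up placeholders.
accelerators = {
    "hypothetical_A": {"fps": 120, "price_usd": 399, "power_w": 15},
    "hypothetical_B": {"fps": 300, "price_usd": 1999, "power_w": 75},
}
for name, spec in accelerators.items():
    print(f"{name}: {spec['fps'] / spec['price_usd']:.3f} fps/$, "
          f"{spec['fps'] / spec['power_w']:.1f} fps/W")
```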
There is much innovation happening around inference at the edge because of the vast number of new markets and applications that can benefit from its throughput efficiency and accuracy. Prices are coming down while still matching or beating the performance of higher-priced systems. This will drive AI inference capabilities into applications we have not even thought of yet. I believe this is going to be one of the most exciting application areas of our time.
This article was originally published on EE Times.
Geoff Tate is CEO of Flex Logix.