Passing three milestones is a great indicator that a technology is poised to spread like wildfire. We're seeing clear evidence that embedded vision has reached this point.
A technology reaches a tipping point when it hits three milestones: First, it becomes technically feasible to accomplish important tasks with it. Second, it becomes cheap enough to use for those tasks. And third, critically, it becomes sufficiently easy for non-experts to build products with it. Passing those milestones is a great indicator that a technology is poised to spread like wildfire. At this year’s Embedded Vision Summit (coming up online May 25-28), we’re seeing clear evidence that embedded vision has reached this point.
Embedded vision passed the first two milestones a while back. A huge part of putting the technical feasibility milestone in the rear-view mirror was the advent of deep neural networks, which revolutionized the tasks that vision could do. Suddenly, classifying images or detecting objects in messy real-world scenes was possible, in some cases with accuracy surpassing that of humans. To be sure, it wasn’t easy, but it was doable.
Moore’s Law, market economics, and domain-specific architectural innovation took care of the second milestone. Today, for $4.99, you can buy a tiny ESP32-CAM board that has a dual-core, 240-MHz processor and a 2-MP camera module with an on-board image signal processor and JPEG encoder; it’s a squeeze to do computer vision on it, but it’s certainly possible, and it’s tough to beat the price. If you have more money to spend, your options widen significantly. For example, $99 will get you an Nvidia Jetson Nano Developer Kit with a quad-core 1.4-GHz CPU, a 128-core Maxwell GPU, and 4 Gbytes of memory, which is more than enough to do some serious embedded vision processing.
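To get a feel for why vision on a board of this class is a squeeze, here is a rough back-of-the-envelope calculation. The figures are assumptions for illustration and are not taken from this article: a 2-MP frame is taken to be 1600 x 1200 pixels, on-chip SRAM is taken to be about 520 KB, and the 96 x 96 grayscale network input is a hypothetical choice.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


# Assumption: a 2-MP sensor delivers 1600 x 1200 pixels.
FULL_W, FULL_H = 1600, 1200

# Assumption: roughly 520 KB of on-chip SRAM on a microcontroller of this class.
ASSUMED_SRAM_BYTES = 520 * 1024

# One full-resolution frame at 16 bits per pixel (RGB565).
full_rgb565 = frame_bytes(FULL_W, FULL_H, 2)

# Hypothetical downscaled grayscale input for a small neural network.
small_gray = frame_bytes(96, 96, 1)

print(f"full frame:  {full_rgb565} bytes")
print(f"small input: {small_gray} bytes")
print(f"full frame fits in assumed SRAM:  {full_rgb565 <= ASSUMED_SRAM_BYTES}")
print(f"small input fits in assumed SRAM: {small_gray <= ASSUMED_SRAM_BYTES}")
```

Under these assumptions, a raw full-resolution frame is several times larger than on-chip memory, while a heavily downscaled input fits comfortably. That is one reason small-board vision tends to work on low-resolution images.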
Best of all, new processors show up monthly, at all sorts of price, power, and performance points, often with specialized architectures that boost performance on computer vision and neural network inference tasks. Examples include new offerings from Xilinx, Cadence, and Synaptics.
It’s that pesky third milestone, ease of use, that’s been the rub. Sure, deep learning radically changed what vision systems were capable of, but you needed to be a ninja to be able to design neural networks, gather the data needed, and train them, to say nothing of then having to implement them on a resource-constrained embedded system. But that’s really changed in the last few years. Two big shifts have driven that change.
First is that you don’t have to build embedded vision systems from scratch anymore, thanks to the widespread availability of high-quality, well-supported tools and libraries for vision. The most obvious of these are frameworks like TensorFlow or PyTorch and libraries like OpenCV. Beyond these, widely used task-specific neural networks, such as YOLOv4 or Google’s Inception, have changed the game. No longer do most developers design a neural network; rather, they pick a free off-the-shelf neural network and train it for their task. (Of course, to train a neural network, you need data. Depending on your application, this may represent a challenging data collection project, although there is an increasing number of open-source data sets available, as well as techniques to augment your data or reduce the amount of data you need.)
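As a small illustration of what augmenting your data can mean, the sketch below generates extra training samples from one image by random cropping and horizontal flipping. It is a minimal example under stated assumptions: the image is a plain list of pixel rows, and the helper names (`flip_horizontal`, `random_crop`, `augment`) are hypothetical and not drawn from any of the libraries named above, which ship their own, far richer augmentation tools.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_crop(img: Image, crop_h: int, crop_w: int, rng: random.Random) -> Image:
    """Cut a crop_h x crop_w window from a random position inside the image."""
    height, width = len(img), len(img[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def augment(img: Image, copies: int, crop_h: int, crop_w: int, seed: int = 0) -> List[Image]:
    """Produce `copies` variants of `img` using a random crop and a coin-flip mirror."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    variants = []
    for _ in range(copies):
        patch = random_crop(img, crop_h, crop_w, rng)
        if rng.random() < 0.5:
            patch = flip_horizontal(patch)
        variants.append(patch)
    return variants


# Toy 4 x 4 "image" whose pixel values are simply 0..15.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
samples = augment(toy_image, copies=5, crop_h=3, crop_w=3)
print(len(samples), "augmented samples of size",
      len(samples[0]), "x", len(samples[0][0]))
```

Each variant shows the network a slightly different view of the same scene, which is the basic idea behind stretching a small data set further.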
These building-block libraries and tools may be chip vendor-specific. An example is Nvidia’s DeepStream SDK, which simplifies the creation of video analytics pipelines. Although DeepStream is tied to Nvidia’s Jetson processors, it’s a great example of a vendor providing something closer to a complete solution (as opposed to “just silicon”). BDTI and Tryolabs recently built a facemask-detection smart camera product using DeepStream and YOLOv4.
Second is the availability of tools specifically designed to simplify the process of creating embedded vision and edge AI systems. A great example is Edge Impulse, whose tools ease development of embedded machine learning and vision systems. For instance, the Edge Impulse platform can be used to train and program an image recognition neural network for that $4.99 ESP32-CAM mentioned above. Similarly, for beefier processors, Intel’s DevCloud for the Edge and OpenVINO tools aim to make vision far easier to implement at the edge.
Think back to the 1990s, when wireless communications was the “new new thing.” To start with, it was expensive magic that required a team of RF wizards to make happen. But it reached the tipping point, and today, anyone can buy RF modules for a few dollars to enable wireless communications in an embedded product. In the process, literally billions of wireless units have been shipped with correspondingly huge economic impact.
Embedded vision is at a similar tipping point, and the Embedded Vision Summit is a great place to watch it happen in real time.
This article was originally published on EE Times.
Phil Lapsley is a co-founder of consulting firm BDTI and one of the organizers of the Embedded Vision Summit, which will be held online May 25-28.