Advancements in how machines can utilize voice, video, and visual data promise to revolutionize how humans interact with them...
Consumers have an insatiable appetite for advancements that improve their convenience, safety, and user experience. We see that most obviously in the human interface, which has evolved over the years from being purely tactile to include a wider range of input methods, from voice to gesture to video and various computer vision capabilities, everywhere from sales terminals to smart homes. The next step will be devices that not only understand direct commands but can also infer intent.
In parallel, gnawing concerns over the security and latency of traditional cloud-based connected devices have paved the way for more edge-based processing. This is especially true in human-machine interfaces (HMI). But local processing adds another wrinkle for technology developers, who must consider the specific use case requirements, development options, and cost of smart (machine learning-trained) devices that introduce new levels of automation to power perceptive intelligence and ambient computing.
Edge AI is the foundation
The foundational enabler of a more sophisticated, user-friendly, and safer IoT experience is what has commonly been called edge AI. By definition, edge AI means that the AI processing runs within the end product itself (a set-top box or smart display, for example) and not in the cloud. The rationale for this is well understood — better privacy, less bandwidth, faster response times, even eco-friendliness, since edge processing reduces the energy, water, and other resources needed to run massive data centers.
Edge AI has been adopted in many applications that touch our lives every day, but initial uses have largely been limited to expensive products such as smartphones and cars. As a result, the edge AI implementations targeting these products are also expensive and have been out of reach for consumer retail devices for the smart home. And for the most part, existing edge AI applications are one-dimensional in terms of the user experience they offer — AI-enabled vision in an ADAS application, or picture quality enhancement in a mobile phone, for example.
What would be the compelling reasons for creating and adopting edge AI solutions for the smart home?
HMI driving edge AI in the home
We’re seeing a particularly strong interest and growing array of use case opportunities in the ubiquitous consumer IoT segment — a catch-all term for various entertainment, communication, home automation, security, and sundry other devices, appliances, and gadgets that we increasingly rely on. Especially in current times, consumers want a connected experience but without the cost, privacy, and performance issues of being traditionally connected. The desire for a more immersive and perceptive human-machine interaction is a key factor driving the need for edge AI in the smart home.
With a smart home-focused AI-based edge computing solution in the market, the performance needed to create a more human-like experience will be available for a wider range of products.
Real-world examples that benefit from edge AI in the smart home are plentiful. Some have an obvious practical benefit: a home doorbell camera that can tell the difference between a package drop and a package theft, or entertainment devices that automatically detect and upscale low-resolution video streams to a higher resolution with excellent perceptual quality, making better use of high-resolution TV displays. Even familiar and now nearly ubiquitous video conferencing applications can be enhanced with higher quality video and audio and made available on cost-effective devices.
Other examples may seem more futuristic. A refrigerator that can provide suggestions of what to make for dinner based on contents within the fridge. An oven that can tell you when your meal is cooked to perfection. A virtual personal home yoga trainer that can remind you to straighten your arms during a pose. Home automation devices that work together to anticipate the homeowner’s needs, from heating the house, to preparing food, to choosing what to watch on TV.
Such solutions can combine video, vision, and voice sensors with AI processing capabilities to bring enhanced functionality to a new generation of familiar devices such as smart displays and soundbars, set-top-boxes, appliances, and security cameras.
What each of these applications has in common is the need for an edge AI solution specifically tailored for the smart home, not for smartphone or automotive applications. To further democratize edge AI, a solution needs to meet several requirements, described in the sections that follow.
Smart home HMI requires a multi-modal approach
As discussed earlier, edge AI solutions for smartphones and automotive applications have primarily focused on camera vision use cases. In the smart home, however, a multi-modal HMI is a critical element in enhancing the user experience in this new era of connected devices. Take the example of a set-top box. This application would require video AI, perhaps in the form of the video enhancements discussed earlier. It would also require voice AI, to identify from their voice commands who is watching the TV and configure the experience accordingly, for example by making it easier to select your favorite shows. It may even require vision AI, with a built-in camera that enables an enhanced and intuitive video conferencing experience while chatting with distant family members.
The ideal solution would be a smart home focused SoC that supports high-performance video, voice, and vision processing together with an integrated AI accelerator. The Synaptics VS600 SoC family is an example of such a solution. This approach is not only optimized to meet the multi-modal AI performance requirements of smart home applications; integrating it all into a single chip also makes it accessible to common household products sold at consumer market price points.
This solution begins with an SoC platform that integrates multiple types of processing engines (CPU, NPU, GPU, and ISP) as well as hooks to high-performance cameras and displays. Such an architecture enables the desired combination of highly secure, low-cost inferencing and real-time, multi-modal performance. The Synaptics Edge AI family is a series of SoCs, each highly targeted at its given consumer application. Each SoC in the family integrates the required processing cores together with the appropriate level of integrated AI performance for that application.
A full stack tools approach for ease of AI development
As we have seen, the cost/performance tradeoff is critical to success in expanding edge AI to more applications. In the competitive consumer electronics sector, time to market and differentiation are also essential. To address the challenges of broader edge AI proliferation, a full stack approach is required, one that includes the development tools needed to bring AI innovations to an edge AI SoC.
Most importantly, the desired toolset should be compatible with the large and growing community of AI developers. For example, the toolkit would enable developers to import models created with industry-standard frameworks such as TensorFlow, TensorFlow Lite, Caffe, and ONNX. This lets developers leverage existing AI innovations and get them working on the targeted SoC quickly and painlessly.
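As a hedged illustration of that import path, the sketch below builds a trivial stand-in model with TensorFlow, converts it to the TensorFlow Lite interchange format, and sanity-checks it with the reference interpreter; the same .tflite artifact is what an SoC-specific toolkit would then consume. The model is a placeholder, not a real pose network, and no Synaptics API is shown.

```python
import numpy as np
import tensorflow as tf

# Placeholder "model": pools a 192x192 RGB frame and projects it to
# 17 keypoints of (y, x, score), the output layout of common pose
# estimators such as MoveNet. A real model would be a trained network.
w = tf.constant(np.random.rand(3, 17 * 3).astype(np.float32))

@tf.function(input_signature=[tf.TensorSpec([1, 192, 192, 3], tf.float32)])
def pose_stub(frame):
    pooled = tf.reduce_mean(frame, axis=[1, 2])          # (1, 3)
    return tf.reshape(tf.matmul(pooled, w), [1, 17, 3])  # (1, 17, 3)

# Step 1: convert to .tflite, the interchange format a device toolkit imports.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [pose_stub.get_concrete_function()])
tflite_bytes = converter.convert()

# Step 2: sanity-check the converted model on the host before targeting the SoC.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"],
                       np.random.rand(1, 192, 192, 3).astype(np.float32))
interpreter.invoke()
keypoints = interpreter.get_tensor(out["index"])
print(keypoints.shape)  # (1, 17, 3): 17 joints, each (y, x, confidence)
```

From here, the .tflite file would be handed to the vendor toolchain rather than shipped with the TensorFlow runtime.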
Let’s return to the personal home yoga trainer application discussed earlier. The AI model underlying that application would be a body pose estimation model, an industry-standard technique used to detect the relative skeletal position of the user in the camera’s view. If AI developers had their own implementation of a body pose estimation model created with an industry-standard tool, such as TensorFlow Lite, they would use the toolkit to import it for use on the desired SoC.
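To make "relative skeletal position" concrete, here is a small numpy sketch of logic that could sit downstream of such a model in the yoga trainer: given three keypoints reported by a pose estimator, it computes the elbow angle and flags a bent arm. The keypoint coordinates and the 160-degree threshold are illustrative assumptions, not values from any real product.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by points a-b-c,
    e.g. shoulder-elbow-wrist for the arm."""
    v1 = np.asarray(a) - np.asarray(b)
    v2 = np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Illustrative normalized (x, y) keypoints, as a pose model might report them.
shoulder, elbow, wrist = (0.40, 0.30), (0.50, 0.45), (0.52, 0.62)
angle = joint_angle(shoulder, elbow, wrist)
if angle < 160:  # hypothetical threshold for "arm not straight"
    print(f"Straighten your arm ({angle:.0f} degrees)")
```

In a real device, this check would run per frame against the keypoints emitted by the on-chip pose model, with no frame ever leaving the home.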
When developers are ready, the toolkit should enable them to optimize their AI model's performance for the processor on which it will run. Developers can use open frameworks, such as TensorFlow or TensorFlow Lite, applying them with the capabilities of the target processor in mind. Or they can again use an SoC-specific tool, such as Synaptics' SyNAP tool, which supports optimization specifically targeting the processors within the VS600 SoCs. In our example, the developer could use SyNAP's optimization capabilities to configure the body pose estimation model to run in real time, at 30 frames per second, on a VS600 SoC.
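One of the most common optimizations such toolchains apply when targeting an NPU is post-training quantization: mapping float32 tensors to 8-bit integers so they fit the accelerator's integer math units. Below is a minimal numpy sketch of the affine (scale and zero-point) scheme used by schemes such as TensorFlow Lite's int8 path; the helper names are illustrative, not any toolkit's API.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine (asymmetric) quantization: map float values to int8 via a
    scale and zero-point, as 8-bit inference schemes commonly do."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)  # include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(1000).astype(np.float32)  # stand-in for a weight tensor
q, s, z = quantize(x)
err = float(np.abs(dequantize(q, s, z) - x).max())
print(q.dtype)  # int8; reconstruction error stays within about one scale step
```

Real toolchains add calibration over representative input data and per-channel scales, but the core idea is this 4x size reduction with bounded error.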
Security and privacy need to meet consumers' expectations
The future of HMI sounds bright, but perhaps the biggest barrier to adoption will be users’ perception that their privacy and security will be compromised. There are plenty of recent stories in the news that would validate that concern. Any meaningful HMI solution would have to take this into consideration.
Fortunately, the very fact that this video, voice, and vision data is processed in the device and not in the cloud is a huge improvement in terms of privacy. In the video doorbell example, adding AI intelligence into the doorbell itself means the video from your front door does not need to be streamed 24/7 to the cloud, but only when there are specific events of interest — for example, when the AI engine detects a person approaching the door. Or, in our home yoga trainer example, the application can run completely in the device, as discussed earlier, with no need to send any images from your home to the cloud at all, ever.
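The doorbell behavior described above can be sketched in a few lines: frames are analyzed on the device, and video leaves it only around an event of interest, with a short pre-roll so the clip includes the moments before the trigger. Everything here is illustrative; the detector and uploader are hypothetical callables, not a real camera API.

```python
from collections import deque

class EventGatedStreamer:
    """Sketch of on-device event gating: keep a ring buffer of recent
    frames locally, and upload a clip only when the detector fires."""

    def __init__(self, detector, uploader, pre_roll=30):
        self.detector = detector              # returns True on a person, say
        self.uploader = uploader              # called only for event clips
        self.buffer = deque(maxlen=pre_roll)  # recent frames, held on-device

    def on_frame(self, frame):
        self.buffer.append(frame)
        if self.detector(frame):
            # Only now does video leave the device: the pre-roll plus the
            # triggering frame, instead of a 24/7 stream to the cloud.
            self.uploader(list(self.buffer))
            self.buffer.clear()

# Toy demonstration with string "frames" and a trivial detector.
uploads = []
streamer = EventGatedStreamer(lambda f: f == "person", uploads.append, pre_roll=3)
for f in ["empty", "empty", "cat", "person", "empty"]:
    streamer.on_frame(f)
print(len(uploads), uploads[0])  # 1 ['empty', 'cat', 'person']
```

A production device would also debounce repeated triggers and encrypt the clip in flight, but the privacy property comes from this gating: non-event frames never leave the buffer.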
But even if those images are never sent to the cloud, users may be concerned that they are still captured and processed within the device in their home, even if only temporarily. There is also the security concern that someone with malicious intent could try to extract that data from the device. That is why it is vital that the ideal smart home focused AI solution also captures and processes that content securely.
What is required is a combination of chip architecture and tools built from the ground up to process content securely. This is not new for Synaptics; the company has been doing it for many years. Synaptics' streaming media SoCs, used in products such as STBs and OTT dongles, secure some of the world's most valuable streaming media content from the largest names in the industry, such as Netflix, Disney, and many others. The same robust, mature SoC architecture that dissuades hackers from pirating Money Heist from your streaming media dongle now also discourages hackers from trying to capture images of the front of your house.
This new era of IoT will be facilitated by more “local intelligence” — Edge AI — that lessens the need for (and risk of) being always connected. AI-driven neural networks, processed at the edge device, hold the key to accelerating the adoption of perceptive intelligence systems. By being able to implement this at the edge, the systems operate with greater security and privacy, and lower latency. High-performance, multi-processor SoCs that can support multi-modal interface solutions — and are available at consumer market price points — will help developers leverage AI innovations quickly and differentiate their products.
Enhancements in how machines can utilize voice, video and visual data and use it to understand and predictively respond to what we do, say, or touch promise to revolutionize how IoT can deliver unprecedented levels of safety, convenience, and productivity in our lives.
— Vineet Ganju is vice president of IoT Edge AI at Synaptics