Audio: A critical ‘intelligent sensor’ for autonomous cars?

Article By : Junko Yoshida

DSP Concepts CEO Paul Beckmann says audio is “heading from pure playback” in entertainment systems to enabling “input, trigger and analytics in contextual awareness.”

Vision processors, along with advanced driver assistance systems (ADAS) and CMOS image sensors, play a vital role in autonomous vehicles, acting as the vehicle’s eyes on the road. But what about what the vehicle hears?

Will microphones ever play as important a role as cameras to add “intelligence” to autonomous cars?

Drivers already hear sirens well before they can spot an approaching ambulance, said Paul Beckmann, founder and CEO of DSP Concepts, in a recent interview with EE Times. Why wouldn’t the automotive industry be interested in audio?

System OEMs, not limited to carmakers, are on the cusp of “using more microphones to generate yet another critical sensory data, audio, for artificial intelligence,” Beckmann explained.

As he envisions it, audio is “heading from pure playback” in entertainment systems to enabling “input, trigger and analytics in contextual awareness.”

The intelligence picked up by microphones can be used by everyday systems ranging from cars to digital virtual assistants and portable devices. “Sight and hearing go hand in hand,” added Willard Tu, DSP Concepts’ executive vice president of sales and marketing. “Dogs barking, babies crying, glasses shattering, cars honking, sirens wailing, gunshot noise… audio helps systems understand the environment [and the context] better.”

__Figure 1:__ *Audio “input” Algorithm Roadmap (Source: DSP Concepts)*

Two developments drive the electronics industry’s sudden exuberance for audio.

The first is the proliferation of smartphones with multiple microphones per handset. The second is the popularity of digital virtual assistants like Amazon’s Echo and Google Home. Peter Cooney, principal analyst and director of SAR Insight & Consulting, observed that “the increasing integration of virtual digital assistants into common consumer devices is driving awareness and adoption of voice as a natural user interface for many everyday tasks.”

But before microphones can go beyond offering a natural user interface and become genuinely “intelligent sensors,” the industry is still waiting for a few advances.

__Figure 2:__ *(Source: SAR Insight & Consulting)*

DSP Concepts doesn’t design or sell DSPs, yet its competitors are generally DSP suppliers. Its Audio Weaver tool competes with audio tools created internally by DSP suppliers such as Texas Instruments or Cirrus Logic. The difference is that those internally developed tools work only on the supplier’s own chips. By using a platform-independent tool like Audio Weaver, “OEMs don’t have to get locked into a specific DSP,” added DSP Concepts’ Tu.

Cooney said that DSP Concepts, by partnering with a number of other companies such as Cadence/Tensilica, is in the business of offering audio design solutions to its customers.

In addition to the Audio Weaver tools, DSP Concepts licenses a host of audio algorithms that shape microphone input, including beamforming, echo cancellation, noise cancellation and far-field sound. At a time when the industry suffers from a shortage of engineers well versed in audio processing, the market is clamoring for easy-to-use tools and audio pre-processing algorithms that can isolate the desired sound from unwanted environmental noise, explained Beckmann.

Audio: a stepchild to video
At present, however, using audio for acoustic event detection (and analytics) remains a relatively new practice.

TECHnalysis Research’s O’Donnell told EE Times: “In theory, there could be more dedicated audio processors that do AI, but frankly, audio has always been a stepchild to video and that continues today.”

He added that another big challenge for audio is “language and meaning.” He said, “A picture of a tree is a tree in any language, but understanding words, phrases and, most importantly, meaning and intent are both language- and culture-specific.” This makes voice recognition and natural language processing very difficult, he added.

