Reduce non-stationary noise in handsets
Until recently, noise suppression technology in handsets focused on reducing slow-changing, stationary noise sources. Non-stationary noise sources, however, change quickly and are not suppressed by these techniques. As a result, subscribers cannot reliably use their handsets on busy streets, in crowded restaurants or even at home.
Suppressing non-stationary noise brings substantial benefits to both subscribers and carriers. Users gain the freedom to speak and hear clearly wherever and whenever they want, enjoy more privacy because they can speak softly in noisy environments, and are less likely to be asked to leave important conference calls because of background noise. Carriers see reduced customer churn, increased airtime usage, more efficient use of network bandwidth, and significant savings in capital and operating expenses.
This article outlines nine steps to reduce non-stationary noise in handsets.
Understand the difference between stationary and non-stationary noise—Because it is relatively constant (a loud fan in the background, for example), stationary noise is readily recognized and effectively subtracted using conventional signal processing techniques. Non-stationary noise, however, changes rapidly or randomly, as with a person talking, background music or keyboard typing. By the time a burst of non-stationary noise has been recognized as noise, it has often already passed, so more sophisticated noise suppression techniques are required.
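To make the contrast concrete, here is a minimal sketch of conventional stationary-noise handling. It assumes per-frame power values and a slowly updated, exponentially smoothed noise floor that is subtracted from each frame; this is a generic stand-in for "conventional signal processing techniques," not any specific product's algorithm, and the function names are hypothetical. The steady fan is removed completely, while a one-frame burst mostly passes through because the floor cannot react in time.

```python
def track_noise_floor(frame_power, alpha=0.0625):
    """Slowly updated noise-floor estimate (exponential smoothing).

    A small alpha makes the estimate change slowly: good for stationary
    noise, too slow to follow fast, non-stationary bursts.
    """
    estimate = frame_power[0]
    floor = []
    for p in frame_power:
        estimate = (1 - alpha) * estimate + alpha * p
        floor.append(estimate)
    return floor


def subtract_noise(frame_power, floor):
    # Power subtraction, clamped at zero so no frame has negative power.
    return [max(p - n, 0.0) for p, n in zip(frame_power, floor)]


# Steady fan: constant power 1.0 in every frame.
fan = [1.0] * 50
# Same fan plus a one-frame burst (e.g. a keyboard click) at frame 40.
clicky = fan[:]
clicky[40] += 10.0

fan_residual = subtract_noise(fan, track_noise_floor(fan))
click_residual = subtract_noise(clicky, track_noise_floor(clicky))
print(max(fan_residual))    # 0.0: the stationary fan is fully removed
print(click_residual[40])   # 9.375: most of the 10.0 burst passes through
```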
Use two microphones to improve understanding of the auditory scene—Next-generation noise suppression techniques such as auditory scene analysis (ASA), beam forming and blind source separation (BSS) use multiple microphones to identify, locate and group noise sources more accurately than is possible with a single microphone. Handset makers have recognized this trend and begun to introduce a second microphone into their handset designs.
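One kind of information a second microphone adds is the time difference of arrival (TDOA): a sound off to one side reaches one microphone a few samples before the other. The sketch below estimates that delay with a brute-force cross-correlation; the function name, the toy signals and the lag range are illustrative assumptions, not part of any handset API.

```python
def estimate_delay(mic1, mic2, max_lag):
    """Return the lag (in samples) at which mic2 best matches mic1.

    A positive result means the sound reached mic1 first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of mic1[n] * mic2[n + lag].
        score = 0.0
        for n in range(len(mic1)):
            m = n + lag
            if 0 <= m < len(mic2):
                score += mic1[n] * mic2[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the same short sound arrives at mic2 two samples later.
mic1 = [0, 0, 1, 3, -2, 0, 0, 0, 0, 0]
mic2 = [0, 0, 0, 0, 1, 3, -2, 0, 0, 0]
print(estimate_delay(mic1, mic2, max_lag=4))  # 2
```

Given the microphone spacing and the speed of sound, such a delay constrains the direction a sound came from, which a single microphone cannot observe.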
Use grouping principles to separate the voice of interest—Grouping technologies simplify noise suppression while making it possible to identify non-stationary noise sources. ASA, for example, is modeled on the human auditory pathway and processes sound the way people pick out specific sounds when listening. By grouping acoustic energy to recreate the original sounds, ASA can accurately group sounds from multiple sources while avoiding any blending of sounds that should be perceived as separate. Grouping principles can be broadly described as sequential (those that operate across time) and simultaneous (those that operate across frequency).
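As an illustration of a sequential (across-time) grouping principle, the sketch below links spectral peaks from frame to frame into tracks whenever a peak's frequency stays close to where a track left off (frequency continuity). It assumes peak frequencies have already been extracted per frame; the function name and the 5% tolerance are hypothetical choices, and real ASA systems combine many more cues than this one.

```python
def group_sequentially(frames, max_jump=0.05):
    """Link spectral peaks frame by frame into tracks (sequential grouping).

    frames: list of lists of peak frequencies in Hz, one list per frame.
    A peak joins the track that ended in the previous frame within
    max_jump (relative) of its frequency; otherwise it starts a new track.
    """
    tracks = []  # each track is a list of (frame_index, frequency) pairs
    for t, peaks in enumerate(frames):
        for f in peaks:
            for track in tracks:
                last_t, last_f = track[-1]
                if last_t == t - 1 and abs(f - last_f) <= max_jump * last_f:
                    track.append((t, f))
                    break
            else:
                tracks.append([(t, f)])
    return tracks


# Two slowly gliding components, then an unrelated peak appears.
frames = [[200, 1000], [205, 990], [210, 980], [3000]]
tracks = group_sequentially(frames)
print([len(tr) for tr in tracks])  # [3, 3, 1]
```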
Use multiple cues to group otherwise difficult-to-group sounds correctly—Each grouping cue has limitations. Using multiple cues enables otherwise difficult-to-analyze sounds to be grouped correctly. Some important cues include pitch, spatial information and onset time.
Harmonics generated by a pitched sound source fall at integer multiples of its fundamental frequency, forming distinct frequency patterns that can be used to distinguish one sound from another. Pitch is a primary cue for distinguishing between male and female voices.
Spatial information (the location of a sound source, given by its distance and direction) can be used to group sounds and so distinguish distractors from the voice of interest.
When two bursts of sound energy and their corresponding harmonics have the same onset time, they are likely from the same source.
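The benefit of combining cues can be sketched with two of them, pitch (harmonicity) and onset time; the spatial cue is left out for brevity. The component list, tolerances and function names are hypothetical. In the example, pitch alone would wrongly accept a 480Hz distractor that happens to be a harmonic of 120Hz, and onset alone would wrongly accept a 650Hz tone that starts with the voice; requiring both cues to agree groups every component correctly.

```python
def is_harmonic(freq, f0, tol=0.03):
    """True if freq lies within tol (relative) of an integer multiple of f0."""
    k = round(freq / f0)
    return k >= 1 and abs(freq - k * f0) <= tol * k * f0


def group_by_cues(components, f0, onset, onset_tol=10.0):
    """Split components into those belonging to one source and the rest.

    components: list of (frequency_hz, onset_ms) pairs.
    A component is grouped with the source (pitch f0, starting at onset)
    only if it is a harmonic of f0 AND started within onset_tol ms of onset.
    """
    grouped, rest = [], []
    for freq, start in components:
        if is_harmonic(freq, f0) and abs(start - onset) <= onset_tol:
            grouped.append((freq, start))
        else:
            rest.append((freq, start))
    return grouped, rest


# Voice with a 120Hz pitch starting at 0 ms, plus two distractors.
components = [(120, 0), (240, 2), (360, 1), (480, 150), (650, 0)]
voice, other = group_by_cues(components, f0=120, onset=0)
print(voice)  # [(120, 0), (240, 2), (360, 1)]
print(other)  # [(480, 150), (650, 0)]
```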
Reduce convergence time for more instantaneous noise removal—Traditional noise suppression techniques must first converge before they can remove noise, which makes them ineffective against non-stationary noise sources. Using fast-acting cues to characterize sound makes it possible to identify and remove even instantaneous events such as a finger snap.
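A little arithmetic shows why convergence is the bottleneck. Assuming a traditional estimator that adapts by exponential smoothing with update factor alpha, the remaining error after n frames is (1 - alpha)^n, so closing 90% of a sudden change takes n ≥ ln(0.1) / ln(1 - alpha) frames. The alpha value and 10 ms frame hop below are illustrative assumptions, not figures from any particular product.

```python
import math


def frames_to_converge(alpha, residual=0.1):
    """Frames needed for an exponentially smoothed estimate to close all
    but `residual` of a step change, i.e. the smallest n with
    (1 - alpha) ** n <= residual."""
    return math.ceil(math.log(residual) / math.log(1 - alpha))


frame_ms = 10  # assumed frame hop
n = frames_to_converge(0.02)
print(n, n * frame_ms)  # 114 frames, i.e. 1140 ms
```

Over a second to adapt is far longer than a transient such as a snap, which is over within milliseconds; a cue that classifies each moment of sound directly does not have this wait.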
Employ logarithmic instead of linear frequency scales—The familiar FFT decomposes a signal into frequency components spaced on a linear scale, using a constant frame size and the same bandwidth at every frequency; the linear spacing limits spectral resolution at low frequencies. In contrast, an approach such as the Fast Cochlea Transform (FCT), based on characteristics of the human cochlea, operates on a logarithmic frequency scale and so avoids this low-frequency resolution limit. By operating continuously instead of in frames, the FCT also reduces processing latency, making it well suited to identifying non-stationary noise sources. The FCT also uses frequency-dependent bandwidth, so it can more closely match the time-frequency trade-off of human hearing at each frequency in the audible range.
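The sketch below contrasts the two frequency scales; it does not implement the FCT, whose details the article does not give. It assumes an 8kHz sample rate, a 256-point FFT and 12 log-spaced channels per octave, all illustrative values. FFT bins are 31.25Hz wide everywhere, which is about 31% of 100Hz but only about 1% of 3kHz; log-spaced channels keep a constant ratio between neighbors, so each channel's width grows in proportion to its center frequency.

```python
def fft_bin_spacing(sample_rate, fft_size):
    # Every FFT bin has the same width, whatever its frequency.
    return sample_rate / fft_size


def log_centers(f_low, f_high, per_octave):
    """Center frequencies spaced evenly on a log scale (constant ratio)."""
    ratio = 2 ** (1 / per_octave)
    centers = [f_low]
    # Small tolerance so floating-point error does not drop the last channel.
    while centers[-1] * ratio <= f_high * (1 + 1e-9):
        centers.append(centers[-1] * ratio)
    return centers


spacing = fft_bin_spacing(8000, 256)  # 31.25 Hz at every frequency
for f in (100, 3000):
    # Bin width relative to frequency: coarse at 100 Hz, fine at 3 kHz.
    print(f, spacing / f)

channels = log_centers(100, 200, 12)
print(len(channels))  # 13 channels spanning one octave, 100 Hz to 200 Hz
```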
Use omni-directional microphones to cut cost—Techniques such as beam forming require a specialized cardioid (unidirectional) microphone. Cardioid microphones cost more than omni-directional microphones, have tighter tolerances, must be individually calibrated and matched to within 1dB, impose restrictions on microphone spacing, and add up to 12dB of noise because of their sensitivity to wind and breath. Beam forming is also limited in that any distractor inside the beam of interest is incorrectly passed through as part of the voice of interest.
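To show what "matched to within 1dB" means in practice, here is a hypothetical production-line check: both microphones record the same reference tone, and their level difference in decibels is 20·log10 of the ratio of their RMS amplitudes. The function names, the test tone and the assumption that both microphones see the tone at exactly the same acoustic level are all illustrative.

```python
import math


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mismatch_db(mic_a, mic_b):
    """Level difference in dB between two mics recording the same tone."""
    return 20 * math.log10(rms(mic_a) / rms(mic_b))


def within_tolerance(mic_a, mic_b, tol_db=1.0):
    return abs(mismatch_db(mic_a, mic_b)) <= tol_db


tone = [math.sin(2 * math.pi * k / 16) for k in range(160)]  # 10 full cycles
slightly_quiet = [0.9 * s for s in tone]  # about 0.92 dB lower
too_quiet = [0.8 * s for s in tone]       # about 1.94 dB lower
print(within_tolerance(tone, slightly_quiet))  # True
print(within_tolerance(tone, too_quiet))       # False
```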
It is also important to manage the number of microphones a system requires. BSS, for example, uses a simple linear unmixing technique that works best when there are at least as many microphones as sound sources.
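The microphone-count requirement follows from the linear model behind unmixing: each microphone records a weighted sum of the sources, x = A·s, and separation applies an unmixing matrix W ≈ A⁻¹. With two microphones and two sources, A is square and (if not singular) invertible; with one microphone and two sources there is no inverse. The sketch below assumes the mixing matrix is known, purely to show the model; real BSS is blind and must estimate W from the recordings (for example with independent component analysis). Signal values and names are hypothetical.

```python
def mix(matrix, signals):
    """Apply a matrix to equal-length signals: out[i] = sum_j matrix[i][j] * signals[j]."""
    return [
        [sum(row[j] * signals[j][n] for j in range(len(signals)))
         for n in range(len(signals[0]))]
        for row in matrix
    ]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular mixing matrix: sources cannot be separated")
    return [[d / det, -b / det], [-c / det, a / det]]


voice = [1.0, 0.0, -1.0, 0.5]
babble = [0.2, 0.4, -0.3, 0.1]
A = [[1.0, 0.6],   # mic 1: mostly voice, some babble
     [0.5, 1.0]]   # mic 2: mostly babble, some voice
mics = mix(A, [voice, babble])          # what the two microphones record
recovered = mix(inverse_2x2(A), mics)   # unmixed estimates of voice and babble
```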
Treat echoes as independent sound sources—Traditionally, echoes are removed using separate echo cancellation techniques. These techniques can be compute-intensive, because they must calculate echo reflections, and they perform poorly when noise sources change rapidly. Grouping cues allow echoes to be treated as simply another noise source. Because echoes no longer need to be calculated or their changes tracked, they can be suppressed instantaneously, with echo suppression of up to 46dB.
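A figure such as 46dB compares signal power before and after suppression: 10·log10(P_before / P_after). In echo cancellation this measure is conventionally called echo return loss enhancement (ERLE). The sketch below is one simple way to compute it over a block of samples; the signals are synthetic, and real measurements follow a defined test procedure. A 46dB reduction corresponds to cutting echo power by a factor of about 40,000, or amplitude by about 200.

```python
import math


def suppression_db(before, after):
    """Level reduction in dB: 10 * log10(mean power before / mean power after)."""
    p_before = sum(s * s for s in before) / len(before)
    p_after = sum(s * s for s in after) / len(after)
    return 10 * math.log10(p_before / p_after)


echo = [math.sin(2 * math.pi * k / 32) for k in range(320)]
residual = [s / 200 for s in echo]  # echo amplitude reduced 200-fold
print(round(suppression_db(echo, residual), 1))  # 46.0
```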
Adopt new testing standards—The mobile industry continues to evolve test standards to reflect the higher levels of voice quality made possible by innovations in noise suppression. To help manufacturers ensure the best quality for their products, the industry has recently amended the ITU-T P.835 recommendation to provide a consistent test methodology for measuring and reporting voice quality with noise suppression technology active.
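P.835 is a subjective listening-test method: listeners rate each processed sample on three 5-point scales, for the speech signal (SIG), the background noise (BAK) and overall quality (OVRL). The sketch below only shows the final averaging step that turns individual ratings into one score per scale. The ratings are hypothetical, and the recommendation itself specifies far more (listener panels, test conditions, presentation order) than is modeled here.

```python
from statistics import mean

SCALES = ("SIG", "BAK", "OVRL")  # P.835 rating scales, each from 1 to 5


def p835_scores(ratings):
    """Average per-listener ratings into one score per scale.

    ratings: list of dicts such as {"SIG": 4, "BAK": 3, "OVRL": 4}.
    """
    for r in ratings:
        for s in SCALES:
            if not 1 <= r[s] <= 5:
                raise ValueError(f"{s} rating out of range: {r[s]}")
    return {s: mean(r[s] for r in ratings) for s in SCALES}


# Hypothetical ratings from three listeners for one processed clip.
ratings = [
    {"SIG": 4, "BAK": 4, "OVRL": 4},
    {"SIG": 4, "BAK": 5, "OVRL": 4},
    {"SIG": 3, "BAK": 4, "OVRL": 3},
]
scores = p835_scores(ratings)
```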
Effective suppression of environmental noise, both stationary and non-stationary, is essential if handset manufacturers and carriers are to keep pace with their competitors. By employing next-generation noise suppression techniques, developers can reduce noise levels in handsets by up to 25dB under a wide range of operating conditions.
- Lloyd Watts
Founder, Chairman and Chief Technology Officer