Robust automatic speech recognition with missing and uncertain acoustic data

Martin Cooke, Phil Green, Ljubomir Josifovski, Ascencion Vizinho.
Speech Communication

Human speech perception is robust in the face of a wide variety of distortions, both experimentally applied and naturally occurring. A novel approach to dealing with these situations in robust ASR is to treat certain spectro-temporal regions as missing, or uncertain. The primary advantage of this viewpoint is that it makes minimal assumptions about any noise background. Instead, the reliable regions are first identified, and decoding then proceeds on the basis of the available evidence. This paper demonstrates how conventional continuous-density hidden Markov model based speech recognisers can be adapted to deal with missing and uncertain acoustic data. Two different approaches are developed. The first computes output probabilities on the basis of the reliable evidence only, and further shows how the minimal assumption of energy bounds can be incorporated into this framework. The second technique estimates values for the unreliable regions by conditioning on the reliable parts. Both approaches are evaluated on the TIDigits corpus for several noise conditions, using spectral subtraction as a means to identify reliable regions. These studies demonstrate that the two approaches perform at a similar level, and that both produce a significant performance advantage over spectral subtraction alone. Potential applications of the missing data approach are discussed.
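
The two approaches can be illustrated with a minimal sketch for a single frame and a single HMM state. This is not the paper's implementation: it assumes the state's output distribution is a Gaussian mixture with diagonal covariances, that a binary reliability mask is already available, and that the energy bound for an unreliable component is the interval from zero to the observed value. The function names and the toy model below are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def gauss_cdf(x, mean, var):
    """Cumulative distribution of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_logs(x, reliable, weights, means, variances, use_bounds=False):
    """Per-component log scores using the reliable evidence.

    Reliable dimensions contribute their Gaussian density. Unreliable
    dimensions are integrated out (factor 1) or, if use_bounds is set,
    contribute the probability mass on [0, x[d]] (assumed bound).
    """
    logs = []
    for w, mu, var in zip(weights, means, variances):
        total = math.log(w)
        for d, (x_d, ok) in enumerate(zip(x, reliable)):
            if ok:
                total += log_gauss(x_d, mu[d], var[d])
            elif use_bounds:
                mass = gauss_cdf(x_d, mu[d], var[d]) - gauss_cdf(0.0, mu[d], var[d])
                total += math.log(max(mass, 1e-300))  # floor avoids log(0)
        logs.append(total)
    return logs

def missing_data_loglik(x, reliable, weights, means, variances, use_bounds=False):
    """First approach: output log probability from reliable evidence only."""
    return log_sum_exp(component_logs(x, reliable, weights, means, variances, use_bounds))

def impute_unreliable(x, reliable, weights, means, variances):
    """Second approach: estimate unreliable values by conditioning on reliable ones.

    With diagonal covariances, the estimate is the component means weighted
    by the component posteriors given the reliable dimensions.
    """
    logs = component_logs(x, reliable, weights, means, variances)
    norm = log_sum_exp(logs)
    post = [math.exp(v - norm) for v in logs]
    return [x_d if ok else sum(p * mu[d] for p, mu in zip(post, means))
            for d, (x_d, ok) in enumerate(zip(x, reliable))]

# Toy two-component model over two spectral channels (hypothetical values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frame = [0.0, 10.0]       # channel 1 is assumed to be dominated by noise
mask = [True, False]      # reliability mask, e.g. derived from spectral subtraction

print(missing_data_loglik(frame, mask, weights, means, variances))
print(missing_data_loglik(frame, mask, weights, means, variances, use_bounds=True))
print(impute_unreliable(frame, mask, weights, means, variances))
```

In this sketch the bounded score can never exceed the plain marginal score, because each unreliable dimension contributes a probability mass of at most one. The imputed frame could be passed to an unmodified recogniser, whereas the first approach requires changing how the output probability is computed.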