Publications

Speech fragment decoding techniques for simultaneous speaker identification and speech recognition

Authors
Jon Barker, Ning Ma, Andre Coy, Martin Cooke.
Year
2010
Journal
Computer Speech and Language

This paper addresses the problem of recognising speech in the presence of a competing speaker. We review a speech fragment decoding technique that treats segregation and recognition as coupled problems. Data-driven techniques are used to segment a spectro-temporal representation into a set of fragments, such that each fragment is dominated by one or other of the speech sources. A speech fragment decoder, which employs missing data techniques and clean speech models, simultaneously searches for the set of fragments and the word sequence that best match the target speaker model. The paper investigates the performance of the system on a recognition task employing artificially mixed target and masker speech utterances. The fragment decoder produces significantly lower error rates than a conventional recogniser, and mimics the pattern of human performance that is produced by the interplay between energetic and informational masking. However, at around 0 dB the performance is generally quite poor. An analysis of the errors shows that a large number of them are target/masker confusions. The paper presents a novel fragment-based speaker identification approach that allows the target speaker to be reliably identified across a wide range of SNRs. This component is combined with the recognition system to produce significant improvements. When the target and masker utterances have the same gender, the recognition system's performance at 0 dB equals that of humans; in other conditions the error rate is roughly twice the human error rate.
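To illustrate the core idea of searching jointly over fragment labellings and speech models, here is a toy sketch. It is not the paper's implementation, and several parts of it are assumptions. Each fragment is represented as a list of (frame, channel) cells. Each hypothetical "model" is a fixed per-cell diagonal Gaussian template rather than a trained acoustic model, and the search enumerates all 2^K foreground/background labellings exhaustively. Cells assigned to the target are scored with the Gaussian density. All other cells are scored by bounded marginalisation, i.e. the probability that clean speech lies between 0 and the observed mixture energy, which assumes non-negative energy features. The function names `missing_data_score` and `fragment_decode` are illustrative only.

```python
import itertools
import math


def log_gauss(x, mu, sigma):
    """Log density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def missing_data_score(obs, reliable, means, sigmas):
    """Toy missing-data log-likelihood of obs under a per-cell Gaussian template.

    Reliable cells use the Gaussian density. Unreliable cells use bounded
    marginalisation: log P(0 <= clean speech <= observed energy), assuming
    non-negative energies (a simplifying assumption of this sketch).
    """
    total = 0.0
    for t, frame in enumerate(obs):
        for f, x in enumerate(frame):
            mu, sd = means[t][f], sigmas[t][f]
            if reliable[t][f]:
                total += log_gauss(x, mu, sd)
            else:
                p = gauss_cdf(x, mu, sd) - gauss_cdf(0.0, mu, sd)
                total += math.log(max(p, 1e-300))  # floor avoids log(0)
    return total


def fragment_decode(obs, fragments, models):
    """Exhaustively search fragment labellings and models (hypothetical helper).

    fragments: list of fragments, each a list of (frame, channel) cells.
    models: dict name -> (means, sigmas), each a frames x channels list.
    Returns (best_model_name, labelling, score), where labelling[k] is True
    if fragment k is assigned to the target speaker.
    """
    n_frames, n_chans = len(obs), len(obs[0])
    best = None
    for labels in itertools.product([False, True], repeat=len(fragments)):
        # Cells in target-labelled fragments are reliable; all others are
        # treated as masked (unreliable) evidence.
        reliable = [[False] * n_chans for _ in range(n_frames)]
        for frag, is_target in zip(fragments, labels):
            if is_target:
                for t, f in frag:
                    reliable[t][f] = True
        for name, (means, sigmas) in models.items():
            score = missing_data_score(obs, reliable, means, sigmas)
            if best is None or score > best[2]:
                best = (name, labels, score)
    return best


# Usage: one frame, two channels, one fragment per channel.
obs = [[5.0, 5.0]]
fragments = [[(0, 0)], [(0, 1)]]
models = {
    "yes": ([[5.0, 1.0]], [[0.1, 0.1]]),  # expects energy in channel 0
    "no": ([[1.0, 8.0]], [[0.1, 0.1]]),   # expects more energy in channel 1 than observed
}
name, labels, score = fragment_decode(obs, fragments, models)
print(name, labels)
```

In this example the decoder selects model "yes" with fragment 0 labelled as target and fragment 1 as masker. Model "no" is heavily penalised because its channel-1 mean exceeds the observed mixture energy, which bounded marginalisation rules out. A practical recogniser would replace the exhaustive enumeration with an efficient search over word sequences.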