A glimpsing model of speech perception in noise


Abstract  Do listeners process noisy speech by taking advantage of "glimpses," that is, spectrotemporal regions in which the target signal is least affected by the background? This study used an automatic speech recognition system, adapted for use with partially specified inputs, to identify consonants in noise. Twelve masking conditions were chosen to create a range of glimpse sizes. Several different glimpsing models were employed, differing in the local signal-to-noise ratio (SNR) used for detection, the minimum glimpse size, and the use of information in the masked regions. Recognition results were compared with behavioral data. A quantitative analysis demonstrated that the proportion of the time–frequency plane glimpsed is a good predictor of intelligibility. Recognition scores in each noise condition confirmed that sufficient information exists in glimpses to support consonant identification. Close fits to listeners’ performance were obtained at two local SNR thresholds: one at around 8 dB and another in the range −5 to −2 dB. A transmitted information analysis revealed that cues to voicing are degraded more in the model than in human auditory processing.
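
The glimpse-detection parameters named above (a local SNR threshold, a minimum glimpse size, and the proportion of the time–frequency plane glimpsed) can be illustrated with a minimal sketch. It is not the study's implementation. The function names `glimpse_mask` and `glimpse_proportion`, the representation of the signals as frequency-by-time grids of energies in dB, the use of 4-connected regions, and the measurement of glimpse size as a cell count are all assumptions made for illustration.

```python
from collections import deque


def glimpse_mask(target_db, masker_db, threshold_db=0.0, min_size=1):
    """Mark time-frequency cells that belong to a glimpse.

    target_db, masker_db: equal-shaped lists of rows (frequency x time), in dB.
    threshold_db: local SNR a cell must exceed (assumed strict inequality).
    min_size: minimum number of cells in a 4-connected region (an assumption).
    """
    n_freq = len(target_db)
    n_time = len(target_db[0]) if n_freq else 0

    # Step 1: local SNR criterion, applied cell by cell.
    above = [[target_db[f][t] - masker_db[f][t] > threshold_db
              for t in range(n_time)] for f in range(n_freq)]

    # Step 2: keep only connected regions with at least min_size cells.
    mask = [[False] * n_time for _ in range(n_freq)]
    seen = [[False] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            if not above[f][t] or seen[f][t]:
                continue
            region, queue = [], deque([(f, t)])
            seen[f][t] = True
            while queue:  # breadth-first flood fill over neighbouring cells
                cf, ct = queue.popleft()
                region.append((cf, ct))
                for nf, nt in ((cf + 1, ct), (cf - 1, ct),
                               (cf, ct + 1), (cf, ct - 1)):
                    if (0 <= nf < n_freq and 0 <= nt < n_time
                            and above[nf][nt] and not seen[nf][nt]):
                        seen[nf][nt] = True
                        queue.append((nf, nt))
            if len(region) >= min_size:
                for rf, rt in region:
                    mask[rf][rt] = True
    return mask


def glimpse_proportion(mask):
    """Fraction of the time-frequency plane covered by glimpses."""
    total = sum(len(row) for row in mask)
    return sum(cell for row in mask for cell in row) / total if total else 0.0


# Toy 3 x 4 grid (made-up values): one 3-cell region and one isolated cell.
target = [[10, 10, 0, 0],
          [10, 0, 0, 10],
          [0, 0, 0, 0]]
masker = [[5, 5, 5, 5],
          [5, 5, 5, 5],
          [5, 5, 5, 5]]

all_glimpses = glimpse_proportion(glimpse_mask(target, masker, 0.0, 1))
large_only = glimpse_proportion(glimpse_mask(target, masker, 0.0, 2))
```

On this toy grid, raising `min_size` from 1 to 2 discards the isolated cell, so the glimpsed proportion drops from 4/12 to 3/12. Varying `threshold_db` and `min_size` in this way mirrors, in spirit only, how the glimpsing models described above differ from one another.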
