A glimpsing model of speech perception in noise

Acoustical Society of America

Do listeners process noisy speech by taking advantage of “glimpses”, spectrotemporal regions in which the target signal is least affected by the background? This study used an automatic speech recognition system, adapted for use with partially specified inputs, to identify consonants in noise. Twelve masking conditions were chosen to create a range of glimpse sizes. Several different glimpsing models were employed, differing in the local signal-to-noise ratio (SNR) used for detection, the minimum glimpse size, and the use of information in the masked regions. Recognition results were compared with behavioral data. A quantitative analysis demonstrated that the proportion of the time–frequency plane glimpsed is a good predictor of intelligibility. Recognition scores in each noise condition confirmed that sufficient information exists in glimpses to support consonant identification. Close fits to listeners’ performance were obtained at two local SNR thresholds: one at around 8 dB and another in the range −5 to −2 dB. A transmitted information analysis revealed that cues to voicing are degraded more in the model than in human auditory processing.
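
As an illustration of the glimpse-proportion measure described above, the following minimal sketch marks a time–frequency cell as glimpsed when its local SNR exceeds a threshold, then reports the fraction of cells glimpsed. It is not the study's implementation: the function names, the array layout (frequency channels by time frames, levels in dB), and the toy values are assumptions made for illustration, and the minimum-glimpse-size criterion is omitted.

```python
import numpy as np


def glimpse_mask(speech_db, noise_db, local_snr_db):
    """Return a boolean mask of glimpsed time-frequency cells.

    speech_db, noise_db: arrays of the same shape (channels x frames),
    holding the level in dB of speech and noise in each cell
    (hypothetical layout, assumed for this sketch).
    local_snr_db: detection threshold on the local SNR, in dB.
    """
    speech_db = np.asarray(speech_db, dtype=float)
    noise_db = np.asarray(noise_db, dtype=float)
    if speech_db.shape != noise_db.shape:
        raise ValueError("speech and noise arrays must have the same shape")
    # Local SNR in dB is the difference of the two levels in each cell.
    return (speech_db - noise_db) > local_snr_db


def glimpse_proportion(mask):
    """Fraction of the time-frequency plane marked as glimpsed."""
    return float(np.mean(mask))


# Toy data: 2 channels x 3 frames, with a flat 50 dB masker.
speech_db = np.array([[60.0, 40.0, 55.0],
                      [30.0, 62.0, 48.0]])
noise_db = np.full_like(speech_db, 50.0)

# Compare two illustrative thresholds taken from the ranges the abstract mentions.
low = glimpse_proportion(glimpse_mask(speech_db, noise_db, local_snr_db=-5.0))
high = glimpse_proportion(glimpse_mask(speech_db, noise_db, local_snr_db=8.0))
print(low, high)
```

Raising the local SNR threshold can only shrink the glimpsed area, so the proportion is non-increasing in the threshold; a fuller model would also discard glimpses smaller than a chosen minimum size.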