Modelling speaker intelligibility in noise

Speech Communication

This study compared behavioural performance on a multispeaker speech-in-noise task with that of a model inspired by automatic speech recognition techniques. Listeners identified 3 keywords in simple 6-word sentences, spoken by one of 18 male or 16 female speakers and presented in speech-shaped noise. An across-speaker analysis of a number of acoustic parameters (vocal tract length, mean fundamental frequency and speaking rate) found none to be a consistently good predictor of relative intelligibility. A simple measure of the degree of energetic masking was a good predictor of female speech intelligibility, especially in high noise conditions, but failed to account for interspeaker differences in the male group. A glimpsing model, which combined a simulation of energetic masking with speaker-dependent statistical models, produced recognition scores which were fitted to the behavioural data pooled across all speakers. Using a single set of speaker-independent, noise-level-independent parameters, the model not only predicted the intelligibility of individual speakers to a remarkable degree, but also accounted for most of the token-wise intelligibilities of the letter keywords. The fit was particularly good in high noise conditions.
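The abstract does not define its "simple measure of degree of energetic masking". As a rough illustration of the kind of quantity a glimpsing account works with, the sketch below computes the proportion of time-frequency cells in which the speech level exceeds the noise level by a local SNR threshold. The function name `glimpse_proportion`, the 3 dB default threshold, and the input layout (frames of per-channel levels in dB) are assumptions made for this example and are not taken from the study.

```python
from typing import Sequence


def glimpse_proportion(
    speech_db: Sequence[Sequence[float]],
    noise_db: Sequence[Sequence[float]],
    local_snr_threshold_db: float = 3.0,  # assumed value, not from the study
) -> float:
    """Return the fraction of time-frequency cells where speech exceeds
    noise by more than the local SNR threshold (a hypothetical measure)."""
    if len(speech_db) != len(noise_db):
        raise ValueError("speech and noise must have the same number of frames")

    total = 0
    glimpsed = 0
    for speech_frame, noise_frame in zip(speech_db, noise_db):
        if len(speech_frame) != len(noise_frame):
            raise ValueError("frames must have the same number of channels")
        for speech_level, noise_level in zip(speech_frame, noise_frame):
            total += 1
            # A cell counts as a glimpse when its local SNR clears the threshold.
            if speech_level - noise_level > local_snr_threshold_db:
                glimpsed += 1

    if total == 0:
        raise ValueError("inputs contain no time-frequency cells")
    return glimpsed / total


# Toy example: 2 frames x 3 channels, with a flat 50 dB noise floor.
speech = [[60.0, 40.0, 55.0], [30.0, 62.0, 45.0]]
noise = [[50.0, 50.0, 50.0], [50.0, 50.0, 50.0]]
proportion = glimpse_proportion(speech, noise)
```

In this toy input, three of the six cells have a local SNR above 3 dB, so `proportion` is 0.5. Under this assumed measure, a speaker whose speech leaves a larger share of cells unmasked would be expected to be more intelligible at a given noise level.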