Modelling speaker intelligibility in noise


Abstract  This study compared behavioural performance on a multispeaker speech-in-noise task with that of a model inspired by automatic speech recognition techniques. Listeners identified 3 keywords in simple 6-word sentences spoken by one of 18 male or 16 female speakers and presented in speech-shaped noise. An across-speaker analysis of a number of acoustic parameters (vocal tract length, mean fundamental frequency and speaking rate) found none to be consistently good predictors of relative intelligibility. A simple measure of the degree of energetic masking was a good predictor of female speech intelligibility, especially in high noise conditions, but failed to account for interspeaker differences in the male group. A glimpsing model, which combined a simulation of energetic masking with speaker-dependent statistical models, produced recognition scores which were fitted to the behavioural data pooled across all speakers. Using a single set of speaker-independent, noise-level-independent parameters, the model predicted the intelligibility of individual speakers to a remarkable degree and also accounted for most of the token-wise intelligibilities of the letter keywords. The fit was particularly good in high noise conditions.
