A metric for predicting binaural speech intelligibility in stationary noise and competing speech maskers

Yan Tang, Martin Cooke, Bruno Fazenda, Trevor Cox.
Journal of the Acoustical Society of America

One criterion in the design and evaluation of binaural sound scenes is an assessment of how much of the intended message is correctly understood by listeners. While advances have been made in quantifying the intelligibility of single-channel speech in noise, relatively little work has been done on objective intelligibility metrics for binaural listening conditions. Motivated by better-ear glimpsing and binaural masking level differences, the current study describes and evaluates a binaural distortion-weighted glimpse proportion metric, BiDWGP, that operates with either binaural signals or single-channel signals from each sound source along with their locations in a horizontal plane. Two listening experiments were performed with stationary noise and competing speech, one in the presence of a single masker, the other with multiple maskers, for a variety of spatial configurations of target speech and maskers. The BiDWGP metric predicts listener keyword scores with correlations of 0.95 and 0.91 for stationary and fluctuating maskers, respectively. In multiple-masker conditions, when the two types of masker are considered separately, correlations rise to 0.98 for both types. Predictions for binaural and single-channel inputs are very similar, suggesting that the BiDWGP metric can be applied to the design of sound scenes where individual sound sources and their locations are available.
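To illustrate the better-ear glimpsing idea that motivates the metric, the following is a minimal sketch, not the BiDWGP metric itself. It omits distortion weighting, binaural masking level differences, and any auditory front end. It assumes that per-ear spectro-temporal excitation patterns for the target and the masker are already available in dB. The function name `better_ear_glimpse_proportion`, the array layout, and the 3 dB local SNR threshold are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np


def better_ear_glimpse_proportion(target_db, masker_db, threshold_db=3.0):
    """Fraction of time-frequency cells glimpsed at either ear.

    target_db, masker_db: arrays of shape (2, n_channels, n_frames) holding
    levels in dB for the left and right ears (assumed layout).
    threshold_db: local SNR a cell must exceed to count as a glimpse
    (assumed value).
    """
    target_db = np.asarray(target_db, dtype=float)
    masker_db = np.asarray(masker_db, dtype=float)
    if target_db.shape != masker_db.shape or target_db.shape[0] != 2:
        raise ValueError("expected two arrays of shape (2, n_channels, n_frames)")

    # Local SNR in each time-frequency cell, separately for each ear.
    local_snr_db = target_db - masker_db

    # A cell is glimpsed at an ear when the target exceeds the masker
    # by more than the threshold.
    glimpsed_per_ear = local_snr_db > threshold_db

    # Better-ear rule: a cell counts if it is glimpsed at either ear.
    glimpsed = glimpsed_per_ear.any(axis=0)

    return float(glimpsed.mean())


# Toy example: 2 frequency channels x 2 frames, target at 0 dB everywhere.
target = np.zeros((2, 2, 2))
masker = np.array([
    [[-10.0, 10.0], [10.0, 10.0]],   # left ear: one favourable cell
    [[10.0, 10.0], [10.0, -10.0]],   # right ear: a different favourable cell
])
proportion = better_ear_glimpse_proportion(target, masker)
```

In this toy case each ear alone glimpses one of the four cells, while combining the two ears yields two, which is the kind of spatial advantage a better-ear account is intended to capture.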