Sponsored by the Pascal network.
The task is to recognise speech from a target talker in the presence of either stationary noise or competing speech. You can access training data and two test sets here. One test set contains sentences spoken in speech-shaped noise at a range of SNRs (signal-to-noise ratios) from clean down to -12 dB. The other consists of pairs of sentences at a range of TMRs (target-to-masker ratios) from 6 dB down to -9 dB. Only one signal per mixture is provided (i.e. the task is "single microphone").
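The mixtures are distributed pre-mixed, so participants need not construct them. For illustration only, the following sketch (not the challenge's own mixing code) shows how a masker can be scaled so that a mixture has a specified target-to-masker ratio in dB:

```python
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Scale `masker` so the target-to-masker energy ratio equals
    `snr_db`, then return the single-channel mixture.

    Illustrative sketch only: the challenge supplies pre-mixed signals.
    """
    # Energy (sum of squares) of each signal
    e_target = np.sum(target.astype(float) ** 2)
    e_masker = np.sum(masker.astype(float) ** 2)
    # Solve SNR(dB) = 10*log10(e_target / (gain^2 * e_masker)) for gain
    gain = np.sqrt(e_target / (e_masker * 10.0 ** (snr_db / 10.0)))
    return target + gain * masker
```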
Speech material is drawn from the GRID corpus, which consists of simple six-word sentences of the form
command(4) colour(4) preposition(4) letter(25) digit(10) adverb(4)
e.g. "place white at L 3 now"
(the numbers in brackets indicate the number of choices at each point).
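Multiplying the choices at each slot gives 4 x 4 x 4 x 25 x 10 x 4 = 64,000 possible sentences. A minimal sketch of the grammar (word lists taken from the GRID corpus description; the sampling function is illustrative):

```python
import random

# The six word slots of the GRID grammar (the letter slot omits "w")
COMMANDS = ["bin", "lay", "place", "set"]
COLOURS = ["blue", "green", "red", "white"]
PREPOSITIONS = ["at", "by", "in", "with"]
LETTERS = list("abcdefghijklmnopqrstuvxyz")  # 25 letters, no "w"
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
ADVERBS = ["again", "now", "please", "soon"]

def random_sentence(rng=random):
    """Sample one sentence from the grammar, e.g. 'place white at l three now'."""
    slots = [COMMANDS, COLOURS, PREPOSITIONS, LETTERS, DIGITS, ADVERBS]
    return " ".join(rng.choice(slot) for slot in slots)
```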
Although the task is not particularly representative of everyday speech, it was chosen for the speech separation challenge because its fixed grammar and small vocabulary make the recognition task well-controlled and straightforward to score.
The training and development sets are drawn from a closed set of 34 talkers of both genders. The training and development data are available as zip files, split into parts for ease of downloading.
Note: Although utterances were semi-automatically screened (details in Cooke et al., 2006), we are aware of a very small number of errors in the training data sets (estimated at < 0.1%) where the utterance does not match its label (usually due to speaker error, and usually only in the letter component), or where the recording was truncated. Check this list of corrections.
An easy-to-use HMM-based recogniser is now available for those of you whose algorithms produce 'enhanced' waveforms. The entire process, from waveforms to scores, has been automated. A scoring script is provided as part of the recogniser package; you should use it even if you plan to use your own recogniser, as its outputs form the minimal set of results to report in your paper. The scoring script can also be used during development. This document describes both the scoring script and the recogniser.
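The task is scored on the letter and digit keywords of each sentence. As a toy stand-in for the official scoring script (function name and I/O format here are illustrative assumptions; use the script shipped with the recogniser package for reportable results):

```python
def keyword_score(hypotheses, references):
    """Percentage of letter and digit keywords correctly recognised.

    Toy sketch only, not the official scoring script.  Each hypothesis
    and reference is assumed to be a six-word GRID sentence; the letter
    and digit occupy word positions 3 and 4 (0-indexed).
    """
    total, correct = 0, 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        for i in (3, 4):  # letter slot, digit slot
            total += 1
            correct += (h[i] == r[i])
    return 100.0 * correct / total
```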
The following authors have kindly agreed to have their Interspeech submissions made available (note that these articles should not be cited without the express permission of the authors):
Acknowledgements: Jon Barker helped to construct the stimuli and two-talker task. Ning Ma and Youyi Lu helped develop the recogniser and associated scoring scripts.
Cooke, M. P., Barker, J., Cunningham, S. P. and Shao, X. (2006) An audio-visual corpus for speech perception and automatic speech recognition, Journal of the Acoustical Society of America, 120: 2421-2424.
Barker, J. and Cooke, M. P. Modelling speaker intelligibility in noise, accepted for Speech Communication.
Last updated: 18 May 2015