The 1st Speech Separation Challenge

Principal organiser: Martin Cooke (Ikerbasque and University of The Basque Country, Spain)

Sponsored by the PASCAL network.

Latest news

Task

The task is to recognise speech from a target talker in the presence of either stationary noise or other speech. You can access training data and two test sets here. One test set contains sentences spoken in speech-shaped noise at several signal-to-noise ratios (SNRs), ranging from clean down to -12 dB. The other consists of pairs of sentences mixed at target-to-masker ratios (TMRs) from 6 dB down to -9 dB. Only one signal is provided for each mixture (i.e. the task is "single microphone").
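Both SNR and TMR express the level of the target relative to the interferer in dB. As an illustration only, the sketch below scales a masker (noise or a competing sentence) so that a mixture reaches a chosen ratio, measuring level as mean-square power over the whole signal. That level measure and the function names power and mix_at_ratio are our assumptions; the challenge stimuli may have been constructed differently (for example using active speech level). The "clean" condition simply corresponds to no masker at all.

```python
import math
from typing import List


def power(signal: List[float]) -> float:
    """Mean-square power of a signal (assumed level measure)."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_ratio(target: List[float], masker: List[float], ratio_db: float) -> List[float]:
    """Add `masker` to `target`, scaling the masker so that
    10 * log10(P_target / P_scaled_masker) equals `ratio_db` (an SNR or TMR in dB)."""
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")
    p_t = power(target)
    p_m = power(masker)
    # The scaled masker must have power P_t / 10^(ratio/10); the amplitude gain is its square root.
    gain = math.sqrt(p_t / (p_m * 10 ** (ratio_db / 10)))
    return [t + gain * m for t, m in zip(target, masker)]
```

For instance, mix_at_ratio(target, masker, -9.0) would give a mixture in which the masker is 9 dB more powerful than the target, matching the hardest TMR condition described above.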

Speech material is drawn from the GRID corpus [1], whose sentences are simple sequences of the form

     <command:4><color:4><preposition:4><letter:25><number:10><adverb:4>

    e.g. "place white at L 3 now"

(the number after each colon indicates how many choices are available at that position).
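The grammar above is small enough to write down directly. The sketch below encodes it as word lists and checks transcripts against it; the word lists reflect our understanding of the published GRID vocabulary [1] (25 letters, i.e. A to Z without W, and the digits zero to nine), and the names GRID_SLOTS, count_sentences and parse are hypothetical, not part of the challenge tools.

```python
from typing import Dict

# Word lists per slot, as we understand the GRID vocabulary (verify against [1]).
GRID_SLOTS = {
    "command": ["bin", "lay", "place", "set"],
    "color": ["blue", "green", "red", "white"],
    "preposition": ["at", "by", "in", "with"],
    "letter": list("abcdefghijklmnopqrstuvxyz"),  # 25 letters: no W
    "number": ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"],
    "adverb": ["again", "now", "please", "soon"],
}
ORDER = ["command", "color", "preposition", "letter", "number", "adverb"]


def count_sentences() -> int:
    """Number of distinct sentences the grammar can generate."""
    total = 1
    for slot in ORDER:
        total *= len(GRID_SLOTS[slot])
    return total


def parse(sentence: str) -> Dict[str, str]:
    """Split a transcript into its six slots, raising ValueError if it is outside the grammar."""
    words = sentence.lower().split()
    if len(words) != len(ORDER):
        raise ValueError(f"expected {len(ORDER)} words, got {len(words)}")
    parsed = {}
    for slot, word in zip(ORDER, words):
        # Accept single digits (e.g. "3") in place of the spelled-out number.
        if slot == "number" and word.isdigit() and len(word) == 1:
            word = GRID_SLOTS["number"][int(word)]
        if word not in GRID_SLOTS[slot]:
            raise ValueError(f"{word!r} is not a valid {slot}")
        parsed[slot] = word
    return parsed
```

With these lists, count_sentences() returns 4 × 4 × 4 × 25 × 10 × 4 = 64000, and parse("place white at L 3 now") yields the letter "l" and the number "three".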

Although the task is not particularly representative of everyday speech, it was chosen for the speech separation challenge because

Training, development and final test data

The training and development sets are drawn from a closed set of 34 talkers of both genders. The training and development data are available as zip files, split into several parts for easier downloading.

Note: Although utterances were semi-automatically screened (details in [1]), we are aware of a very small number of errors in the training data (estimated at < 0.1%) where the spoken sentence does not match the file name (usually due to a speaker error, and usually only in the letter) or where the recording was truncated. Check this list of corrections.
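A mismatch between the spoken sentence and the file name can be flagged automatically once a transcript is available. The sketch below assumes the GRID naming convention in which each of the six words is encoded by one character of the file stem (so "place white at L 3 now" would be stored as pwal3n). The decoding table, in particular the use of "z" for zero, and the names decode_stem and mismatched_slots are our assumptions rather than part of the challenge materials; check them against the corpus documentation before relying on them.

```python
from typing import Dict, List

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]

# Assumed one-character code per slot; "z" for zero is an assumption.
SLOT_CODES: List[Dict[str, str]] = [
    {"b": "bin", "l": "lay", "p": "place", "s": "set"},                  # command
    {"b": "blue", "g": "green", "r": "red", "w": "white"},               # color
    {"a": "at", "b": "by", "i": "in", "w": "with"},                      # preposition
    {c: c for c in "abcdefghijklmnopqrstuvxyz"},                         # letter (no W)
    {"z": "zero", **{str(d): NUMBER_WORDS[d] for d in range(1, 10)}},    # number
    {"a": "again", "n": "now", "p": "please", "s": "soon"},              # adverb
]
SLOT_NAMES = ["command", "color", "preposition", "letter", "number", "adverb"]


def decode_stem(stem: str) -> List[str]:
    """Turn a six-character file stem such as 'pwal3n' into the sentence it should contain."""
    if len(stem) != len(SLOT_CODES):
        raise ValueError(f"expected a {len(SLOT_CODES)}-character stem, got {stem!r}")
    words = []
    for name, codes, ch in zip(SLOT_NAMES, SLOT_CODES, stem.lower()):
        if ch not in codes:
            raise ValueError(f"{ch!r} is not a valid {name} code")
        words.append(codes[ch])
    return words


def normalise(word: str) -> str:
    """Lower-case a transcript word and spell out single digits."""
    word = word.lower()
    return NUMBER_WORDS[int(word)] if word.isdigit() and len(word) == 1 else word


def mismatched_slots(stem: str, transcript: str) -> List[str]:
    """Names of the slots where the transcript disagrees with the file name."""
    expected = decode_stem(stem)
    heard = [normalise(w) for w in transcript.split()]
    if len(heard) != len(expected):
        return list(SLOT_NAMES)  # wrong length (e.g. truncation): treat every slot as suspect
    return [name for name, e, h in zip(SLOT_NAMES, expected, heard) if e != h]
```

Given the note above, most flagged utterances would be expected to report only the "letter" slot, while a truncated recording would typically show up as a wrong-length transcript.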

Default recogniser and scoring scripts

An easy-to-use HMM-based recogniser is now available for those of you whose algorithms produce 'enhanced' waveforms. The entire process, from waveforms to scores, has been automated. A scoring script is provided as part of the recogniser package; please use it even if you plan to use your own recogniser. Its output is the minimal set of results to report in your paper, and you can also use it during development. This document describes both the scoring scripts and the recogniser.
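The official scoring script should always be used for reported results. Purely to illustrate the kind of per-condition summary such a script produces, the sketch below computes the percentage of keywords recognised correctly in each condition. The choice of the letter and number as the scored keywords, the simple position-by-position comparison (no alignment), and the name keyword_accuracy are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed scored slots: letter (index 3) and number (index 4) in the GRID word order.
KEYWORD_POSITIONS = (3, 4)


def keyword_accuracy(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Percentage of keywords correct per condition.

    `results` yields (condition, reference, hypothesis) triples, where the reference
    is a six-word GRID transcript and the hypothesis is the recogniser output.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for condition, reference, hypothesis in results:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        for pos in KEYWORD_POSITIONS:
            total[condition] += 1
            # A missing word at this position counts as an error.
            if pos < len(hyp) and hyp[pos] == ref[pos]:
                correct[condition] += 1
    return {c: 100.0 * correct[c] / total[c] for c in total}
```

For example, with two utterances in a "-6dB" condition, one recognised perfectly and one with only the letter wrong, 3 of 4 keywords are correct and the function returns {"-6dB": 75.0}.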

Rules of the challenge

Results so far

The following authors have kindly agreed to have their Interspeech submissions made available (note that these articles should not be cited without the express permission of the authors):

Acknowledgements: Jon Barker helped to construct the stimuli and two-talker task. Ning Ma and Youyi Lu helped develop the recogniser and associated scoring scripts.

[1] Cooke, M. P., Barker, J., Cunningham, S. P. and Shao, X. (2006) An audio-visual corpus for speech perception and automatic speech recognition, Journal of the Acoustical Society of America, 120: 2421-2424.

[2] Barker, J. and Cooke, M. P. Modelling speaker intelligibility in noise, Speech Communication (accepted).

Last updated: 18 May 2015