Spanish confusions corpus
We present a large-scale corpus of noise-induced robust misperceptions in Spanish. The corpus contains 3235 consistent misperceptions; a misperception was included if at least 6 listeners in a group of 15 reported the same response. Each confusion is described in detail in the spreadsheet provided below, including orthographic and semi-phonemic transcriptions of the target word and the misperception, as well as phoneme alignment distances computed between the transcriptions using dynamic programming string alignment.
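As an illustration of what a phoneme alignment distance can look like, here is a minimal sketch of dynamic programming string alignment over phoneme sequences. It assumes unit costs for substitutions, insertions and deletions (plain Levenshtein distance); the cost scheme actually used for the corpus is not stated here, and the function name `alignment_distance` and the example transcriptions are hypothetical.

```python
from typing import Sequence


def alignment_distance(target: Sequence[str], response: Sequence[str]) -> int:
    """Edit distance between two phoneme sequences (assumed unit costs)."""
    # prev[j] is the distance between the target prefix processed so far
    # and the first j phonemes of the response.
    prev = list(range(len(response) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]  # aligning i target phonemes with an empty response
        for j, r in enumerate(response, start=1):
            substitution = prev[j - 1] + (t != r)  # free if phonemes match
            deletion = prev[j] + 1                 # target phoneme missing in response
            insertion = curr[j - 1] + 1            # extra phoneme in response
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical one-symbol-per-phoneme transcriptions, for illustration only.
print(alignment_distance(list("kasa"), list("kosa")))  # 1 (one substitution)
print(alignment_distance(list("mesa"), list("mes")))   # 1 (one deletion)
```

Each phoneme is treated as one list element, so multi-character phoneme symbols would work as long as the transcription is split into a list first.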
The confusion corpus is made up of the following components:
- A spreadsheet [273 Kb] in CSV format, with each row describing a misperception.
- Waveforms [263 Mb] corresponding to each misperception, including the speech-plus-masker mixture presented to listeners as well as the speech and the masker waveforms separately.
- Continuous masker waveforms [8.5 Mb] from which the individual masker fragments were chosen. For maskers composed of natural speech, transcription files indicate which words are present and when they occur.
You can also download the entire corpus [272 Mb] as a zip file.
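The inclusion criterion for the corpus (at least 6 of 15 listeners reporting the same response) can be sketched as follows. This is a rough illustration rather than the procedure actually used to build the corpus: the function name `consistent_misperception`, the example responses, and the assumption that responses are compared as exact strings after excluding correct answers are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

MIN_AGREEMENT = 6  # threshold given in the corpus description


def consistent_misperception(
    target: str,
    responses: Sequence[str],
    min_agreement: int = MIN_AGREEMENT,
) -> Optional[str]:
    """Return the most frequent incorrect response if enough listeners agree."""
    # Assumption: a response equal to the target is a correct answer,
    # not a misperception, so it is excluded from the count.
    counts = Counter(r for r in responses if r != target)
    if not counts:
        return None
    response, n = counts.most_common(1)[0]
    return response if n >= min_agreement else None


# Hypothetical responses from a group of 15 listeners.
responses = ["cosa"] * 7 + ["casa"] * 5 + ["capa"] * 3
print(consistent_misperception("casa", responses))  # cosa
```

With a threshold of 6 in a group of 15, two different responses could in principle both qualify; this sketch reports only the most frequent one.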