Spanish confusions corpus


We present a large-scale corpus of robust noise-induced misperceptions in Spanish. The corpus contains 3235 consistent misperceptions; a misperception was included if at least 6 of the 15 listeners in a group reported the same response. Each confusion is described in detail in the spreadsheet provided below, including orthographic and semi-phonemic transcriptions of the target word and the misperception, as well as phoneme alignment distances computed between the two transcriptions using dynamic programming string alignment.
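As an illustration of the two computations mentioned above, the following is a minimal sketch rather than the procedure actually used to build the corpus. It assumes that transcriptions are available as lists of phoneme symbols, that the alignment uses unit costs for insertion, deletion and substitution (the corpus may use different costs), and that the consistency check compares exact response strings. The function names, parameters and the example words are hypothetical.

```python
from collections import Counter


def alignment_distance(target, response, sub_cost=1, indel_cost=1):
    """Edit distance between two phoneme sequences via dynamic programming.

    Unit costs are an assumption; the corpus may weight operations differently.
    """
    rows, cols = len(target) + 1, len(response) + 1
    dist = [[0] * cols for _ in range(rows)]
    # Aligning a sequence against an empty one costs one indel per phoneme.
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = target[i - 1] == response[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                    # deletion
                dist[i][j - 1] + indel_cost,                    # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost), # match or substitution
            )
    return dist[-1][-1]


def consistent_misperception(responses, target, min_agree=6):
    """Return the most frequent incorrect response if enough listeners agree on it.

    `min_agree=6` mirrors the 6-of-15 criterion described in the text.
    """
    counts = Counter(r for r in responses if r != target)
    if not counts:
        return None
    word, n = counts.most_common(1)[0]
    return word if n >= min_agree else None


# Hypothetical example (not taken from the corpus): "casa" heard as "cosa".
responses = ["cosa"] * 6 + ["casa"] * 9
confusion = consistent_misperception(responses, "casa")
distance = alignment_distance(["k", "a", "s", "a"], ["k", "o", "s", "a"])
```

In this hypothetical case, 6 of the 15 listeners agree on the same incorrect response, so it meets the inclusion threshold, and the two transcriptions differ by a single phoneme substitution.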

Corpus contents

The confusion corpus is made up of the following components:

You can also download the entire corpus [272 MB] as a zip file.