Speech database and protocol validation using waveform entropy

Lapidot, Itshak; Delgado, Héctor; Todisco, Massimiliano; Evans, Nicholas; Bonastre, Jean-Francois
INTERSPEECH 2018, 19th Annual Conference of the International Speech Communication Association, September 2-6, 2018, Hyderabad, India

The assessment of performance for any number of speech processing tasks calls for the use of a suitably large, representative dataset. Dataset design is crucial so as to ensure that any significant variation unrelated to the task in hand is adequately normalised or marginalised. Most datasets are partitioned into training, development and evaluation subsets. Depending on the task, the nature of these three subsets should normally be close to identical. With speech signals being subject to a multitude of different influences, e.g. speaker gender and age, language, dialect, utterance length, etc., the design and validation of speech datasets can become especially challenging. Even if many sources of variation unrelated to the task in hand can easily be marginalised, other sources of more subtle variation can easily be overlooked. Imbalances between training, development and evaluation partitions can call into question findings derived from their use. Stringent dataset validation procedures are required. This paper reports a particularly straightforward approach to dataset validation that is based upon waveform entropy.


Type: Conference
City: Hyderabad
Date: 2018-09-02
Department: Digital Security
Eurecom Ref: 5574
Copyright:
© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in INTERSPEECH 2018, 19th Annual Conference of the International Speech Communication Association, September 2-6, 2018, Hyderabad, India, and is available at: http://dx.doi.org/10.21437/Interspeech.2018-2330

PERMALINK: https://www.eurecom.fr/publication/5574