On de-emphasizing the spurious components in the spectral modulation for robust speech recognition

Tyagi, Vivek; Wellekens, Christian J.
Robust 2004, COST 278 and ISCA Tutorial and Research Workshop Robustness Issues in Conversational Interaction
30-31 August 2004, Norwich, UK

It is well known that the peaks in the log Mel-filter bank spectrum essentially represent the "formants" of the speech signal and are important cues for characterizing the sound. However, perturbations in the low-energy regions of the log Mel-filter bank spectrum create unnecessary sensitivity when cepstra are compared, especially in the presence of additive noise. In this paper, we present a technique that suppresses this unnecessary sensitivity of the log Mel-filter bank spectrum (logMelFBS) of speech signals while preserving the fundamental formant structure. In practice, our technique is quite similar to spectral root homomorphic deconvolution systems (SRDS) [3]. However, we work with the log homomorphic deconvolution system (LHDS) [1] and use an exponentiation of the logMelFBS to emphasize the spectral peaks (formants). Experiments on speech signals show that features based on the proposed technique yield a significant improvement in speech recognition performance over MFCC features in non-stationary noise conditions, while achieving slightly better performance in clean conditions. The proposed technique performs nearly identically to root Mel-cepstral coefficients (RMFCC) in both noisy and clean conditions.
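The abstract does not spell out the exact form of the exponentiation, so the following is only a minimal sketch under stated assumptions, contrasting three ways of turning Mel-filter bank energies into cepstra: standard MFCC (DCT of the log energies), root Mel-cepstrum (DCT of energies raised to a small root), and a hypothetical reading of the proposed idea in which the logMelFBS is first shifted to be non-negative and then raised to a power `gamma > 1` so that peaks grow relative to low-energy valleys. The function names, the shift-to-zero step, the floor, and the default values `gamma=2.0` and `root=0.1` are all illustrative assumptions, not the authors' settings.

```python
import math
from typing import List, Sequence


def dct2(x: Sequence[float], num_ceps: int) -> List[float]:
    """Unnormalized DCT-II, the transform commonly used to map
    Mel-filter bank values to cepstral coefficients."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(num_ceps)
    ]


def mfcc(mel_energies: Sequence[float], num_ceps: int) -> List[float]:
    # Standard MFCC: cepstrum of the log Mel-filter bank spectrum (logMelFBS).
    return dct2([math.log(e) for e in mel_energies], num_ceps)


def rmfcc(mel_energies: Sequence[float], num_ceps: int,
          root: float = 0.1) -> List[float]:
    # Root Mel-cepstrum: a small power (root) compression replaces the log.
    # root=0.1 is an illustrative value, not taken from the paper.
    return dct2([e ** root for e in mel_energies], num_ceps)


def exp_log_mel_cepstrum(mel_energies: Sequence[float], num_ceps: int,
                         gamma: float = 2.0,
                         floor: float = 1e-10) -> List[float]:
    # HYPOTHETICAL sketch of "exponentiating the logMelFBS":
    # 1) take the log of the (floored) filter bank energies,
    # 2) shift so the smallest log value is 0 (assumption: a real power of a
    #    negative number is undefined in general, so some offset is needed),
    # 3) raise each value to gamma > 1, which stretches peaks (formants)
    #    relative to low-energy valleys, then apply the DCT-II.
    log_mel = [math.log(max(e, floor)) for e in mel_energies]
    lo = min(log_mel)
    emphasized = [(v - lo) ** gamma for v in log_mel]
    return dct2(emphasized, num_ceps)
```

A quick sanity check of the design: with `gamma=1.0` the shift only adds a constant to every channel, and a constant contributes nothing to DCT-II coefficients with `k >= 1`, so all cepstra except c0 coincide with plain MFCC. Values of `gamma` above 1 then change the cepstral shape by weighting the high-energy channels more heavily.

```python
energies = [0.5, 5.0, 40.0, 8.0, 2.0, 30.0, 3.0, 1.5]  # toy filter bank output
print(mfcc(energies, 6))
print(exp_log_mel_cepstrum(energies, 6, gamma=2.0))
```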

Digital Security
© ISCA. Personal use of this material is permitted. The definitive version of this paper was published in Robust 2004, COST 278 and ISCA Tutorial and Research Workshop Robustness Issues in Conversational Interaction, 30-31 August 2004, Norwich, UK, and is available at:

PERMALINK : https://www.eurecom.fr/publication/1498