Eigenfaces and eigenvoices: dimensionality reduction for specialized pattern recognition

Kuhn, Roland; Nguyen, Patrick; Junqua, Jean-Claude; Goldwasser, L

MMSP 1998, IEEE 2nd Workshop on Multimedia Signal Processing, December 7-9, 1998, Los Angeles, USA

There are hidden analogies between two dissimilar research areas: face recognition and speech recognition. The standard representations for faces and voices misleadingly suggest that they have a high number of degrees of freedom. However, human faces have two eyes, a nose, and a mouth in predictable locations; such constraints ensure that possible images of faces occupy a tiny portion of the space of possible 2D images. Similarly, physical and cultural constraints on acoustic realizations of words uttered by a particular speaker imply that the true number of degrees of freedom for speaker-dependent hidden Markov models (HMMs) is quite small.
Face recognition researchers have recently adopted representations that make explicit the underlying low dimensionality of the task, greatly improving the performance of their systems while reducing computational costs. We argue that speech researchers should use similar techniques to represent variation between speakers, and discuss applications to speaker adaptation, speaker identification and speaker verification.
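
The low-dimensionality argument can be illustrated with a minimal sketch, assuming principal component analysis (the technique underlying eigenfaces) applied to synthetic data. Nothing here is taken from the paper: the vectors merely stand in for flattened face images or stacked speaker-model parameters, and the dimensions, noise level, and function names (`mean_center`, `covariance`, `top_eigenpairs`, `project`) are hypothetical choices made for illustration. The eigenvectors are found by plain power iteration with deflation, chosen only to keep the example dependency-free.

```python
import random

def mean_center(rows):
    """Subtract the per-dimension mean; return centered rows and the mean."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows], mean

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpairs(matrix, k, iters=500):
    """Top-k (eigenvalue, eigenvector) pairs via power iteration + deflation."""
    d = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    pairs = []
    for idx in range(k):
        v = [1.0 / (i + 1 + idx) for i in range(d)]  # deterministic start
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflate: remove the component just found.
        for i in range(d):
            for j in range(d):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def project(x, mean, pairs):
    """Coordinates of x in the space spanned by the retained eigenvectors."""
    return [sum((x[j] - mean[j]) * v[j] for j in range(len(x))) for _, v in pairs]

# Synthetic data (an assumption): 8-dimensional vectors that really vary
# along only 2 hidden directions, plus a little noise.
rng = random.Random(0)
DIM, N_SAMPLES, TRUE_DOF = 8, 50, 2
basis = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(TRUE_DOF)]
data = []
for _ in range(N_SAMPLES):
    w = [rng.gauss(0, 1) for _ in range(TRUE_DOF)]
    data.append([sum(w[k] * basis[k][j] for k in range(TRUE_DOF))
                 + rng.gauss(0, 0.01) for j in range(DIM)])

centered, mean = mean_center(data)
cov = covariance(centered)
pairs = top_eigenpairs(cov, TRUE_DOF)

# Fraction of total variance captured by the retained directions.
total_var = sum(cov[i][i] for i in range(DIM))
explained = sum(lam for lam, _ in pairs) / total_var
print("variance explained by", TRUE_DOF, "components:", explained)
print("low-dimensional coordinates of first sample:", project(data[0], mean, pairs))
```

In this toy setting, two coordinates per sample capture nearly all of the variance of the 8-dimensional vectors, which is the kind of compact representation the abstract argues for; real face images or HMM parameter sets would need many more dimensions and a proper numerical library.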

Type: Conference
City: Los Angeles
Date: 1998-12-01
Department: Digital Security
Eurecom Ref: 199

© 1998 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.