System combination or fusion is a popular, successful and
sometimes straightforward means of improving performance in
many fields of statistical pattern classification, including speech
and speaker recognition. Whilst there is significant work in
the literature which aims to improve speaker diarization performance
by combining multiple feature streams, there is little
work which aims to combine the outputs of multiple systems.
This paper reports our first attempts to combine the outputs of
two state-of-the-art speaker diarization systems, namely ICSI's
bottom-up and LIA-EURECOM's top-down systems. We show
that a cluster matching procedure reliably identifies corresponding
speaker clusters in the two system outputs and that, when
the matched clusters are used in a new realignment and resegmentation stage,
the combination leads to relative reductions in diarization error rate
(DER) of 13% and 7% on independent development and evaluation sets, respectively.
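
To illustrate what identifying corresponding speaker clusters across two system outputs can involve, the following is a minimal sketch under stated assumptions. It is not the matching procedure used in this work. It assumes each system output is a list of (start, end, label) segments in seconds, scores each pair of clusters by their total shared time, and pairs them greedily, largest overlap first. The function names, the overlap criterion, the greedy strategy and the toy segments are all hypothetical.

```python
from collections import defaultdict


def overlap_table(segs_a, segs_b):
    """Total time (seconds) shared by each (label_a, label_b) cluster pair."""
    table = defaultdict(float)
    for start_a, end_a, label_a in segs_a:
        for start_b, end_b, label_b in segs_b:
            shared = min(end_a, end_b) - max(start_a, start_b)
            if shared > 0:
                table[(label_a, label_b)] += shared
    return dict(table)


def match_clusters(segs_a, segs_b):
    """Greedy one-to-one matching (an assumption): largest overlap first."""
    table = overlap_table(segs_a, segs_b)
    matches, used_a, used_b = {}, set(), set()
    for (label_a, label_b), _ in sorted(table.items(), key=lambda kv: -kv[1]):
        if label_a not in used_a and label_b not in used_b:
            matches[label_a] = label_b
            used_a.add(label_a)
            used_b.add(label_b)
    return matches


# Hypothetical toy outputs standing in for the two systems' segmentations.
bottom_up = [(0.0, 5.0, "A1"), (5.0, 9.0, "A2"), (9.0, 12.0, "A1")]
top_down = [(0.0, 4.5, "B2"), (4.5, 9.5, "B1"), (9.5, 12.0, "B2")]

print(match_clusters(bottom_up, top_down))
```

Clusters left unmatched, for example when the two systems detect different numbers of speakers, are simply absent from the returned mapping; how such clusters should be treated in a subsequent realignment stage is outside the scope of this sketch.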