Masked multi-time diffusion for multi-modal generative modeling

Bounoua, Mustapha; Franzese, Giulio; Michiardi, Pietro

NeurIPS 2023, 37th Conference on Neural Information Processing Systems, 11-16 December 2023, New Orleans, USA

Multi-modal data is ubiquitous, and models that learn a joint representation of all modalities have flourished. However, existing approaches suffer from a coherence-quality tradeoff, where generation quality comes at the expense of generative coherence across modalities, and vice versa. To overcome these limitations, we propose a novel method that uses a set of independently trained, uni-modal, deterministic autoencoders. Individual latent variables are concatenated and fed to a masked diffusion model to enable generative modeling. We also introduce a new multi-time training method to learn the conditional score network for multi-modal diffusion. Empirically, our methodology substantially outperforms competitors in both generation quality and coherence.
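The abstract only outlines the training idea, so the following is a minimal sketch of one plausible reading of masked multi-time diffusion, written under our own assumptions rather than taken from the paper. Each modality's latent (from its own autoencoder) is concatenated into a joint vector. A random mask then picks which modalities are generated and which serve as clean conditions. Diffused modalities share a sampled time t, conditioning modalities get time 0, and the loss covers only the diffused coordinates. The cosine noise schedule, the mask sampling, the noise-prediction loss, and the names `multi_time_masked_step`, `noise_schedule` and `eps_model` are hypothetical illustrations, not the authors' implementation, and the paper may combine this with other training modes.

```python
import numpy as np

def noise_schedule(t):
    """Hypothetical variance-preserving schedule with alpha(0)=1, sigma(0)=0."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def multi_time_masked_step(latents, eps_model, rng):
    """One masked, multi-time training step (illustrative sketch only).

    latents:   list of arrays, one per modality, each of shape (batch, d_i),
               e.g. codes from independently trained uni-modal autoencoders.
    eps_model: hypothetical callable(z_t, tau) -> predicted noise, shape (batch, sum d_i).
    Returns (loss, z_t, tau, mask).
    """
    batch, n_mod = latents[0].shape[0], len(latents)
    dims = [z.shape[1] for z in latents]
    z0 = np.concatenate(latents, axis=1)  # joint latent: concatenated modality codes

    # Per-sample mask: 1 = modality is diffused (generated), 0 = clean condition.
    mask = rng.integers(0, 2, size=(batch, n_mod))
    # Make sure every sample diffuses at least one modality.
    none = mask.sum(axis=1) == 0
    mask[none, rng.integers(0, n_mod, size=none.sum())] = 1

    # One time per sample for the diffused modalities; conditions stay at time 0.
    t = rng.uniform(1e-3, 1.0, size=(batch, 1))
    tau = mask * t  # (batch, n_mod) multi-time vector

    # Broadcast each modality's time and mask over its latent coordinates.
    tau_full = np.repeat(tau, dims, axis=1)
    mask_full = np.repeat(mask, dims, axis=1)

    alpha, sigma = noise_schedule(tau_full)
    eps = rng.standard_normal(z0.shape)
    z_t = alpha * z0 + sigma * eps  # tau = 0 leaves conditioning latents unchanged

    # Noise-prediction loss averaged over diffused coordinates only.
    pred = eps_model(z_t, tau)
    loss = (mask_full * (pred - eps) ** 2).sum() / mask_full.sum()
    return loss, z_t, tau, mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_img = rng.standard_normal((4, 8))  # hypothetical image latents
    z_txt = rng.standard_normal((4, 3))  # hypothetical caption latents
    zero_model = lambda z_t, tau: np.zeros_like(z_t)  # placeholder score network
    loss, z_t, tau, mask = multi_time_masked_step([z_img, z_txt], zero_model, rng)
    print("mask:\n", mask, "\nloss:", loss)
```

Passing the multi-time vector `tau` to the network, rather than a single scalar time, lets one network serve both joint generation (all modalities diffused) and conditional generation (some modalities fixed at time 0). This is the intuition the sketch is meant to convey.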

Type: Conference
City: New Orleans
Date: 2023-12-11
Department: Data Science
Eurecom Ref: 7540

© NIST. Personal use of this material is permitted. The definitive version of this paper was published in NeurIPS 2023, 37th Conference on Neural Information Processing Systems, 11-16 December 2023, New Orleans, USA and is available at: