MVP: Multimodal emotion recognition based on video and physiological signals

Strizhkova, Valeriya; Kachmar, Hadi; Chaptoukaev, Hava; Kalandadze, Raphael; Kukhilava, Natia; Tsmindashvili, Tatia; Abo-Alzahab, Nibras; Zuluaga, Maria A.; Balazia, Michal; Dantcheva, Antitza; Brémond, François; Ferrari, Laura M.
ABAW 2024, 7th Workshop and Competition on Affective Behavior Analysis in-the-wild, in conjunction with ECCV 2024, 18th European Conference on Computer Vision, 29 September-4 October 2024, Milano, Italy

Human emotions entail a complex set of behavioral, physiological, and cognitive changes. Current state-of-the-art models fuse the behavioral and physiological components using classic machine learning rather than recent deep learning techniques. We propose to fill this gap by designing the Multimodal for Video and Physio (MVP) architecture, streamlined to fuse video and physiological signals. Unlike other approaches, MVP exploits the benefits of attention to enable the use of long input sequences (1-2 minutes). We study video and physiological backbones suited to long input sequences and evaluate our method against the state of the art. Our results show that MVP outperforms former methods for emotion recognition based on facial videos, EDA, and ECG/PPG. The code will be available on GitHub.
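To make the attention-based fusion idea concrete, below is a minimal sketch of how per-timestep video and physiological (EDA, ECG/PPG) features could be fused with a transformer encoder over a long joint token sequence. All module names, dimensions, the modality embeddings, and the mean-pooling scheme are illustrative assumptions, not the authors' actual MVP implementation.

# Illustrative sketch of attention-based fusion of video and physiological
# features (assumed design, not the published MVP architecture).
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    def __init__(self, video_dim=512, physio_dim=64, d_model=256,
                 n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        # Project each modality's per-timestep features to a shared width.
        self.video_proj = nn.Linear(video_dim, d_model)
        self.physio_proj = nn.Linear(physio_dim, d_model)
        # Learned modality embeddings mark which tokens come from video
        # and which from physiological signals.
        self.modality_emb = nn.Parameter(torch.randn(2, d_model) * 0.02)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, video_feats, physio_feats):
        # video_feats: (B, Tv, video_dim); physio_feats: (B, Tp, physio_dim)
        v = self.video_proj(video_feats) + self.modality_emb[0]
        p = self.physio_proj(physio_feats) + self.modality_emb[1]
        tokens = torch.cat([v, p], dim=1)   # joint sequence of both modalities
        fused = self.encoder(tokens)        # self-attention across modalities and time
        pooled = fused.mean(dim=1)          # pool over the joint sequence
        return self.head(pooled)            # emotion logits


# Example: long (1-2 minute) recordings downsampled to token sequences.
model = AttentionFusion()
logits = model(torch.randn(2, 120, 512), torch.randn(2, 240, 64))
print(logits.shape)  # torch.Size([2, 2])

Because attention operates over the concatenated token sequence, the two modalities can interact at every layer; this is one reasonable way to realize the cross-modal fusion described in the abstract.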


Type:
Conference
City:
Milano
Date:
2024-09-29
Department:
Data Science
Eurecom Ref:
7878
Copyright:
© Springer. Personal use of this material is permitted. The definitive version of this paper was published in ABAW 2024, 7th Workshop and Competition on Affective Behavior Analysis in-the-wild, in conjunction with ECCV 2024, 18th European Conference on Computer Vision, 29 September-4 October 2024, Milano, Italy, and is available at:

PERMALINK : https://www.eurecom.fr/publication/7878