TIME-E2V: Overcoming limitations of E2VID

Adra, Mira; Dugelay, Jean-Luc
AVSS 2024, 20th IEEE International Conference on Advanced Video and Signal-Based Surveillance, 15-16 July 2024, Niagara Falls, Canada

In the field of action recognition, event cameras have marked a breakthrough by capturing motion dynamics beyond the capability of traditional cameras, thanks to their high temporal sensitivity. However, the asynchronous and sparse nature of event data challenges their use with traditional convolutional neural networks (CNNs). The E2VID model offers a solution by transforming event data into continuous video frames, enabling the use of standard CNNs for event-based data analysis. However, it struggles to accurately capture motion speed variations and pauses, limiting its effectiveness in scenarios where temporal dynamics are crucial. In response, we introduce TIME-E2V, which integrates spatial embeddings from E2VID with LSTM-derived temporal embeddings from frame timestamps. This combination is processed by a modified 3D convolutional network (C3D), leveraging its inherent strengths in video analysis. Our proposed approach not only overcomes E2VID's challenges but also delivers performance competitive with the leading action recognition networks for event cameras, including those based on Spiking Neural Networks, across a wide range of dynamic scenes.
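The abstract describes fusing E2VID's per-frame spatial embeddings with temporal embeddings derived from frame timestamps before feeding them to a 3D CNN. The paper's actual LSTM and modified C3D are not reproduced here; the following is a minimal NumPy sketch of the fusion idea only, in which a stand-in temporal channel built from inter-frame timestamp gaps (the signal E2VID's reconstructed frames lose) is concatenated with hypothetical spatial features. All shapes and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-in for E2VID spatial embeddings: T frames of (C, H, W) features.
T, C, H, W = 8, 4, 16, 16
rng = np.random.default_rng(0)
spatial = rng.standard_normal((T, C, H, W))

# Frame timestamps (seconds). Uneven gaps encode speed changes and pauses,
# which a frame sequence alone does not preserve.
timestamps = np.array([0.0, 0.01, 0.02, 0.05, 0.06, 0.12, 0.13, 0.14])
deltas = np.diff(timestamps, prepend=timestamps[0])  # per-frame inter-arrival gap

# Stand-in for an LSTM temporal embedding: one scalar per frame, broadcast
# to a full channel and concatenated with the spatial features.
temporal = np.broadcast_to(deltas[:, None, None, None], (T, 1, H, W))
fused = np.concatenate([spatial, temporal], axis=1)  # (T, C+1, H, W)

# `fused` would then be the input volume for a 3D convolutional network.
print(fused.shape)
```

In the paper the temporal branch is an LSTM over timestamps rather than the raw deltas used here, but the sketch shows why the fused tensor lets a 3D CNN see both appearance and timing.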


Type:
Conference
City:
Niagara Falls
Date:
2024-07-15
Department:
Digital Security
Eurecom Ref:
7789
Copyright:
© 2024 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PERMALINK : https://www.eurecom.fr/publication/7789