Multimedia Systems, 21 November 2022
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are image, text, and tabular data. The image classifier is based on a ResNet convolutional neural network architecture, and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leverage a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
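As a rough illustration of the late fusion step described in the abstract, the sketch below assembles one fused feature row from the outputs of the two unimodal multitask classifiers plus the tabular features. This is only one common way to set up late fusion; the abstract does not state the exact feature layout, so the function name `fuse_late`, the task names, and all values are hypothetical. The resulting row could then be passed to a gradient tree boosting implementation (for example scikit-learn's `GradientBoostingClassifier`, which is an assumption here, not a detail taken from the paper).

```python
from typing import Dict, List, Sequence


def fuse_late(
    image_probs: Dict[str, Sequence[float]],
    text_probs: Dict[str, Sequence[float]],
    tabular: Sequence[float],
) -> List[float]:
    """Build one fused feature row from per-modality outputs.

    image_probs and text_probs map a task name to that task's class
    probabilities (both unimodal models are multitask). Task names are
    sorted so the column order is the same for every sample.
    """
    row: List[float] = []
    for probs_by_task in (image_probs, text_probs):
        for task in sorted(probs_by_task):
            row.extend(float(p) for p in probs_by_task[task])
    # Tabular features are appended unchanged after the probabilities.
    row.extend(float(v) for v in tabular)
    return row


# Hypothetical outputs for one object; task names and values are made up.
image_out = {"material": [0.7, 0.3], "technique": [0.2, 0.5, 0.3]}
text_out = {"material": [0.6, 0.4], "technique": [0.1, 0.8, 0.1]}
tabular_features = [1850.0, 12.5]

fused_row = fuse_late(image_out, text_out, tabular_features)
print(len(fused_row))
```

Keeping the unimodal models separate and fusing only their outputs means each one can be trained or replaced independently, while the boosting stage learns how much weight to give each modality.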
Type:
Journal
Date:
2022-11-21
Department:
Data Science
Eurecom Ref:
7139
Copyright:
© Springer. Personal use of this material is permitted. The definitive version of this paper was published in Multimedia Systems, 21 November 2022, and is available at: https://doi.org/10.1007/s00530-022-01025-2