BUNNI: Learning repair actions in rule-driven data cleaning

Mecca, Giansalvatore; Papotti, Paolo; Santoro, Donatello; Veltri, Enzo
Journal of Data and Information Quality, May 2024

In this work, we address the challenging and open problem of involving non-expert users in data repairing as first-class citizens. Although many proposals have been devoted to cleaning data from the point of view of expert users (IT staff and data scientists), studies from the perspective of non-expert users are lacking. Given a set of available data quality rules, we exploit machine learning techniques to guide the user in identifying and repairing the dirty values for each violation. We show that, with low user effort, it is possible to identify the values in tuples that can be trusted and those that are most likely errors. We show experimentally that this machine-learning approach leads to a unique clean solution of high quality in scenarios where other approaches fail.
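
To make the notions of "data quality rule" and "violation" concrete, the following is a minimal illustrative sketch, not the method from the paper. It assumes a hypothetical rule in the form of a functional dependency (zip determines city) over made-up rows, and a hypothetical helper `fd_violations`. It only detects violations; it does not decide which value is dirty, which is the step the abstract describes as guided by machine learning and user feedback.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Find violations of the functional dependency lhs -> rhs.

    Hypothetical helper for illustration: rows that agree on `lhs`
    but disagree on `rhs` form one violation.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)

    violations = []
    for key, idxs in groups.items():
        values = {rows[i][rhs] for i in idxs}
        if len(values) > 1:
            # The rule alone cannot tell which of these values is wrong.
            violations.append((key, idxs, sorted(values)))
    return violations

# Made-up example data (an assumption, not from the paper).
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "94105", "city": "San Francisco"},
]
violations = fd_violations(rows, "zip", "city")
for key, idxs, values in violations:
    print(f"zip={key}: rows {idxs} disagree on city: {values}")
```

Each reported violation admits more than one repair (change either city, or change a zip), which is why rules alone generally do not yield a unique clean solution.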


Type: Journal
Date: 2024-05-25
Department: Data Science
Eurecom Ref: 7737
Copyright: © ACM, 2024. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Journal of Data and Information Quality, May 2024, https://doi.org/10.1145/3665930
Permalink: https://www.eurecom.fr/publication/7737