What's up LOD cloud? Observing the state of linked open data cloud metadata

Assaf, Ahmad; Troncy, Raphaël; Senart, Aline
LDQ 2015, 2nd workshop on Linked Data Quality, Main conference ESWC 2015, June 1st 2015, Portoroz, Slovenia / Also published in LNCS, Volume 9341/2015

Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. To benefit from this mine of data, one needs access to descriptive information about each dataset (its metadata). However, the heterogeneous nature of the data sources directly affects data quality, as these sources often contain inconsistent, misinterpreted and incomplete metadata. Given the significant variation in size, the languages used and the freshness of the data, finding useful datasets without prior knowledge is increasingly complicated. We have developed Roomba, a tool that validates, corrects and generates dataset metadata. In this paper, we present the results of running this tool on the parts of the LOD cloud accessible via the datahub.io API. The results demonstrate that the general state of the datasets needs more attention, as most of them suffer from poor-quality metadata and lack some informative metrics needed to facilitate dataset search. We also show that the automatic corrections made by Roomba increase the overall quality of the datasets' metadata, and we highlight the need for manual effort to correct some important missing information.
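
To make the idea of metadata validation concrete, the following is a minimal sketch of a completeness check over a single dataset record. It is not Roomba's implementation: the field list, the function names `missing_fields` and `completeness`, and the example record are assumptions chosen for illustration, loosely modelled on the kind of fields a CKAN-style catalogue record carries.

```python
from typing import Any, Dict, List

# Hypothetical set of fields to check (an assumption for this sketch);
# a real catalogue record exposes many more fields than these.
REQUIRED_FIELDS = ["title", "notes", "license_id", "author", "maintainer", "resources"]


def missing_fields(dataset: Dict[str, Any]) -> List[str]:
    """Return the checked fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not dataset.get(field)]


def completeness(dataset: Dict[str, Any]) -> float:
    """Fraction of the checked fields that carry a non-empty value."""
    return 1 - len(missing_fields(dataset)) / len(REQUIRED_FIELDS)


# Made-up record used only to exercise the functions above.
example = {
    "title": "Example dataset",
    "notes": "",  # empty description counts as missing
    "license_id": "cc-by",
    "resources": [{"url": "http://example.org/data.rdf", "format": "RDF"}],
}

if __name__ == "__main__":
    print(missing_fields(example))
    print(completeness(example))
```

A check of this kind only flags absent or empty values. Detecting inconsistent or misinterpreted values, as the abstract describes, would need further field-specific rules.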

Data Science
Available at: http://dx.doi.org/10.1007/978-3-319-25639-9_40

PERMALINK : https://www.eurecom.fr/publication/4597