ADEL: ADaptable Entity Linking: A hybrid approach to link entities with linked data for information extraction

Plu, Julien; Rizzo, Giuseppe; Troncy, Raphaël
Semantic Web Journal (SWJ), Special Issue on Linked Data for Information Extraction, 2017

Four main challenges make the development of an entity linking system difficult: i) the kind of textual documents to annotate (such as social media posts, video subtitles or news articles); ii) the number of types used to categorise an entity (such as Person, Location, Organization, Date or Role); iii) the knowledge base used to disambiguate the extracted mentions (such as DBpedia, Wikidata or Musicbrainz); iv) the language used in the documents. Among these four challenges, being agnostic to the knowledge base, and in particular to its coverage, whether it is encyclopedic like DBpedia or domain-specific like Musicbrainz, is arguably the hardest. We propose to tackle all four challenges; in particular, to be knowledge base agnostic, we propose a method that indexes the data independently of the schema and vocabulary being used. More precisely, we design our index such that each entity has at least two pieces of information: a label and a popularity score, such as a prior probability or a PageRank score. This results in a framework named ADEL, an entity recognition and linking system based on hybrid linguistic, information retrieval, and semantics-based methods. ADEL is a modular framework that is independent of the kind of text to be processed and of the knowledge base used as a referent for disambiguating entities. We thoroughly evaluate the framework on six benchmark datasets: OKE2015, OKE2016, NEEL2014, NEEL2015, NEEL2016 and AIDA. Our evaluation shows that ADEL outperforms state-of-the-art systems in terms of extraction and entity typing. It also shows that our indexing approach makes it possible to generate an accurate set of candidates from any knowledge base that uses linked data and provides the required information for each entity, in minimal time and with a minimal index size.
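The indexing idea in the abstract (every entity needs only a label and a popularity score) can be illustrated with a small Python sketch. This is not ADEL's implementation: the names `IndexedEntity`, `LabelIndex` and `build_index`, the exact-match label lookup, the chosen label predicates and the toy popularity values are all hypothetical. It only shows how requiring just those two fields lets the same index be built from any linked-data source, whatever predicates that source uses for labels, and how candidates for a mention can then be ranked by popularity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IndexedEntity:
    uri: str
    labels: list[str]
    popularity: float  # e.g. a prior probability or a PageRank score


def normalize(label: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace.
    return " ".join(label.lower().split())


class LabelIndex:
    """Hypothetical index keyed only on normalized labels."""

    def __init__(self) -> None:
        self._by_label: dict[str, list[IndexedEntity]] = defaultdict(list)

    def add(self, entity: IndexedEntity) -> None:
        # Register the entity once under each distinct normalized label.
        for lab in {normalize(x) for x in entity.labels}:
            self._by_label[lab].append(entity)

    def candidates(self, mention: str, k: int = 5) -> list[IndexedEntity]:
        # Exact label match, ranked by popularity (highest first).
        hits = self._by_label.get(normalize(mention), [])
        return sorted(hits, key=lambda e: e.popularity, reverse=True)[:k]


def build_index(triples, label_predicates, popularity):
    """Build the index from (subject, predicate, object) triples.

    Which predicates carry labels is a parameter, so no schema is hard-coded.
    Entities lacking a label or a popularity score are skipped.
    """
    labels: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if p in label_predicates:
            labels[s].append(o)
    index = LabelIndex()
    for uri, entity_labels in labels.items():
        if uri in popularity:
            index.add(IndexedEntity(uri, entity_labels, popularity[uri]))
    return index


# Illustrative DBpedia-style triples; the popularity values are toy numbers.
triples = [
    ("dbr:Paris", "rdfs:label", "Paris"),
    ("dbr:Paris_Hilton", "rdfs:label", "Paris Hilton"),
    ("dbr:Paris_Hilton", "foaf:nick", "Paris"),
    ("dbr:Paris,_Texas", "rdfs:label", "Paris"),
]
toy_popularity = {"dbr:Paris": 0.9, "dbr:Paris_Hilton": 0.3, "dbr:Paris,_Texas": 0.05}
index = build_index(triples, {"rdfs:label", "foaf:nick"}, toy_popularity)
print([e.uri for e in index.candidates("paris")])
# ['dbr:Paris', 'dbr:Paris_Hilton', 'dbr:Paris,_Texas']
```

Under these assumptions, switching to a domain-specific source such as Musicbrainz would only change `label_predicates` and the popularity table, not the index code. A real linker would also need fuzzier matching than this exact lookup.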


Type: Journal
Date: 2017-12-31
Department: Data Science
Eurecom Ref: 5616
Copyright:
© IOS Press. Personal use of this material is permitted. The definitive version of this paper was published in Semantic Web Journal (SWJ), Special Issue on Linked Data for Information Extraction, 2017 and is available at:

Permalink: https://www.eurecom.fr/publication/5616