HYENA: Hierarchical Type Classification for Entity Names
HYENA is a multi-label classifier for entity types based on hierarchical taxonomies derived from the YAGO2 knowledge base.
The HYENA type taxonomy comprises 505 types organized into a directed acyclic graph, with 5 main supertypes at its top level and 9 levels in its deepest part. HYENA was trained on 1.6 million instances extracted from 50,000 randomly selected Wikipedia articles.
As features, HYENA uses neighboring words and bigrams, part-of-speech tags, and phrases from a large gazetteer derived from the YAGO2 knowledge base.
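To make the feature families above concrete, here is a minimal sketch of how such features could be collected for a single mention. It is not the HYENA implementation: the function name `extract_features`, the window size of 3, the feature string format, and the toy gazetteer (a mapping from a mention string to type names) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def extract_features(
    tokens: List[str],
    pos_tags: List[str],
    start: int,
    end: int,
    gazetteer: Dict[str, Set[str]],
    window: int = 3,  # assumed context size, not taken from the paper
) -> Set[str]:
    """Return sparse binary features for the mention tokens[start:end]."""
    features: Set[str] = set()
    left = max(0, start - window)
    right = min(len(tokens), end + window)

    # Neighboring words and their part-of-speech tags, keyed by side.
    for i in range(left, start):
        features.add(f"left_word={tokens[i].lower()}")
        features.add(f"left_pos={pos_tags[i]}")
    for i in range(end, right):
        features.add(f"right_word={tokens[i].lower()}")
        features.add(f"right_pos={pos_tags[i]}")

    # Bigrams over the context window (mention tokens included).
    for i in range(left, right - 1):
        features.add(f"bigram={tokens[i].lower()}_{tokens[i + 1].lower()}")

    # Gazetteer lookup: hypothetical mapping from mention string to types.
    mention = " ".join(tokens[start:end]).lower()
    for type_name in gazetteer.get(mention, ()):
        features.add(f"gazetteer_type={type_name}")
    return features


# Toy input; the sentence, tags, and gazetteer entry are made up.
tokens = ["The", "singer", "Bob", "Dylan", "performed", "in", "Berlin"]
pos_tags = ["DT", "NN", "NNP", "NNP", "VBD", "IN", "NNP"]
toy_gazetteer = {"bob dylan": {"PERSON", "singer"}}
features = extract_features(tokens, pos_tags, 2, 4, toy_gazetteer)
```

In a multi-label setup, a feature set like this would typically be fed to one binary classifier per type, so a mention can receive several types at once.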
Publications
- HYENA: Hierarchical Type Classification for Entity Names PDF
Mohamed Amir Yosef, Sandro Bauer, Johannes Hoffart, Marc Spaniol, Gerhard Weikum
In: Proceedings of the 24th International Conference on Computational Linguistics, Coling 2012, Mumbai, India, 2012
For scientific works, please cite this paper.
- HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text PDF
Mohamed Amir Yosef, Sandro Bauer, Johannes Hoffart, Marc Spaniol, Gerhard Weikum
In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Sofia, Bulgaria, 2013
HYENA Fine-grained Type Hierarchy
The HYENA type taxonomy was derived from the YAGO knowledge base by starting with five broad classes, namely PERSON, LOCATION, ORGANIZATION, EVENT and ARTIFACT. Under each of these superclasses, the 100 most prominent subclasses were picked based on the population of the classes. The classes are organized in a hierarchy that has 9 levels in its deepest parts.
You can browse our hierarchy in the PDF file below or using our Interactive Browser.
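The selection step described above can be sketched as a top-k ranking by class population. This is an illustration under stated assumptions, not the authors' code: the function `select_prominent_subclasses`, the input layout (superclass name mapped to subclass instance counts), the alphabetical tie-breaking, and the toy counts are all hypothetical.

```python
from typing import Dict, List

TOP_LEVEL = ["PERSON", "LOCATION", "ORGANIZATION", "EVENT", "ARTIFACT"]


def select_prominent_subclasses(
    subclass_population: Dict[str, Dict[str, int]],
    k: int = 100,
) -> Dict[str, List[str]]:
    """Keep the k most populated subclasses under each top-level class."""
    selected: Dict[str, List[str]] = {}
    for top in TOP_LEVEL:
        counts = subclass_population.get(top, {})
        # Largest population first; ties broken alphabetically (assumption).
        ranked = sorted(counts, key=lambda name: (-counts[name], name))
        selected[top] = ranked[:k]
    return selected


# Made-up populations, only to show the ranking behavior.
toy_population = {
    "PERSON": {"singer": 120, "politician": 300, "chess_player": 15},
    "LOCATION": {"city": 900, "river": 200},
}
selected = select_prominent_subclasses(toy_population, k=2)

# Five top-level classes plus 100 subclasses under each of them.
taxonomy_size = len(TOP_LEVEL) + len(TOP_LEVEL) * 100
```

With k = 100, five superclasses plus 100 subclasses each give 505 types, which matches the taxonomy size stated for HYENA.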
Properties of the dataset used to train and evaluate HYENA
data property | training | testing |
---|---|---|
# of articles | 50,000 | 10,000 |
# of instances (all types) | 1,613,340 | 253,029 |
# of location instances | 489,003 (30%) | 86,936 (34.4%) |
# of person instances | 426,467 (26.4%) | 62,446 (24.6%) |
# of organization instances | 219,716 (13.6%) | 38,293 (15.1%) |
# of artifact instances | 204,802 (12.7%) | 31,899 (12.6%) |
# of event instances | 176,549 (10.9%) | 28,952 (11.4%) |
# instances in 1 top-level class | 1,131,994 (70.2%) | 179,240 (70.8%) |
# instances in 2 top-level classes | 182,508 (11.3%) | 33,399 (13.2%) |
# instances in more than 2 top-level classes | 6,492 (0.4%) | 828 (0.3%) |
# instances not in any class | 292,346 (18.1%) | 39,562 (15.6%) |
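The training column can be cross-checked with a few lines of arithmetic, which also shows why the per-type counts add up to more than the number of typed instances: an instance may belong to several top-level classes. The snippet below only reuses the numbers from the table; the variable names are illustrative.

```python
total = 1_613_340
per_type = {
    "location": 489_003,
    "person": 426_467,
    "organization": 219_716,
    "artifact": 204_802,
    "event": 176_549,
}
in_one, in_two, in_more, in_none = 1_131_994, 182_508, 6_492, 292_346

# The four membership rows partition the training instances.
partition_total = in_one + in_two + in_more + in_none

# Each instance is counted once per top-level class it belongs to.
label_assignments = sum(per_type.values())
typed_instances = total - in_none
avg_classes_per_typed_instance = label_assignments / typed_instances

# Average number of classes for instances in more than 2 classes.
avg_classes_in_more = (label_assignments - in_one - 2 * in_two) / in_more
```

The partition sums exactly to the total, a typed training instance belongs to about 1.15 top-level classes on average, and the instances in more than two classes belong to almost exactly three.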
Results
In the Coling 2012 paper, HYENA was tested on 253,029 instances from 10,000 randomly selected Wikipedia articles. The macro-averaged (per class) and micro-averaged results are shown in the table below.
system | macro precision | macro recall | macro F1 | micro precision | micro recall | micro F1 |
---|---|---|---|---|---|---|
HYENA | 0.878 | 0.863 | 0.87 | 0.913 | 0.932 | 0.922 |
HYENA + meta-classifier | 0.89 | 0.837 | 0.862 | 0.916 | 0.914 | 0.915 |
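For readers unfamiliar with the two averaging schemes, the following sketch shows how macro and micro scores differ when computed from per-class counts. It is a generic illustration, not the evaluation script behind the table: the helper names and the toy counts are hypothetical, and computing F1 as the harmonic mean of the averaged precision and recall is an assumption (the reported F1 values are numerically consistent with it).

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_micro(per_class: Dict[str, Counts]) -> Dict[str, Tuple[float, float, float]]:
    # Macro: score each class separately, then average, so every class
    # weighs the same regardless of how many instances it has.
    scores = [precision_recall(*counts) for counts in per_class.values()]
    macro_p = sum(p for p, _ in scores) / len(scores)
    macro_r = sum(r for _, r in scores) / len(scores)

    # Micro: pool the counts over all classes first, so frequent
    # classes dominate the result.
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    micro_p, micro_r = precision_recall(tp, fp, fn)

    return {
        "macro": (macro_p, macro_r, f1(macro_p, macro_r)),
        "micro": (micro_p, micro_r, f1(micro_p, micro_r)),
    }


# Made-up counts: one frequent, well-classified type and one rare, hard type.
toy_counts = {"PERSON": (90, 10, 10), "singer": (5, 5, 15)}
result = macro_micro(toy_counts)
```

In the toy example the rare type pulls the macro scores down much more than the micro scores, which is why both views are usually reported for fine-grained type hierarchies with many sparsely populated classes.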
Detailed HYENA results for each type classifier, as well as the output for each test instance, are available here.
Results are downloadable as one compressed archive here.