Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases

A comprehensive survey

By Gerhard Weikum, Luna Dong, Simon Razniewski, Fabian Suchanek

213 pages, 27 images, 677 references

Under review at Foundations and Trends in Databases [ArXiv link]

Abstract: Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.

Programming assignments for teaching

We have developed a comprehensive list of programming assignments that can be used in teaching the topic. Further details are on this course website.

Chapter  Materials
(1)      Dataset familiarization (pdf)
2.1      Domain modelling (pdf) (sample solution)
3        Scraping (pdf)
4        Entity typing from Wikipedia first sentence (pdf, files)
4.6      Taxonomy induction (pdf)
6        Relation extraction (pdf, files)
7        OpenIE coding (pdf, files)
8        Rule mining (pdf, file)

These are based on efforts by Julien Romero, Fabian Suchanek, and Cuong Xuan Chu. For evaluation data sets not contained above, please contact Simon Razniewski by email.

Related surveys

Aidan Hogan et al. (2020), "Knowledge Graphs", ArXiv [link]

Ridho Reinanda, Edgar Meij and Maarten de Rijke (2020), "Knowledge Graphs: An Information Retrieval Perspective", Foundations and Trends in Information Retrieval [link]

Heiko Paulheim (2016), "Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods", Semantic Web Journal [link]