Harvesting, Searching, and Ranking Knowledge from the Web

The YAGO-NAGA project started in 2006 with the goal of building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation.
We have already harvested knowledge about millions of entities and facts about their relationships from Wikipedia and WordNet, carefully integrating these two sources. The resulting knowledge base, named YAGO, has very high precision and is freely available. The facts are represented as RDF triples, and we have developed methods and prototype systems for querying, ranking, and exploring this knowledge. Our search engine NAGA returns ranked answers to queries based on statistical models.
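To make the triple representation concrete, here is a minimal sketch of storing facts as subject-predicate-object triples and answering a conjunctive pattern query by joining on shared variables. It is not YAGO's or NAGA's implementation: the `TripleStore` class, its methods, the `?`-prefix convention for variables, and the predicate names `bornIn` and `locatedIn` are illustrative assumptions, and no ranking is performed.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append((subject, predicate, obj))

    def match(self, pattern: Triple) -> Iterator[Bindings]:
        """Yield variable bindings for one pattern; terms starting with '?' are variables."""
        for triple in self.triples:
            bindings: Bindings = {}
            ok = True
            for term, value in zip(pattern, triple):
                if term.startswith("?"):
                    # A variable repeated in the pattern must bind to the same value.
                    if bindings.get(term, value) != value:
                        ok = False
                        break
                    bindings[term] = value
                elif term != value:
                    ok = False
                    break
            if ok:
                yield bindings

    def query(self, patterns: List[Triple],
              bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
        """Answer a conjunction of patterns by nested-loop joining on shared variables."""
        bindings = bindings or {}
        if not patterns:
            yield dict(bindings)
            return
        first, rest = patterns[0], patterns[1:]
        # Substitute variables bound by earlier patterns before matching.
        bound = tuple(bindings.get(term, term) for term in first)
        for new in self.match(bound):
            yield from self.query(rest, {**bindings, **new})


store = TripleStore()
store.add("Albert_Einstein", "bornIn", "Ulm")
store.add("Ulm", "locatedIn", "Germany")
store.add("Max_Planck", "bornIn", "Kiel")
store.add("Kiel", "locatedIn", "Germany")
store.add("Marie_Curie", "bornIn", "Warsaw")
store.add("Warsaw", "locatedIn", "Poland")

# Who was born in a city located in Germany?
results = store.query([("?person", "bornIn", "?city"),
                       ("?city", "locatedIn", "Germany")])
print(sorted(r["?person"] for r in results))  # ['Albert_Einstein', 'Max_Planck']
```

The join strategy here is the simplest possible one; a dedicated RDF engine would use indexes over the triple permutations instead of scanning every fact for each pattern.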
Several interlinked sub-projects are being built on the YAGO-NAGA foundation. Our vision is a confluence of Semantic Web (Ontologies), Social Web (Web 2.0), and Statistical Web (Information Extraction) assets toward a comprehensive repository of human knowledge. Our methodologies combine concepts, models, and algorithms from several fields, including database systems, information retrieval, statistical learning, and logical reasoning.
Selected Publications
- Johannes Hoffart, Fabian Suchanek, Klaus Berberich, Gerhard Weikum
  YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
  Special issue of the Artificial Intelligence Journal, 2012
- Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis Kelham, Gerard de Melo, Gerhard Weikum
  YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages
  Demo paper in the proceedings of the 20th International World Wide Web Conference (WWW 2011), Hyderabad, India, 2011
- Ndapandula Nakashole, Martin Theobald, Gerhard Weikum
  Scalable Knowledge Harvesting with High Precision and High Recall
  4th ACM International Conference on Web Search and Data Mining (WSDM 2011)
- Martin Theobald, Gerhard Weikum
  From Information to Knowledge: Harvesting Entities and Relationships from Web Sources
  Tutorial at PODS 2012
- Gerhard Weikum, Gjergji Kasneci, Maya Ramanath, Fabian Suchanek
  Database and Information-Retrieval Methods for Knowledge Discovery
  Commun. ACM 52(4): 56-64 (2009)
- Fabian Suchanek, Gjergji Kasneci, Gerhard Weikum
  YAGO: A Large Ontology from Wikipedia and WordNet
  Elsevier Journal of Web Semantics
- Gjergji Kasneci, Fabian Suchanek, Georgiana Ifrim, Maya Ramanath, Gerhard Weikum
  NAGA: Searching and Ranking Knowledge
  24th IEEE International Conference on Data Engineering (ICDE 2008)
- Fabian Suchanek, Mauro Sozio, Gerhard Weikum
  SOFIE: A Self-Organizing Framework for Information Extraction
  18th International World Wide Web Conference (WWW 2009)
- Thomas Neumann, Gerhard Weikum
  RDF-3X: A RISC-Style Engine for RDF
  Proc. VLDB Endowment 1(1): 647-659, August 2008