Automatic extraction of facts, relations, and entities for web-scale knowledge base population
Abstract
Equipping machines with knowledge, through the construction of machine-readable knowledge bases, provides a key asset for semantic search, machine translation, question answering, and other formidable challenges in artificial intelligence. However, human knowledge predominantly resides in books and other forms of natural language text, so knowledge bases must be extracted and synthesized from such text. When the source of text is the Web, extraction methods must cope with ambiguity, noise, scale, and updates. The goal of this dissertation is to develop knowledge base population methods that address these characteristics of Web text. The dissertation makes three contributions. The first is a method for mining high-quality facts at scale, through distributed constraint reasoning and a pattern representation model that is robust against noisy patterns. The second is a method for mining a large, comprehensive collection of relation types beyond those commonly found in existing knowledge bases. The third is a method for extracting facts from dynamic Web sources such as news articles and social media, where one of the key challenges is the constant emergence of new entities. All methods have been evaluated through experiments involving Web-scale text collections.
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Automatic Construction of a Semantic, Domain-Independent Knowledge Base
In this paper, we show which difficulties arise when automatically constructing a domain-independent knowledge base from the web. We show possible applications for such a knowledge base to emphasize its importance. Current knowledge bases often use manually built patterns for extraction and quality assurance, an approach that does not scale well. Our contribution to the community will be a technique...
Large-Scale Knowledge Graph Identification using PSL (Extended Abstract)
The web is a vast repository of knowledge, but automatically extracting that knowledge, at scale, has proven to be a formidable challenge. A number of recent evaluation efforts have focused on automatic knowledge base population (Ji, Grishman, and Dang 2011; Artiles and Mayfield 2012), and many well-known broad domain and open information extraction systems exist, including the Never-Ending Lan...
Deriving a Web-Scale Common Sense Fact Knowledge Base
The fact that birds have feathers and ice is cold seems trivially true. Yet, most machine-readable sources of knowledge either lack such common sense facts entirely or have only limited coverage. Prior work on automated knowledge base construction has largely focused on relations between named entities and on taxonomic knowledge, while disregarding common sense properties. Extracting such struc...
Semantic Web Technologies for a Knowledge Base of Biomedical Facts Extracted from Scientific Literature
Biomedical literature, including scientific articles, public health reports, and books, is becoming increasingly available due to massive digitization. Exploring and analyzing this rich source of data requires the assistance of automatic tools capable of dealing with large volumes of text. We are developing a pipeline for processing publicly available biomedical text, abstracts, full text articles,...