Generating Natural Language from Linked Data: Unsupervised template extraction
نویسندگان
چکیده
We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions, equivalent to Wikipedia stubs, of entities found in Linked Datasets. We have evaluated the LOD-DEF system against a simple generate-from-triples baseline and human-generated output. In evaluation by humans, LOD-DEF significantly outperforms the baseline on two of three measures: non-redundancy and structure and coherence.
منابع مشابه
Unsupervised Parsing for Generating Surface-Based Relation Extraction Patterns
Finding the right features and patterns for identifying relations in natural language is one of the most pressing research questions for relation extraction. In this paper, we compare patterns based on supervised and unsupervised syntactic parsing and present a simple method for extracting surface patterns from a parsed training set. Results show that the use of surfacebased patterns not only i...
متن کاملA Survey of Unsupervised Techniques for Web Data Extraction
World Wide Web contains a large amount of data and to fetch important information from web has become a useful task. There are many web information extraction systems are developed and categorised in manual, supervised, semisupervised and unsupervised techniques. We will study unsupervised techniques and how they differ from each other. Roadrunner uses match algorithm for generating the wrapper...
متن کاملTrinity: Unsupervised Web Data Extraction Using Ternary Trees
ARTICLE INFO Internet presents a huge collection of useful information so extracting information from web document has become research area for which web data extractors are used. This technique works on two or more web documents generated by same sever side template and learns a regular expression that models it and then used it for extracting data from similar documents. The technique introdu...
متن کاملExperiments in Linear Template Combination using Genetic Algorithms
Natural Language Generation systems typically have two parts strategic (" what to say ") and tactical (" how to say "). We present our experiments in building an unsupervised corpusdriven template based tactical NLG system. We consider templates as a sequence of words containing gaps. Our idea is based on the observation that templates are grammatical locally (within their textual span). We ...
متن کاملAn Unsupervised Approach to Domain-Specific Term Extraction
Domain-specific terms provide vital semantic information for many natural language processing (NLP) tasks and applications, but remain a largely untapped resource in the field. In this paper, we propose an unsupervised method to extract domain-specific terms from the Reuters document collection using term frequency and inverse document frequency.
متن کامل