Mining Paraphrasal Typed Templates from a Plain Text Corpus
نویسندگان
چکیده
Finding paraphrases in text is an important task with implications for generation, summarization and question answering, among other applications. Of particular interest to those applications is the specific formulation of the task where the paraphrases are templated, which provides an easy way to lexicalize one message in multiple ways by simply plugging in the relevant entities. Previous work has focused on mining paraphrases from parallel and comparable corpora, or mining very short sub-sentence synonyms and paraphrases. In this paper we present an approach which combines distributional and KB-driven methods to allow robust mining of sentence-level paraphrasal templates, utilizing a rich type system for the slots, from a plain text corpus.
منابع مشابه
A language-independent method for the extraction of RDF verbalization templates
With the rise of the Semantic Web more and more data become available encoded using the Semantic Web standard RDF. RDF is faced towards machines: designed to be easily processable by machines it is difficult to be understood by casual users. Transforming RDF data into human-comprehensible text would facilitate non-experts to assess this information. In this paper we present a languageindependen...
متن کاملProof Mining with Dependent Types
Several approaches exist to data-mining big corpora of formal proofs. Some of these approaches are based on statistical machine learning, and some – on theory exploration. However, most are developed for either untyped or simply-typed theorem provers. In this paper, we present a method that combines statistical data mining and theory exploration in order to analyse and automate proofs in depend...
متن کاملText mining: intermediate forms on knowledge representation
In this paper we review the main intermediate forms proposed in text mining, and we briefly study some fuzzy counterparts. The concept of intermediate form applies to any knowledge representation employed to represent in a structured way the semantic content of a text corpus. Intermediate forms play a central role in the text mining process since it is necessary to transform plain text into a f...
متن کاملTemplate Induction over Unstructured Email Corpora
Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are known for structured data, e.g. machine generated HTML emails. However much less work has been done in performing the same task over unstructured email data. We propose a techni...
متن کاملAkshaya: A Framework for Mining General Knowledge Semantics From Unstructured Text
We report a tool called Akshaya, which implements a framework to mine four types of “general knowledge semantics” (analytical semantics) from unstructured text. The semantics being mined are semantic siblings, topical anchors, topic expansion and topical markers. The framework provides options to embed more such general knowledge semantic mining algorithms into it. We use a term co-occurrence g...
متن کامل