Information Extraction from Multiple Syntactic Sources

Author

  • Shubin Zhao
Abstract

Information Extraction (IE) is the automatic extraction of facts from text, which includes the detection of named entities, entity relations, and events. Conventional approaches to IE try to find syntactic patterns based on deep processing of text, such as partial or full parsing. The problem these approaches face is that the deeper the analysis, the less accurate its output, and errors introduced at that stage cannot be recovered from. Lower-level processing, on the other hand, is more accurate and also provides useful information. Within the framework of conventional approaches, however, this kind of information cannot be incorporated efficiently. This thesis describes a novel supervised approach based on kernel methods to address these issues. In this approach, customized kernels are used to match the syntactic structures produced by different preprocessing phases. Using the closure properties of kernels, the individual kernels are combined into a composite kernel that integrates and extends all of this information. The composite kernels can be used with various classifiers, such as Nearest Neighbor or Support Vector Machines (SVM). The main classifier we propose to use is SVM, due to its ability to generalize in high-dimensional feature spaces. We will show that each level of syntactic information can contribute to IE tasks, and that low-level information can help to recover from errors in deep processing.
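The idea of combining per-level kernels into one composite kernel can be illustrated with a minimal sketch. It is not the thesis's implementation: the two toy kernels (`token_kernel`, `chunk_bigram_kernel`), the equal weights, the example sentences, and the relation labels are all hypothetical, chosen only to show that a non-negative weighted sum of valid kernels is again a kernel and can drive a kernel-based Nearest Neighbor classifier.

```python
from collections import Counter
from math import sqrt


def token_kernel(a, b):
    """Low-level kernel: bag-of-words dot product over word tokens."""
    ca, cb = Counter(a["tokens"]), Counter(b["tokens"])
    return float(sum(ca[t] * cb[t] for t in ca))


def chunk_bigram_kernel(a, b):
    """Higher-level kernel: dot product over bigrams of chunk tags."""
    def bigrams(seq):
        return Counter(zip(seq, seq[1:]))
    ca, cb = bigrams(a["chunks"]), bigrams(b["chunks"])
    return float(sum(ca[g] * cb[g] for g in ca))


def normalized(kernel):
    """Cosine-normalize a kernel so that k(x, x) == 1 for non-empty x."""
    def k(a, b):
        denom = sqrt(kernel(a, a) * kernel(b, b))
        return kernel(a, b) / denom if denom else 0.0
    return k


def composite(kernels, weights):
    """Non-negative weighted sum of kernels, which is itself a valid kernel."""
    def k(a, b):
        return sum(w * kern(a, b) for kern, w in zip(kernels, weights))
    return k


def nearest_neighbor(kernel, train, query):
    """1-NN using the distance the kernel induces in feature space:
    d(x, q)^2 = k(x, x) + k(q, q) - 2 k(x, q)."""
    def sq_dist(x):
        return kernel(x, x) + kernel(query, query) - 2.0 * kernel(x, query)
    return min(train, key=lambda pair: sq_dist(pair[0]))[1]


# Hypothetical toy data: each example carries two levels of analysis.
train = [
    ({"tokens": ["acme", "hired", "smith"],
      "chunks": ["NP", "VP", "NP"]}, "EMPLOYMENT"),
    ({"tokens": ["smith", "lives", "in", "paris"],
      "chunks": ["NP", "VP", "PP", "NP"]}, "LOCATED"),
]
query = {"tokens": ["globex", "hired", "jones"], "chunks": ["NP", "VP", "NP"]}

k = composite([normalized(token_kernel), normalized(chunk_bigram_kernel)],
              [0.5, 0.5])
print(nearest_neighbor(k, train, query))
```

Because the composite is an ordinary kernel function, the same `k` could in principle be evaluated over all training pairs to build a Gram matrix for an SVM that accepts precomputed kernels; the Nearest Neighbor variant is used here only to keep the sketch dependency-free.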

Similar resources

Exploiting Rich Syntactic Information for Relation Extraction from Biomedical Articles∗

This paper proposes a ternary relation extraction method primarily based on rich syntactic information. We identify PROTEIN-ORGANISM-LOCATION relations in the text of biomedical articles. Different kernel functions are used with an SVM learner to integrate two sources of information from syntactic parse trees: (i) a large number of syntactic features that have been shown useful for Semantic Rol...

Exploiting Rich Syntactic Information for Relationship Extraction from Biomedical Articles

This paper proposes a ternary relation extraction method primarily based on rich syntactic information. We identify PROTEIN-ORGANISM-LOCATION relations in the text of biomedical articles. Different kernel functions are used with an SVM learner to integrate two sources of information from syntactic parse trees: (i) a large number of syntactic features that have been shown useful for Semantic Rol...

Combining Multiple Layers of Syntactic Information for Protein-Protein Interaction Extraction

Protein-protein interaction extraction is a challenging information extraction task in the BioNLP field. Several kernels focusing on a part of syntactic information have been proposed for the task. In this paper, we propose a method to combine multiple layers of syntactic information by using a combination of multiple kernels based on several different parsers. We evaluated the method using sup...

abstracts

Contents: The systematic extraction model of the knowledge sources and tools from The Holy Quran / Ali Mowlaei, Mahdi Golshani; Critical Analysis of Epistemological Principles of Cartesian Humanism Based on Allameh Mohammad-Taqi Ja'fari's Thoughts / Narges Aboul-Qasemian, Abdollah Nasri, Fazlollah Khaleghian ...

Syntactic Structures and Rhetorical Functions of Electrical Engineering, Psychiatry, and Linguistics Research Article Titles in English and Persian: A Cross-linguistic and Cross-disciplinary Study

A research article (RA) title is the first and foremost feature that attracts the reader's attention, the feature from which she/he may decide whether the whole article is worth reading. The present study attempted to investigate syntactic structures and rhetorical functions of RA titles written in English and Persian and published in journals in three disciplines of Electrical Engineering, Psy...

Publication date: 2004