Sequential Data Mining for Information Extraction from Texts

Authors

  • Thierry Charnois
  • Marc Plantevit
  • Christophe Rigotti
  • Bruno Crémilleux

Abstract

This paper shows the benefit of using data mining methods for biological natural language processing. We propose a method for discovering linguistic patterns based on recursive sequential pattern mining. It requires neither sentence parsing nor any resource other than a training data set. It produces understandable results, and we show its usefulness for extracting relations between named entities. For the named entity recognition problem, we propose a method based on a new kind of pattern that takes into account both the sequence and its context.

KEYWORDS: information extraction, data mining, sequential patterns and LSR patterns, NLP applied to biological and genetic texts.
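
The abstract names sequential pattern mining without detailing it. As a rough illustration only, the sketch below mines frequent sequential patterns (ordered token subsequences, gaps allowed) from tokenized sentences using a PrefixSpan-style prefix projection. This is a generic textbook technique swapped in for illustration, not the authors' recursive mining algorithm; the `<GENE>` placeholder token, the toy corpus, the support threshold and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sentence: Sequence[str]) -> bool:
    """True if the tokens of `pattern` occur in `sentence` in order, gaps allowed."""
    it = iter(sentence)
    # `item in it` consumes the iterator up to the match, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], corpus: List[List[str]]) -> int:
    """Number of sentences that contain `pattern` as a subsequence."""
    return sum(1 for sentence in corpus if is_subsequence(pattern, sentence))


def prefixspan(corpus: List[List[str]], min_support: int) -> Dict[Pattern, int]:
    """Return every frequent sequential pattern with its support (PrefixSpan-style)."""
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[List[str]]) -> None:
        # Count each token at most once per projected suffix (i.e. per sentence).
        counts: Dict[str, int] = defaultdict(int)
        for suffix in projected:
            for token in set(suffix):
                counts[token] += 1
        for token, count in counts.items():
            if count < min_support:
                continue
            new_prefix = prefix + (token,)
            results[new_prefix] = count
            # Project: keep what follows the first occurrence of `token` in each suffix.
            new_projected = [
                suffix[suffix.index(token) + 1:]
                for suffix in projected
                if token in suffix
            ]
            mine(new_prefix, new_projected)

    mine((), corpus)
    return results


# Hypothetical toy corpus: named entities already replaced by a <GENE> placeholder.
corpus = [
    "<GENE> interacts with <GENE>".split(),
    "<GENE> strongly interacts with <GENE>".split(),
    "<GENE> binds <GENE>".split(),
]
patterns = prefixspan(corpus, min_support=2)

# Print the longest frequent patterns, which read like relation templates.
longest = max(len(p) for p in patterns)
for pattern, count in sorted(patterns.items()):
    if len(pattern) == longest:
        print(" ".join(pattern), count)  # prints "<GENE> interacts with <GENE> 2"
```

Because gaps are allowed, the pattern `<GENE> interacts with <GENE>` also covers the sentence containing "strongly". Ranking or filtering the resulting patterns, and the recursive step mentioned in the abstract, are not modeled in this sketch.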


Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

A New GIS based Application of Sequential Technique to Prospect Karstic Groundwater using Remotely Sensed and Geoelectrical Methods in Karstified Tepal Area, Shahrood, Iran

In this research, recognition of karstic water-bearing zones using the management of exploration data in Kal-Qorno valley, situated in the Tepal area of Shahrood, has been considered. For this purpose, the sequential exploration method was conducted using geological evidences and applying remote sensing and geoelectrical resistivity methods in two major phases including the regional and local s...

dRAP-Independent: A Data Distribution Algorithm for Mining First-Order Frequent Patterns

In this paper we present dRAP-Independent, an algorithm for independent distributed mining of first-order frequent patterns. This system is based on RAP, an algorithm for finding maximal frequent patterns in first-order logic. dRAP-Independent utilizes a modified data partitioning schema introduced by Savasere et al. and offers good performance and low communication overhead. We analyze the perf...

Extraction d'arguments de relations n-aires dans les textes guidée par une RTO de domaine. (Extraction of arguments in N-ary relations in texts guided by a domain OTR)

Today, a huge amount of data is made available to the research community through several web-based libraries. Enhancing data collected from scientific documents is a major challenge in order to analyze and reuse efficiently domain knowledge. To be enhanced, data need to be extracted from documents and structured in a common representation using a controlled vocabulary as in ontologies. Our rese...

A Mutually Beneficial Integration of Data Mining and Information Extraction

Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DISCOTEX, that combines IE and data mining methodologies to perform text mining as well as...

Journal:
  • TAL

Volume: 50

Publication date: 2009