Automatic Specialized vs. Non-specialized Sentence Differentiation

Authors

  • Iria da Cunha
  • M. Teresa Cabré
  • Eric SanJuan
  • Gerardo Sierra
  • Juan-Manuel Torres-Moreno
  • Jorge Vivaldi
Abstract

Compiling Languages for Specific Purposes (LSP) corpora is fraught with difficulties, mainly the time and human effort required, because it is not easy to distinguish specialized from non-specialized text. The aim of this work is to study automatic specialized vs. non-specialized sentence differentiation. The experiments are carried out on two corpora of sentences extracted from specialized and non-specialized texts: one on economics (academic publications and newspaper articles) and another on sexuality (academic publications and texts from forums and blogs). First, we show the feasibility of the task using a statistical n-gram classifier. Then we show that grammatical features can also be used to classify sentences from the first corpus; for this purpose we use association rule mining.
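As a rough illustration of the first step, the sketch below classifies sentences with word bigram counts and add-one smoothing. The abstract does not specify the model, so this is only an assumption about what a statistical n-gram classifier could look like, not the authors' method. The class name `NgramClassifier`, the helper `word_ngrams`, the labels, and the four training sentences are all hypothetical and invented for the example.

```python
import math
from collections import Counter


def word_ngrams(sentence, n=2):
    """Return the word n-grams of a sentence, padded with boundary markers."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramClassifier:
    """Toy per-class n-gram model with add-one smoothing (illustrative only)."""

    def __init__(self, n=2):
        self.n = n
        self.counts = {}    # label -> Counter of n-grams seen with that label
        self.totals = {}    # label -> total number of n-grams for that label
        self.vocab = set()  # all n-grams seen in training

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            grams = word_ngrams(sentence, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def score(self, sentence, label):
        """Smoothed log-likelihood of the sentence under one class."""
        v = len(self.vocab) + 1  # one extra slot for unseen n-grams
        counts, total = self.counts[label], self.totals[label]
        return sum(
            math.log((counts[g] + 1) / (total + v))
            for g in word_ngrams(sentence, self.n)
        )

    def predict(self, sentence):
        # Equal class priors are assumed, so only the likelihoods are compared.
        return max(self.counts, key=lambda y: self.score(sentence, y))


# Hypothetical toy data; the corpora described in the abstract are not reproduced here.
train_sentences = [
    "the inflation rate affects monetary policy",
    "the central bank adjusts the interest rate",
    "prices at the shop went up again",
    "my bank card stopped working today",
]
train_labels = ["specialized", "specialized", "general", "general"]

clf = NgramClassifier(n=2).fit(train_sentences, train_labels)
print(clf.predict("monetary policy and the interest rate"))
print(clf.predict("my card stopped working at the shop"))
```

With real corpora, the two classes would be trained on far more sentences, and the n-gram order and smoothing scheme would need tuning.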

Similar resources

Automatic Summarization Using Terminological and Semantic Resources

In this paper we present a new algorithm for automatic summarization of specialized texts combining terminological and semantic resources: a term extractor and an ontology. The term extractor provides the list of the terms that are present in the text together with their corresponding termhood. The ontology is used to calculate the semantic similarity among the terms found in the main body and those...

Automatic Differentiation: Point and Interval Taylor Operators

Frequently of use in optimization problems, automatic differentiation may be used to generate Taylor coefficients. Specialized software tools generate Taylor series approximations, one term at a time, more efficiently than the general AD software used to compute (partial) derivatives. Through the use of operator overloading, these tools provide a relatively easy-to-use interface that minimizes the c...

Semiautomatic Differentiation for Efficient Gradient Computations

Many large-scale computations involve a mesh and first (or sometimes higher) partial derivatives of functions of mesh elements. In principle, automatic differentiation (AD) can provide the requisite partials more efficiently and accurately than conventional finite-difference approximations. AD requires source-code modifications, which may be little more than changes to declarations. Such simple...

Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations

This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consists of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by invest...

Sentence Processing by Ear and by Eye: homing in on the heteromodal language brain

fMRI was used to examine the brain activity of 18 young adults while they read or heard matched sentences that were either well-formed or contained anomalies of syntactic form or pragmatic content. We examined specific regions of interest (ROI) based on activity patterns found in earlier research. We sought: a) brain regions that respond robustly to both speech and print; b) regions that are se...

Publication date: 2011