Automatic Specialized vs. Non-specialized Sentence Differentiation
Authors
Abstract
Compiling Languages for Specific Purposes (LSP) corpora is costly in time and human effort, largely because it is not easy to distinguish specialized from non-specialized text. The aim of this work is to study automatic specialized vs. non-specialized sentence differentiation. The experiments are carried out on two corpora of sentences extracted from specialized and non-specialized texts: one on economics (academic publications and newspaper articles) and another on sexuality (academic publications and texts from forums and blogs). First, we show the feasibility of the task using a statistical n-gram classifier. Then we show that grammatical features can also be used to classify sentences from the first corpus. For this purpose we use association rule mining.
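As a rough illustration of what a statistical n-gram classifier for this task might look like, the sketch below trains a multinomial naive Bayes model with add-one smoothing over word n-grams. This is an assumption made for illustration only: the abstract does not say which n-gram model the authors used, and the class name `NgramClassifier`, the helper `word_ngrams`, and the four toy training sentences are all hypothetical, not taken from the paper's corpora.

```python
import math
from collections import Counter


def word_ngrams(sentence, n=1):
    """Return the word n-grams of a sentence, with boundary markers."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramClassifier:
    """Multinomial naive Bayes over word n-grams (hypothetical sketch)."""

    def __init__(self, n=1):
        self.n = n
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.totals = {}        # label -> total number of n-grams seen
        self.docs = Counter()   # label -> number of training sentences
        self.vocab = set()      # all n-grams seen in training

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            grams = word_ngrams(sentence, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, sentence):
        grams = word_ngrams(sentence, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)

        def log_score(label):
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / n_docs)
            for g in grams:
                score += math.log(
                    (self.counts[label][g] + 1) / (self.totals[label] + v)
                )
            return score

        return max(self.counts, key=log_score)


# Toy training data invented for this example (not from the paper).
train_sentences = [
    "the elasticity of demand determines the equilibrium price",
    "monetary policy affects the inflation rate and aggregate demand",
    "i think prices are going up again this year",
    "my friend says the shops are too expensive now",
]
train_labels = [
    "specialized",
    "specialized",
    "non-specialized",
    "non-specialized",
]

clf = NgramClassifier(n=1).fit(train_sentences, train_labels)
print(clf.predict("the elasticity of aggregate demand"))
print(clf.predict("i think the shops are expensive"))
```

With so little data, unigrams (`n=1`) are used here; larger values of `n` would only become useful with a realistically sized corpus.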
Similar resources
Automatic Summarization Using Terminological and Semantic Resources
In this paper we present a new algorithm for automatic summarization of specialized texts combining terminological and semantic resources: a term extractor and an ontology. The term extractor provides the list of the terms that are present in the text together with their corresponding termhood. The ontology is used to calculate the semantic similarity among the terms found in the main body and those...
Automatic Differentiation: Point and Interval Taylor Operators
Frequently of use in optimization problems, automatic differentiation may be used to generate Taylor coefficients. Specialized software tools generate Taylor series approximations, one term at a time, more efficiently than the general AD software used to compute (partial) derivatives. Through the use of operator overloading, these tools provide a relatively easy-to-use interface that minimizes the c...
Semiautomatic Differentiation for Efficient Gradient Computations
Many large-scale computations involve a mesh and first (or sometimes higher) partial derivatives of functions of mesh elements. In principle, automatic differentiation (AD) can provide the requisite partials more efficiently and accurately than conventional finite-difference approximations. AD requires source-code modifications, which may be little more than changes to declarations. Such simple...
Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations
This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consists of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by invest...
Sentence Processing by Ear and by Eye: homing in on the heteromodal language brain
fMRI was used to examine the brain activity of 18 young adults while they read or heard matched sentences that were either well-formed or contained anomalies of syntactic form or pragmatic content. We examined specific regions of interest (ROI) based on activity patterns found in earlier research. We sought: a) brain regions that respond robustly to both speech and print; b) regions that are se...