paper based texts

Adding Syntax to Dynamic Programming for Aligning Comparable Texts for the Generation of Paraphrases

2006

Siwei Shen, Dragomir R. Radev, Agam Patel, Günes Erkan

Multiple sequence alignment techniques have recently gained popularity in the Natural Language community, especially for tasks such as machine translation, text generation, and paraphrase identification. Prior work falls into two categories, depending on the type of input used: (a) parallel corpora (e.g., multiple translations of the same text) or (b) comparable texts (non-parallel but on the s...


Detecting Satire in Italian Political Commentaries

2016

Rodolfo Delmonte, Michele Stingo

This paper presents computational work to detect satire/sarcasm in long commentaries on Italian politics. It uses lexica extracted from the manual annotation, based on Appraisal Theory, of some 30K word texts. The underlying hypothesis is that this framework makes it possible to pinpoint ironic content precisely through deep semantic analysis of evaluative judgement and appreciation. ...


Domain Adaptation for Dependency Parsing via Self-Training

2015

Juntao Yu, Mohab Elkaref, Bernd Bohnet

This paper presents a successful approach for domain adaptation of a dependency parser via self-training. We improve parsing accuracy for out-of-domain texts with a self-training approach that uses confidence-based methods to select additional training samples. We compare two confidence-based methods: the first uses the parse score of the employed parser to measure the confidence in a ...


EXCOM: An Automatic Annotation Engine for Semantic Information

2006

Brahim Djioua, Jorge J. García Flores, Antoine Blais, Jean-Pierre Desclés, Gaëll Guibert, Agata Jackiewicz, Florence Le Priol, Leila Nait-Baha, Benoît Sauzay

In this position paper we describe the current state of development of an integrated set of tools (called EXCOM) for automatic semantic annotation. Annotation is generally used to mark textual segments with morphological and syntactic information. Establishing the semantic web on a large scale implies the widespread annotation of web documents with ontologybase...


Texts, Texts, Texts and More Texts (With Apologies To Gabby)

Journal: Anthurium: A Caribbean Studies Journal, 2013


Una aproximación al uso de word embeddings en una tarea de similitud de textos en español

Journal: Procesamiento del Lenguaje Natural, 2016

Tomás López-Solaz, José Antonio Troyano Jiménez, F. Javier Ortega, Fernando Enríquez de Salamanca Ros

In this paper we show how a vector representation of words based on word embeddings can help to improve the results in tasks focused on the semantic similarity of texts. Thus we have experimented with two methods that rely on the vector representation of words to calculate the degree of similarity of two texts, one based on the aggregation of vectors and the other one based on the calculation o...


Towards the Classification of the Finnish Internet Parsebank: Detecting Translations and Informality

2015

Veronika Laippala, Jenna Kanerva, Anna Missilä, Sampo Pyysalo, Tapio Salakoski, Filip Ginter

This paper presents the first results on detecting informality, machine and human translations in the Finnish Internet Parsebank, a project developing a large-scale, web-based corpus with full morphological and syntactic analyses. The paper aims at classifying the Parsebank according to these criteria, as well as studying the linguistic characteristics of the classes. The features used include ...


Semantic structuring of conference contributions using the Hofmethode

2010

Oliver Michel, Damian Läge

The similarity relation among a number of texts is important not only for congress organizers (who need to group the proposed contributions into meaningful sessions) but also for everybody who wants to find certain information within a larger number of texts. Existing information retrieval methods compare texts according to their similarity. Because these methods mostly remain on the surface of the words,...


Embedded Optical Character Recognition On Tamil Text Image Using Raspberry Pi

2014

V. Ajantha Devi, Santhosh Baboo

Optical character recognition is used to digitize and reproduce texts that were produced with non-computerized systems. Digitizing texts also helps reduce storage space. Editing and reprinting text documents that were printed on paper is time-consuming and labour-intensive. Optical character recognition is also useful for visually impaired people who cannot read text documents, but need t...


Language Identification Based on High Frequency Approaches

2014

Kheireddine Abainia, Siham Ouamour, Halim Sayoud

This paper deals with the problem of automatic language identification of noisy texts, an important task in natural language processing. Several works exist in this field, based on statistical and machine learning approaches for different categories of texts. Unfortunately, most of the proposed methods work well on clean texts or long texts, but often ...
