text complexity

Semantic Relatedness Using Salient Semantic Analysis

2011

Samer Hassan Rada Mihalcea

This paper introduces a novel method for measuring semantic relatedness using semantic profiles constructed from salient encyclopedic features. The model is built on the notion that the meaning of a word can be characterized by the salient concepts found in its immediate context. In addition to being computationally efficient, the new model has superior performance and remarkable consistency wh...

Pronominalization revisited

2000

Renate Henschel Hua Cheng Massimo Poesio

Pronominalization has been related to the idea of a local focus – a set of discourse entities in the speaker’s centre of attention, for example in Gundel et al. (1993)’s givenness hierarchy or in centering theory. Both accounts say that the determination of the focus depends on syntactic as well as pragmatic factors, but have not been able to pin those factors down. In this paper, we uncover th...

Towards Multi Label Text Classification through Label Propagation

2012

Shweta C. Dharmadhikari Maya Ingle Parag Kulkarni

Classifying text data has been an active area of research for a long time. A text document is a multifaceted object and is often inherently ambiguous. Multi-label learning deals with such ambiguous objects. Their ambiguity often makes it difficult for a classifier to assign relevant classes to an input document. Traditional single label and multi class text cla...

Experimental Assessment of a Threshold Selection Algorithm for Tuning Classifiers in the Field of Hierarchical Text Categorization

2010

Andrea Addis Giuliano Armano Eloisa Vargiu

Text Categorization is the task of assigning predefined categories to text documents. It can provide conceptual views of document collections and has many important applications in the real world. Nowadays, most of the research on text categorization has focused on mapping text documents to a set of categories among which structural relationships hold. Without loss of generality, let us assume ...

EFL Textbook Evaluation: An Analysis of Readability and Vocabulary Profiler of Four Corners Book Series

Journal: International Journal of Foreign Language Teaching and Research 2018

Laya Heidari Darani, Milad Malverdi Varzaneh

This study aimed to investigate whether there is any significant relationship between the readability and vocabulary profile including the most frequent words (K1 words) and academic word list (AWL) of reading passages of Four Corners series which were EFL textbooks. To determine the readability of the texts, the Flesch–Kincaid (1975) readability test was used, while the texts' academic word li...

Multivariate Algorithmics for NP-Hard String Problems

Journal: Bulletin of the EATCS 2014

Laurent Bulteau Falk Hüffner Christian Komusiewicz Rolf Niedermeier

String problems arise in various applications ranging from text mining to biological sequence analysis. Many string problems are NP-hard. This motivates the search for (fixed-parameter) tractable special cases of these problems. We survey parameterized and multivariate algorithmics results for NP-hard string problems and identify challenges for future research.

The linguistic role of hesitation disfluencies: evidence from Hebrew and Japanese

2013

Vered Silber-Varod Takehiko Maruyama

In this paper we examine a certain aspect of the prosody-syntax interface, that of hesitation disfluencies (HD) that occur within phrases or within morphemes. Such cases were found in two spontaneous corpora of two syntactically distinct languages – Israeli Hebrew (IH) and Japanese. It was found that intra-phrasal hesitations in the two languages call for different explanations, since in Japanese the...

Text Mining in Bioinformatics: Research and Application

Journal: IJIRR 2013

Yanliang Qi

Biomedical literature has been growing at an exponential rate. Finding useful and needed information in such a huge data set is a daunting task for users. Text mining is a powerful tool to solve this problem. In this paper, we survey text mining in bioinformatics, with emphasis on its applications. In this paper, the main research directions of text ...

Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words

2006

Dmitry Davidov Ari Rappoport

We present a novel approach for discovering word categories, sets of words sharing a significant aspect of their meaning. We utilize meta-patterns of high-frequency words and content words in order to discover pattern candidates. Symmetric patterns are then identified using graph-based measures, and word categories are created based on graph clique sets. Our method is the first pattern-based met...

Identification of Text on Colored Book and Journal Covers

1999

Karin Sobottka Horst Bunke Heino Kronenberg

In this paper an approach to automatic text location and identification on colored book and journal covers is proposed. To reduce the amount of small variations in color, a clustering algorithm is applied in a preprocessing step. Two methods have been developed for extracting text hypotheses. One is based on a top-down analysis using successive splitting of image regions. The other is a bottom-u...
