text feature awareness

Vocabulary, phonological awareness and rapid naming: contributions for spelling and written production.

Journal: Jornal da Sociedade Brasileira de Fonoaudiologia 2012

Maria Thereza Mazorra dos Santos, Debora Maria Befi-Lopes

PURPOSE: To investigate whether performance on linguistic tasks predicts orthographic domain and the quality of written productions. METHODS: Participants were 82 fourth graders in elementary education, from public and private schools in São Paulo, aged 9 years to 10 years and 2 months. The test battery was composed of an expressive vocabulary test, phonological aware...


Bursty Feature Representation for Clustering Text Streams

2007

Qi He, Kuiyu Chang, Ee-Peng Lim, Jun Zhang

Text representation plays a crucial role in classical text mining, where the primary focus was on static text. Nevertheless, well-studied static text representations including TFIDF are not optimized for non-stationary streams of information such as news, discussion board messages, and blogs. We therefore introduce a new temporal representation for text streams based on bursty features. Our bur...
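As a point of reference for the static representation this abstract contrasts against, here is a minimal TF-IDF sketch. It is not the bursty representation, whose details are cut off in the abstract above. The function name `tfidf`, the length-normalized term frequency, and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy corpus (hypothetical): "storm" appears in two of the three documents.
docs = [
    "storm hits coast".split(),
    "storm warning issued".split(),
    "election results announced".split(),
]
weights = tfidf(docs)
```

The weights depend only on corpus-wide counts, not on when a document arrived. This is one way to read the abstract's remark that such representations are not optimized for non-stationary streams.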


Contextual feature selection for text classification

Journal: Inf. Process. Manage. 2007

François Paradis, Jian-Yun Nie

We present a simple approach for the classification of "noisy" documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call-for-tender documents, the method can be useful for other web collections that also contain non-topical content. Experiments are c...


Hybrid Active Feature Selection For Text Classification

2016

Rashmi G. Dukhi, Antara Bhattacharya

Clustering is the most common form of unsupervised learning. In clustering, it is the distribution and makeup of the data that determine cluster membership. It needs a representation of objects and a similarity measure, which compares the distribution of features between objects. For high-dimensional data, feature extraction and feature selection improve the performance of clustering algorithms....


Feature Selection in SVM Text Categorization

1999

Hirotoshi Taira, Masahiko Haruno

This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually increased by using mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words that are appropriate for learning from the information-theoretic and the linguistic perspectives, respectively. We tested the ...
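The mutual information filtering mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact formulation: MI is computed between binary term presence and binary category membership, documents are sets of tokens, and the names `mutual_information` and `select_features` and the toy data are hypothetical.

```python
import math


def mutual_information(docs, labels, term, category):
    """MI (in bits) between 'term occurs in doc' and 'doc belongs to category'."""
    n = len(docs)
    # counts[(has_term, in_category)] over the 2x2 contingency table
    counts = {(t, c): 0 for t in (0, 1) for c in (0, 1)}
    for doc, label in zip(docs, labels):
        counts[(int(term in doc), int(label == category))] += 1
    mi = 0.0
    for (t, c), n_tc in counts.items():
        if n_tc == 0:
            continue  # an empty cell contributes 0 to the sum
        n_t = counts[(t, 0)] + counts[(t, 1)]
        n_c = counts[(0, c)] + counts[(1, c)]
        mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi


def select_features(docs, labels, category, k):
    """Keep the k terms with the highest MI for the given category."""
    vocab = sorted({term for doc in docs for term in doc})
    ranked = sorted(
        vocab,
        key=lambda term: mutual_information(docs, labels, term, category),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): two categories, four documents as token sets.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "match"}]
labels = ["sport", "sport", "politics", "politics"]
top_terms = select_features(docs, labels, "sport", k=2)
```

Raising `k` step by step corresponds to gradually enlarging the input space, as the abstract describes. The selected terms would then be passed to the classifier.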


Feature reduction techniques for Arabic text categorization

Journal: JASIST 2009

Rehab Duwairi, Mohammad Nayef Al-Refai, Natheer Khasawneh

This paper presents and compares three feature reduction techniques that were applied to Arabic text. The techniques include stemming, light stemming, and word clusters. The effects of the aforementioned techniques were studied and analyzed on the K-nearest-neighbor classifier. Stemming reduces words to their stems. Light stemming, by comparison, removes common affixes from words without reducing...
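To make the idea of light stemming concrete, here is a rough sketch that strips at most one common prefix and one common suffix and stops there, without reducing the word to a root. The affix lists are a small illustrative subset chosen for this example, not the lists used in the paper. The `min_len` guard and the name `light_stem` are likewise assumptions.

```python
# Illustrative affix lists (assumption): longer affixes are tried first.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


# Example inputs (hypothetical): "the library" and "and the book".
stems = [light_stem("المكتبة"), light_stem("والكتاب")]
```

Because only surface affixes are removed, different word forms are merged less aggressively than with root-based stemming, so this approach reduces the feature space less.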


Segmentation-based Feature Selection for Text Categorization

2006

Zsolt Minier, Zalán Bodó, Lehel Csató

Text categorization is an interesting problem in artificial intelligence that gets more and more attention from researchers and industry. One central problem of text categorization is the selection of a good feature set. We propose a novel method for term selection for each category based on segmenting the documents belonging to a category into cohesive sub-parts that define the subtopics of th...


Video text extraction using temporal feature vectors

2002

Xiaoou Tang, Bo Luo, Xinbo Gao, Edwige Pissaloux, HongJiang Zhang

A new caption text extraction algorithm that takes full advantage of the temporal information in a video sequence is developed. By detecting the (dis)appearance of caption text in a video stream, we first identify the video segment that contains the same caption text. Then, using the gray-level vector traced across the segment as the feature vector for a pixel point, we can clearly separate a captio...


Learning Feature-Value Grammars from Plain Text

1998

Tony C. Smith

This paper outlines preliminary work aimed at learning Feature-Value Grammars from plain text. Common suffixes are gleaned from a word suffix tree and used to form a first approximation of how regular inflection is marked. Words are generalised according to these suffixes and then subjected to trigram analysis in an attempt to identify agreement dependencies. They are subsequently labeled with ...


Feature-Based Visual Exploration of Text Classification

2015

Florian Stoffel, Lucie Flekova, Daniela Oelke, Iryna Gurevych, Daniel A. Keim

There are many applications of text classification such as gender attribution in market research or the identification of forged product reviews on e-commerce sites. Although several automatic methods provide satisfying performance in most application cases, we see a gap in supporting the analyst to understand the results and derive knowledge for future application scenarios. In this paper, we ...
