text document classification

Search results for: text document classification

Number of results: 765658

Document-Base Extraction for Single-Label Text Classification

2008

Yanbo J. Wang Robert Sanderson Frans Coenen Paul H. Leng

Many text mining applications, especially when investigating Text Classification (TC), require experiments to be performed using common text-collections, such that results can be compared with alternative approaches. With regard to single-label TC, most text-collections (textual data-sources) in their original form have at least one of the following limitations: the overall volume of textual dat...

The Yale cTAKES extensions for document classification: architecture and application

Journal: Journal of the American Medical Informatics Association (JAMIA), 2011

Vijay Garla Vincent Lo Re Zachariah Dorey-Stein Farah Kidwai Matthew Scotch Julie A. Womack Amy Justice Cynthia Brandt

BACKGROUND: Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to the development of effective clinical document classification systems. Clinical natural-language-processing systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges. METHODS: The authors dev...

Text Structure-Aware Classification

2009

Zoran Dzunic Amir Globerson Martin Rinard Yoong Keok Lee Benjamin Snyder Tahira Naseem Christina Sauper Erdong Chen Harr Chen Jacob Eisenstein Pawan Deshpande Igor Malioutov Viktor Kuncak Karen Zee Michael Carbin Patrick Lam Darko Marinov

Bag-of-words representations are used in many NLP applications, such as text classification and sentiment analysis. These representations ignore relations across different sentences in a text and disregard the underlying structure of documents. In this work, we present a method for text classification that takes into account document structure and only considers segments that contain informatio...

Syntax and Semantics based Efficient Text Classification Framework

2017

This system proposes an efficient text classification approach based on multi-layer SVM-NN text classification and a two-level representation model. Automated text classification is attractive because it frees organizations from the need to organize document bases manually, which can be too expensive. The system proposes a two-level representation model to represent text data, one is ...

Identifying Multiple Topics in Texts

Journal: Int. J. Comput. Linguistics Appl., 2016

Mohamed Mouine Diana Inkpen Pierre-Olivier Charlebois Tri Ho

In this paper, we present an innovative method for multi-label text classification. Our method uses Lucene to index texts and then assigns one or more classes to a new text based on its similarity relative to an annotated corpus. For finer granularity, we split the text into phrases, and then we focus on the noun phrases. Instead of classifying the entire text, we classify each noun phrase. The...
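
The similarity-based assignment described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain term-count cosine similarity in place of a Lucene index, it classifies whole texts rather than noun phrases, and the names `vectorize`, `cosine`, `assign_labels` and the `threshold` parameter are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def vectorize(text):
    """Term counts from a lowercase whitespace split (stand-in for a real index)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def assign_labels(text, annotated_corpus, threshold=0.2):
    """Collect the labels of every annotated text that is similar enough.

    The threshold is an assumed tuning parameter; because several annotated
    texts can pass it, the result may contain more than one class.
    """
    query = vectorize(text)
    labels = set()
    for doc_text, doc_labels in annotated_corpus:
        if cosine(query, vectorize(doc_text)) >= threshold:
            labels.update(doc_labels)
    return labels

# Toy annotated corpus (invented for illustration only).
corpus = [
    ("the team won the football match", {"sports"}),
    ("the government passed a new budget", {"politics"}),
]
labels = assign_labels("football match budget", corpus)
```

Because the new text shares terms with both annotated texts, it receives both classes, which is the multi-label behaviour the abstract refers to.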

Multiclass patent document classification

Journal: Artificial Intelligence Research, 2017

A Concept Lattice-Based Kernel for SVM Text Classification

2009

Claudio Carpineto Carla Michini Raffaele Nicolussi

Standard Support Vector Machines (SVM) text classification relies on a bag-of-words kernel to express the similarity between documents. We show that a document lattice can be used to define a valid kernel function that takes into account the relations between different terms. Such a kernel is based on the notion of conceptual proximity between pairs of terms, as encoded in the document lattice. W...
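
For orientation, the baseline this abstract starts from can be sketched as a linear bag-of-words kernel, that is, the dot product of two term-count vectors. This is the standard baseline only, not the lattice-based kernel the authors propose; the tokenizer is deliberately naive and the function names are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Term counts from a lowercase whitespace split (a deliberately simple tokenizer)."""
    return Counter(text.lower().split())

def bow_kernel(doc_a, doc_b):
    """Linear bag-of-words kernel: dot product of the two term-count vectors."""
    counts_a, counts_b = bag_of_words(doc_a), bag_of_words(doc_b)
    return sum(count * counts_b[term] for term, count in counts_a.items())

# Two documents sharing surface terms versus two that are related only in meaning.
k_shared = bow_kernel("stock market falls", "stock market rallies")
k_related = bow_kernel("stock market falls", "share prices drop")
```

The second pair scores zero despite being topically related, because the kernel only counts identical terms. That gap is what a kernel that accounts for relations between different terms is meant to address.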

Comparing Speech and Text Classification on ICNALE

2016

Sergiu Nisioi

In this paper we explore and compare a speech and text classification approach on a corpus of native and non-native English speakers. We experiment on a subset of the International Corpus Network of Asian Learners of English containing the recorded speeches and the equivalent text transcriptions. Our results suggest a high correlation between the spoken and written classification results, showi...

Improved Graph Based K-NN Text Classification

2013

Lakshmi Kumari

This paper presents an improved graph-based k-NN algorithm for text classification. Most organizations face the problem of large amounts of unorganized data. Most existing text classification techniques are based on the vector space model, which ignores the structural information of the document, that is, the word order or the co-occurrences of terms. In this paper we have ...

Local Skew Angle Estimation from Background Space in Text Regions

1997

Apostolos Antonacopoulos

Almost all document analysis approaches need to perform a global analysis of the page orientation as a separate process at an early stage. It would be preferable to estimate the orientation locally after page segmentation and classification, when more knowledge about the different regions is available. In this paper, a novel local skew estimation method is presented that takes advantage of the ...
