Search results for: text documents classification

Number of results: 694633

2012
R.

In recent years, feature selection has become a chief concern in text classification. A major characteristic of text classification is the high dimensionality of the feature space. Feature selection is therefore regarded as one of the crucial parts of text document categorization. Selecting the best features to represent documents can reduce the dimensionality of the feature space and hence increase the ...

Journal: Studies in health technology and informatics 2008
Stephan Spat Bruno Cadonna Ivo Rakovac Christian Gütl Hubert Leitner Günther Stark Peter Beck

The amount of narrative clinical text documents stored in Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision making in these clinical text documents. Thus, efficient and topical retrieval of relevant patient-related information is an important task in an EPR system. This...

Journal: International Journal of Business Intelligence Research 2020

2015
Boris A. Galitsky Nina Lebedeva

The problem of classifying text with respect to metalanguage and language-object patterns is formulated, and its application areas are proposed. Examples of metalanguage patterns in text are foreign language grammar lessons and tutorials on how to write engineering documents. The method targets text classification tasks where keyword statistics are insufficient to distinguish between such abs...

2007
E. Binaghi

In this paper we describe a new on-line document categorization strategy that can be integrated within Web applications. A salient aspect is the use of neural learning in both representation and classification tasks. Within text documents conceived as images, the regions of interest (RoI) containing information meaningful for categorization are identified with the support of a supervised neural...

2015
Xingyuan Chen Yunqing Xia Peng Jin John A. Carroll

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of W...

2007
Stephan Spat

The Steiermärkische Krankenanstalten Ges.m.b.H. (KAGes) rolled out an electronic patient record (EPR) system in 2004. This system contains an increasing amount of unstructured clinical text documents in German. In order to facilitate patient-related medical decision-making for physicians, this diploma thesis analyses and implements methods retrieving relevant medical...

2006
Zoltán Gyöngyi Hector Garcia-Molina Jan Pedersen

Document categorization is one of the foundational problems in (web) information retrieval. Even though web documents are hyperlinked, most proposed classification techniques take little advantage of the link structure and rely primarily on text features, as it is not immediately clear how to make link information intelligible to supervised machine learning algorithms. This paper introduces a l...

2007
Nikitas N. Karanikolas Christos Skourlas

Text classification is a hard problem with various aspects and potential solutions. In this paper, two main research directions for the classification of narrative documents are considered. The first is based on data mining and rule induction techniques, while the second combines traditional text retrieval techniques (use of the vector space model,

2003
Yefeng Zheng Huiping Li David S. Doermann

In this paper we address the problem of identifying text in noisy documents. We segment handwriting and distinguish it from machine-printed text because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main or body content, and 2) the segmentation and recognition techniques for machine printed...
