Search results for: text document classification

Number of results: 765658

2006
Antoine Doucet, Miro Lehtonen

This paper addresses the problem of the unsupervised classification of text-centric XML documents. In the context of the INEX mining track 2006, we present methods to exploit the inherent structural information of XML documents in the document clustering process. Using the k-means algorithm, we have experimented with a couple of feature sets, to discover that a promising direction is to use str...

2016
Hugo Jair Escalante, Manuel Montes-y-Gómez, Luis Villaseñor Pineda, Marcelo Luis Errecalde

Text classification is a widely studied problem, and it can be considered solved for some domains and under certain circumstances. There are scenarios, however, that have received little or no attention at all, despite their relevance and applicability. One such scenario is early text classification, where one needs to know the category of a document by using partial information only. A docum...

2006
Fathi H. Saad, Beatriz de la Iglesia, Duncan G. Bell

Text classification in the medical domain is a real-world problem with wide applicability. This paper investigates extensively the effect of text representation approaches on the performance of medical document classification. To accomplish this objective, we evaluated seven different approaches to represent real-world medical documents. The text representation approaches investigated in this pa...

2011
S. L. Ting, W. H. Ip, Albert H.C. Tsang

Document classification is of growing interest in text mining research. Correctly assigning documents to a particular category remains a challenge because of the large number of features in the dataset. Among existing classification approaches, Naïve Bayes is potentially good at serving as a document classification model due to its simplicity. The aim of this...

2012
Xiaohui Tao, Yuefeng Li, Raymond Y. K. Lau, Hua Wang

The development of text classification techniques has been largely promoted in the past decade due to the increasing availability and widespread use of digital documents. Usually, the performance of text classification relies on the quality of categories and the accuracy of classifiers learned from samples. When training samples are unavailable or categories are unqualified, text classification...

Journal: Pattern Recognition Letters, 2013
Ming Sun, Carey E. Priebe

Manifold matching works to identify embeddings of multiple disparate data spaces into the same low-dimensional space, where joint inference can be pursued. It is an enabling methodology for fusion and inference from multiple and massive disparate data sources. In this paper three methods of manifold matching are considered: PoM, which stands for Multidimensional Scaling (MDS) composed with Proc...

Journal: Bioinformatics, 2009
Dolf Trieschnigg, Piotr Pezik, Vivian Lee, Franciska de Jong, Wessel Kraaij, Dietrich Rebholz-Schuhmann

MOTIVATION: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small...

2006
Angelo Dalli

Temporal information is presently underutilised for document and text processing purposes. This work presents an unsupervised method of extracting periodicity information from text, enabling time series creation and filtering to be used in the creation of sophisticated language models that can discern between repetitive trends and non-repetitive writing patterns. The algorithm performs in O(n ...

2012
B S Harish, S Manjunath

In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel data structure called ‘Status Matrix’. Further, a corresponding classification technique has been proposed for efficient classification of tex...

2018
Catherine Inibhunu

Feature extraction is a mechanism used to extract key phrases from any given text document. This extraction can be weighted, ranked or semantic-based. Weighted and ranking-based feature extraction normally assigns scores to extracted words based on various heuristics. The highest-scoring words are seen as important. Semantic-based extractions normally try to understand word meanings, and words wit...
