Search results for: text document classification
Number of results: 765658
Document image analysis, such as text classification and layout analysis, allows for the automated extraction of document properties. In general, these methods are pre-processing steps for Optical Character Recognition (OCR) systems. In contrast, the proposed method aims at clustering document snippets so that an automated clustering of documents can be performed. First, localized words are c...
This paper describes the system of the team SKL in the NTCIR-11 RITE-VAL workshop. The system consists of two modules: an RTE module and a text-search module. The RTE module, which is a modified version of our previous system for the binary classification task in the RITE-2 workshop, takes a two-step classification strategy. The first step classifies a given text pair into positive or negative entailment c...
Suffix trees are compact and versatile data structures in which paths from the root to nodes represent substrings of the encoded text. By annotating such a tree with the frequencies of substrings, it is possible to construct a compact model of text that captures its sequential nature. This thesis investigates the use of such a model in the representation and classification of text. The basic ap...
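As a rough illustration of the frequency-annotated structure described above, the sketch below builds a naive suffix *trie* (not the compact suffix tree the thesis uses, which would typically be built with an algorithm such as Ukkonen's) in which each node stores how often the substring spelled by its root-to-node path occurs in the text. The names `SuffixTrieNode`, `build_suffix_trie`, `substring_frequency`, and the `max_depth` cap are hypothetical, chosen only for this example.

```python
class SuffixTrieNode:
    """Node of a naive suffix trie; the root-to-node path spells a substring."""

    def __init__(self):
        self.children = {}  # next character -> child node
        self.count = 0      # occurrences of the substring spelled by this path


def build_suffix_trie(text, max_depth=None):
    """Insert every suffix of `text`, incrementing a counter on each node visited.

    A node's count equals the number of suffixes that start with its
    substring, i.e. the substring's occurrence count in `text`.
    `max_depth` (hypothetical) truncates suffixes to bound memory, which is
    O(n^2) for this naive, uncompressed construction.
    """
    root = SuffixTrieNode()
    n = len(text)
    for i in range(n):
        node = root
        end = n if max_depth is None else min(n, i + max_depth)
        for ch in text[i:end]:
            node = node.children.setdefault(ch, SuffixTrieNode())
            node.count += 1
    return root


def substring_frequency(root, s):
    """Return how many times `s` occurs in the indexed text (0 if absent)."""
    node = root
    for ch in s:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.count


if __name__ == "__main__":
    trie = build_suffix_trie("abracadabra")
    for query in ("a", "abra", "cad", "xyz"):
        print(query, substring_frequency(trie, query))
```

For "abracadabra" this reports 5 occurrences of "a" and 2 of "abra". Ratios of such counts (for example, the count of a context followed by a character divided by the count of the context) are one way a frequency-annotated tree can approximate the sequential statistics of a text.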
In the last decade, latent Dirichlet allocation (LDA) has successfully discovered the statistical distribution of topics over unstructured text corpora. Meanwhile, as the Internet has evolved, more and more document data come with rich human-provided tag information; such data are called semi-structured data. The semi-structured data contain both unstructured data (e.g., plain text) and metad...
Document classification is an important task in data mining. Currently, identifying the category (i.e., topic) of a scientific publication is a manual task. The Association for Computing Machinery Computing Classification System (ACM CCS) is the most widely used multi-level taxonomy for scientific document classification. Correct classification becomes difficult with an increase in the number of levels as ...
In this paper, we describe a method for text passage classification or extraction by means of supervised machine learning and the analytical identification of passages. The key characteristic of the method lies in its use of the resulting classification, which leads to the classification of a document portion in a high-dimensional feature space into a low-dimensional space which i...
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix cofactorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matric...