text document classification

Search results for: text document classification

Number of results: 765658

Text Classification and Layout Analysis of Paper Fragments

2011

Stefan Fiel, Markus Diem, Florian Kleber, Angelika Garz, Robert Sablatnig

Document image analysis methods such as text classification and layout analysis allow for the automated extraction of document properties. In general, these methodologies are pre-processing steps for Optical Character Recognition (OCR) systems. In contrast, the proposed method aims at clustering document snippets so that an automated clustering of documents can be performed. First, localized words are c...

A Surface-Similarity Based Two-Step Classifier for RITE-VAL

2014

Shohei Hattori, Satoshi Sato

This paper describes the system of the team SKL in the NTCIR-11 RITE-VAL workshop. The system consists of two modules: an RTE module and a text-search module. The RTE module, which is a modified version of our previous system for the binary classification in the RITE-2 workshop, takes a two-step classification strategy. The first step classifies a given text pair into positive or negative entailment c...

Annotated suffix trees for text modelling and classification

2008

Rajesh Mysore Pampapathi

Suffix trees are compact and versatile data structures in which paths from the root to nodes represent substrings of the encoded text. By annotating such a tree with the frequencies of substrings, it is possible to construct a compact model of text that captures its sequential nature. This thesis investigates the use of such a model in the representation and classification of text. The basic ap...

Tag-Weighted Topic Model for Mining Semi-Structured Documents

2013

Shuangyin Li, Jiefei Li, Rong Pan

In the last decade, latent Dirichlet allocation (LDA) has successfully discovered the statistical distribution of the topics over an unstructured text corpus. Meanwhile, more and more document data have come with rich human-provided tag information during the evolution of the Internet; such data are called semi-structured data. The semi-structured data contain both unstructured data (e.g., plain text) and metad...

Classification of Scientific Publications using Swarm Intelligence

2013

Tariq Ali, Sohail Asghar, Naseer Ahmed Sajid

Document classification is an important task in data mining. Currently, identifying the category (i.e., topic) of a scientific publication is a manual task. The Association for Computing Machinery Computing Classification System (ACM CCS) is the most widely used multi-level taxonomy for scientific document classification. Correct classification becomes difficult with an increase in the number of levels as ...

Milan edict. Text of the document

Journal: Religious Freedom 2013

Document Thumbnails with Variable Text Scaling

Journal: Computer Graphics Forum 2012

Text Document Clustering Using Semantic Neighbors

Journal: Journal of Software Engineering 2011

Text Passage Classification Using Supervised Learning

1999

Y. Bi, S. McClean

In this paper, we describe a method for text passage classification or extraction by means of supervised machine learning and analytically identifying passages. The underlying characteristic of the method lies in the utilization of the resulting classification, which leads to the classification of the portion of a document in a high-dimensional feature space into a low-dimensional space which i...

Learning Cross-lingual Word Embeddings via Matrix Co-factorization

2015

Tianze Shi, Zhiyuan Liu, Yang Liu, Maosong Sun

A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matric...
