text documents

Vectorization of Text Documents for Identifying Unifiable News Articles

Journal: International Journal of Advanced Computer Science and Applications 2019

Investigating the Role of Different Types of Homograph Contexts in Determining Similarity between Documents

Journal: Iranian Journal of Information Processing and Management 2018

ستوده, هاجر, هوشیار, مژگان,

Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation, thereby improving the effectiveness of the retrieved results. Homographs are among the words that require sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...
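One common baseline for document similarity is TF-IDF weighting followed by cosine similarity. The sketch below shows that baseline. It is a minimal illustration and not the method of the study above. Several details are assumptions: the tokenizer (lowercase plus whitespace split), the smoothed IDF formula log((1 + n) / (1 + df)) + 1, and the function names.

The example documents share the homograph "bank". A bag-of-words model treats it as one term whatever its sense, so only the surrounding context words can separate the two meanings.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems also strip punctuation
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times smoothed inverse document frequency
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [
    "the bank raised interest rates",
    "the bank of the river flooded",
    "interest rates rose at the central bank",
]
vectors = tfidf_vectors(docs)
# The two finance documents are closer than the finance/river pair
print(cosine(vectors[0], vectors[2]) > cosine(vectors[0], vectors[1]))  # True
```

All three documents contain "bank". The higher score for the first and third documents therefore comes entirely from the shared context words "interest" and "rates".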

A New Document Embedding Method for News Classification

Journal: Signal and Data Processing 2023

Homayounpour, Mohammad Mehdi, Rahimi, Zahra,

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A large amount of news is spread on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
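The abstract stresses that the first step is a suitable document representation. A widely used baseline represents a document as the average of its word vectors. This is not the new embedding method proposed in the paper. The sketch below is a minimal illustration built on a tiny, hypothetical embedding table (`EMBEDDINGS`). In practice, these vectors would come from a trained word-embedding model.

```python
# Hypothetical 3-dimensional word vectors, for illustration only
EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "stock": [0.1, 0.9, 0.2],
    "market": [0.0, 0.8, 0.3],
}

def embed_document(text, embeddings=EMBEDDINGS):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    # Component-wise mean over all known words in the document
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

vec = embed_document("The match ended with a late goal")
print([round(x, 2) for x in vec])  # [0.85, 0.15, 0.05]
```

Out-of-vocabulary words are simply skipped. A document with no known words gets no vector, and a downstream classifier would have to handle that case explicitly.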

Text Enhancement for Laser Copiers

1999

Hakan Ancin, Anoop K. Bhattacharjya

The vast majority of copied documents consist of text, and copy quality depends mostly on how well the text is reproduced. A new technique to enhance dark text on a light background in scanned mixed-mode documents (containing text, graphics and photos) is presented to improve copy quality. This technique incorporates various image processing filters that enhance dark text without dis...
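The idea of enhancing dark text on a light background can be shown with a global double threshold on a grayscale image. This is a much simpler operation than the filter pipeline described in the paper. The image is stored as rows of 0 to 255 values. The threshold values, the function name, and the choice to leave mid-tones untouched are all assumptions made for this sketch.

```python
def enhance_text(image, text_threshold=100, background_threshold=200):
    """Push dark (text-like) pixels to black and light (background) pixels to white.

    `image` is a list of rows of 0-255 grayscale values. Pixels strictly
    between the two thresholds are left unchanged (assumed design choice).
    """
    out = []
    for row in image:
        new_row = []
        for pixel in row:
            if pixel <= text_threshold:
                new_row.append(0)      # darken likely text
            elif pixel >= background_threshold:
                new_row.append(255)    # whiten likely background
            else:
                new_row.append(pixel)  # keep mid-tones as they are
        out.append(new_row)
    return out

print(enhance_text([[30, 150, 230]]))  # [[0, 150, 255]]
```

A single global threshold ignores local contrast. Real copier pipelines would normally first segment text regions from graphics and photos, then filter each region type separately.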

A hybrid learning algorithm for text classification

Journal: CoRR 2004

S. M. Kamruzzaman, Farhana Haider

Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms for automatic text classification need a sufficient number of documents to learn accurately. This paper presents a new algorithm for text classification that requires fewer documents for training. Instead of using words, word relations, i.e., association rules from...
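Word relations expressed as association rules can be illustrated with a pair-only, Apriori-style sketch. This is not the hybrid algorithm of the paper. Each document is treated as a set of words, and a rule A -> B is kept under two conditions:

- The support of the pair (the fraction of documents containing both words) passes a threshold.
- The confidence (support of the pair divided by the fraction of documents containing A) passes a threshold.

The thresholds and the function name are assumptions.

```python
from collections import Counter
from itertools import combinations

def word_association_rules(docs, min_support=0.5, min_confidence=0.8):
    """Mine rules (lhs, rhs, support, confidence) between co-occurring word pairs."""
    transactions = [set(d.lower().split()) for d in docs]
    n = len(transactions)
    item_counts = Counter(w for t in transactions for w in t)
    # Count each unordered word pair once per document
    pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Try both rule directions: a -> b and b -> a
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

docs = ["stock market falls", "stock market rises", "football match tonight", "stock prices"]
print(word_association_rules(docs))  # [('market', 'stock', 0.5, 1.0)]
```

In this toy corpus, every document mentioning "market" also mentions "stock", but not the other way round. So only the rule market -> stock survives the confidence threshold.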

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

Journal: Journal of Advances in Computer Research 2018

Farhad Soleimanian Gharehchopogh, Hiwa Majidpour,

In recent years, the production of text documents has grown exponentially, which makes their proper classification necessary for better access. One of the main problems in classifying text documents is working in a high-dimensional feature space. Feature Selection (FS) is one way to reduce the number of text attributes. So, working with a great bulk of the feature spa...
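The paper selects features with an improved Flower Pollination Algorithm combined with AdaBoost, which is a wrapper method not reproduced here. For contrast, the sketch below uses a much simpler filter method: chi-square scoring of each term against each class. It shows what reducing the number of text attributes means in practice. Three choices are assumptions: using term presence rather than frequency, taking the maximum score over classes, and the function name.

```python
from collections import Counter

def chi_square_select(docs, labels, k):
    """Return the k terms with the highest max-over-classes chi-square score."""
    doc_terms = [set(d.lower().split()) for d in docs]
    n = len(docs)
    vocab = set().union(*doc_terms)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, size in class_sizes.items():
            # 2x2 contingency table: term present/absent vs. in class/not in class
            a = sum(1 for terms, y in zip(doc_terms, labels) if term in terms and y == cls)
            b = sum(1 for terms, y in zip(doc_terms, labels) if term in terms and y != cls)
            c = size - a
            d = (n - size) - b
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically for determinism
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]

docs = ["goal match team", "match team win", "stock market price", "market price fall"]
labels = ["sport", "sport", "finance", "finance"]
print(chi_square_select(docs, labels, 4))  # ['market', 'match', 'price', 'team']
```

Terms that appear in every document of one class and in none of the other get the maximum score. Terms seen in only one document score lower and are dropped first.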

Support Vector Machines for Text Categorization

2003

Atreya Basu, Carolyn R. Watters, Michael A. Shepherd

Text categorization is the process of sorting text documents into one or more predefined categories or classes of similar documents. Differences in the results of such categorization arise from the feature set chosen to base the association of a given document with a given category. Advocates of text categorization recognize that the sorting of text documents into categories of like documents r...
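A linear SVM for text categorization can be trained in a few lines with the Pegasos stochastic sub-gradient method, shown below as a minimal sketch. It is not the experimental setup of the paper. The following are all assumptions:

- the bag-of-words feature scheme with an extra `__bias__` feature,
- the regularization strength `lam`,
- the iteration count and the fixed random seed,
- the function names.

For simplicity, the bias weight is regularized along with the other weights.

```python
import random
from collections import Counter

def features(text):
    """Bag-of-words counts plus a constant bias feature."""
    feats = Counter(text.lower().split())
    feats["__bias__"] = 1
    return feats

def train_pegasos(docs, labels, lam=0.01, iterations=1000, seed=0):
    """Train a linear SVM with Pegasos; labels must be +1 or -1."""
    rng = random.Random(seed)
    data = [(features(d), y) for d, y in zip(docs, labels)]
    w = {}
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)          # sample one training example
        eta = 1.0 / (lam * t)            # decreasing step size
        margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
        # Shrink all weights (gradient of the L2 regularizer)
        for f in w:
            w[f] *= 1 - eta * lam
        # Hinge-loss sub-gradient step only when the example violates the margin
        if margin < 1:
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, text):
    score = sum(w.get(f, 0.0) * v for f, v in features(text).items())
    return 1 if score >= 0 else -1

docs = ["goal match team", "match team win", "stock market price", "market price fall"]
labels = [1, 1, -1, -1]   # +1 = sport, -1 = finance
w = train_pegasos(docs, labels)
print(predict(w, "team goal"), predict(w, "market price"))  # 1 -1
```

Sparse dictionaries keep the weight vector proportional to the vocabulary actually seen. This matters for text, where the feature space is large but each document touches only a few dimensions.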

Extraction of Core Contents from Web Pages

Journal: CoRR 2014

Sandeep Sirsat

The information available on web pages mostly consists of semi-structured text documents represented in XML, HTML, or XHTML format, which lacks a formatted document structure. The document does not discriminate between the text and the schema that represents the text. Also, the amount of structure used to represent the text depends on the purpose and size of the text document. No semanti...
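A natural first step before any core-content extraction is to separate the visible text from the HTML markup. The sketch below does this with Python's standard `html.parser` module, skipping the contents of `<script>` and `<style>` elements. It is a generic tag-stripping baseline, not the extraction method of the paper. The class name, the function name, and the set of skipped tags are assumptions.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

page = ("<html><head><style>p{color:red}</style></head><body><h1>Title</h1>"
        "<p>Hello <b>world</b></p><script>var x=1;</script></body></html>")
print(extract_text(page))  # Title Hello world
```

This keeps navigation menus, footers, and advertisements along with the main article text. Separating those from the core content requires additional structural or statistical cues.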

Representation and Classification of Text Documents: A Brief Review

2010

B. S. Harish, S. Manjunath

Text classification is one of the important research issues in the field of text mining, where documents are classified with supervised knowledge. The literature contains many text representation schemes and classifiers/learning algorithms used to classify text documents into predefined categories. In this paper, we present various text representation schemes and compare different class...

Punjabi Text Classification using Naïve Bayes, Centroid and Hybrid Approach

2012

Vishal Gupta

Punjabi text classification is the process of assigning predefined classes to unlabelled text documents. Because of the dramatic increase in the amount of content available in digital form, text classification has become an urgent need for managing digital data efficiently and accurately. Until now, no Punjabi text classifier has been available for Punjabi text documents. Therefore, in this paper, existi...
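The two classifiers named in the title can be sketched in Python as follows:

- Multinomial Naïve Bayes with add-one (Laplace) smoothing.
- A centroid classifier that compares a document with each class's average length-normalized term-frequency vector.

The paper's Punjabi-specific preprocessing and its hybrid approach are not reproduced. Several details are assumptions: whitespace tokenization, the toy English training set, and all function names.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: whitespace tokenization; real Punjabi text needs proper preprocessing
    return text.lower().split()

def train_naive_bayes(docs, labels):
    """Multinomial Naive Bayes: class document counts and per-class word counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def predict_naive_bayes(model, text):
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen in training
                # Add-one smoothed log likelihood of the word given the class
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def train_centroids(docs, labels):
    """Average the length-normalized term-frequency vectors of each class."""
    sums, counts = defaultdict(Counter), Counter(labels)
    for doc, label in zip(docs, labels):
        tf = Counter(tokenize(doc))
        norm = math.sqrt(sum(c * c for c in tf.values()))
        for w, c in tf.items():
            sums[label][w] += c / norm
    return {label: {w: v / counts[label] for w, v in vec.items()}
            for label, vec in sums.items()}

def predict_centroid(centroids, text):
    tf = Counter(tokenize(text))
    def similarity(centroid):
        # Cosine without the query norm, which is the same for every class
        dot = sum(c * centroid.get(w, 0.0) for w, c in tf.items())
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        return dot / norm if norm else 0.0
    return max(centroids, key=lambda label: similarity(centroids[label]))

docs = ["goal match team", "match team win", "stock market price", "market price fall"]
labels = ["sport", "sport", "finance", "finance"]
nb = train_naive_bayes(docs, labels)
centroids = train_centroids(docs, labels)
print(predict_naive_bayes(nb, "team goal"), predict_centroid(centroids, "stock price"))
# sport finance
```

Naïve Bayes scores each word independently through smoothed class likelihoods. The centroid method instead compares the whole document vector with a single prototype per class. Combining such complementary signals is one motivation for hybrid approaches.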