Search results for: text clustering

Number of results: 264479

2012
Shashank Paliwal, Vikram Pudi

Measuring inter-document similarity is one of the most essential steps in text document clustering. Traditional methods rely on representing text documents using the simple Bag-of-Words (BOW) model. A document is an organized structure consisting of various text segments or passages. Such single term analysis of the text treats the whole document as a single semantic unit and thus ignores other se...
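
As a rough illustration of the traditional Bag-of-Words baseline mentioned in this abstract (not the authors' proposed method), the sketch below computes cosine similarity between two documents from raw term counts. The function names `bag_of_words` and `cosine_similarity` and the simple regex tokenizer are assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word occurrences, ignoring word order."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-count vectors of two documents."""
    a, b = bag_of_words(doc_a), bag_of_words(doc_b)
    # Counter returns 0 for terms missing from b, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Because only term counts are compared, two documents containing the same words in a different order or in different passages score as near-identical, which is the limitation the abstract points to.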

2007
Le Phong Bao Vuong, Xiaoying Gao

This paper introduces an approach that achieves automated data extraction for semi-structured Web pages by using clustering to group text tokens and data tuples into clusters. This approach uses both HTML and text features of text tokens to detect the similarities between them. After clustering, similar text tokens are expected to be in the same text clusters and labeled with the same text clus...

2006
Illhoi Yoo, Xiaohua Hu, Il-Yeol Song

In this paper, we introduce a coherent biomedical literature clustering and summarization approach that employs a graphical representation method for text using a biomedical ontology. The key to the approach is to construct document cluster models as semantic chunks capturing the core semantic relationships in the ontology-enriched scale-free graphical representation of documents. These documen...

2004
Lev Reyzin, Moses Charikar

Clustering text data online as it comes in is a difficult problem. It is hard both to capture a meaningful notion of linguistic similarity and to cluster large amounts of data in a single pass. This problem is especially challenging because most known algorithms that ensure tight clusterings are inefficient on large datasets. While significant work has been done on text clustering, it has not b...
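
To make the single-pass setting concrete, here is a minimal sketch of a generic leader-style clustering loop: each incoming document joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster. This is not the algorithm from the paper and gives no guarantee of tight clusterings; the names `single_pass_cluster`, `vectorize`, and `cosine`, the term-count representation, and the default threshold of 0.3 are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Term-count vector of a document (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(documents, threshold=0.3):
    """Assign each document a cluster label while reading the stream once."""
    centroids = []  # one summed term-count vector per cluster
    labels = []
    for text in documents:
        vec = vectorize(text)
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No cluster is similar enough: open a new one.
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
        else:
            # Fold the document into the chosen cluster's summed vector.
            centroids[best].update(vec)
            labels.append(best)
    return labels
```

Each document is compared against every existing cluster, so the cost grows with the number of clusters, and the result depends on arrival order. Both effects hint at why the abstract calls the online problem difficult.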

Journal: Neural Networks: the official journal of the International Neural Network Society 2003
Louis Massey

There is a large and continually growing quantity of electronic text available, which contains essential human and organizational knowledge. An important research endeavor is to study and develop better ways to access this knowledge. Text clustering is a popular approach to automatically organize textual document collections by topic to help users find the information they need. Adaptive Resonanc...

2010
P. Ponmuthuramalingam, T. Devi

Frequent term based text clustering is a text clustering technique that uses frequent term sets and dramatically decreases the dimensionality of the document vector space, thus addressing two problems: the very high dimensionality of the data and the very large size of the databases. The Frequent Term based Clustering algorithm (FTC) has shown significant efficiency compared to some well known text cluster...

Journal: JSW 2013
Junmin Zhao, Kai Zhang, Jian Wan

Text clustering belongs to unsupervised machine learning, so the discriminability of class attributes cannot be measured in clustering. Moreover, traditional text feature selection methods cannot effectively solve the high-dimensional problem. To overcome this weakness in existing feature selection, this paper proposes a new method which introduces the cloud model theory into feature selection, c...

2005
Shaoxu Song, Chunping Li

Text documents have sparse data spaces, and existing methods of text clustering use symmetric proximity to measure the correlation of documents. In this paper, we propose a novel approach to strengthen the discriminative features of document objects, which uses asymmetric proximity for text clustering. We present a measure of asymmetric proximity between documents and between clusters. TCU...

2010
E. V. Prasad

Text document clustering plays an important role in providing better document retrieval, document browsing and text mining. Traditionally, clustering techniques do not consider the semantic relationships between words, such as synonymy and hypernymy. Existing clustering techniques are based on the syntactic structure of the document. To exploit semantic relationships, WordNet has been used to improve clu...

Journal: JDCTA 2010
Hui Gao, Jun Jiang, Li She, Yan Fu

Text clustering is one of the difficult and active research fields in text mining. Combining the Map Reduce framework and the neuron initialization method of the VPSOM (vector pressing Self-Organizing Model) algorithm, a new text clustering algorithm is presented. It divides the large text vector dataset into data blocks, each of which is then processed on a different distributed data node of Map Red...
