Search results for: similarity score

Number of results: 325828

Journal: JCP 2011
Chenghui Huang Jian Yin Fang Hou

In the text mining area, popular methods use bag-of-words models, which represent a document as a vector. These methods ignore word sequence information, and their good clustering results are limited to some special domains. This paper proposes a new similarity measure based on the suffix tree model of text documents. It analyzes the word sequence information, and then computes the similarity between ...
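The limitation described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's suffix tree measure: the word-bigram Jaccard overlap below is only a simple stand-in for "word sequence information", and the function names `bow_cosine` and `bigram_jaccard` are hypothetical.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of two bag-of-words term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of word bigrams (an assumed, order-sensitive stand-in)."""
    def bigrams(text: str) -> set:
        tokens = text.lower().split()
        return set(zip(tokens, tokens[1:]))

    ba, bb = bigrams(a), bigrams(b)
    union = ba | bb
    return len(ba & bb) / len(union) if union else 0.0


if __name__ == "__main__":
    s1, s2 = "dog bites man", "man bites dog"
    # Same words, different order: bag-of-words cannot tell them apart.
    print(bow_cosine(s1, s2))
    # No shared bigram, so the order-sensitive score separates them.
    print(bigram_jaccard(s1, s2))
```

The two sentences contain the same words, so the bag-of-words score is at its maximum, while the bigram score is zero because no two-word sequence is shared.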

2016
Makoto Urakawa Masaru Miyazaki Hiroshi Fujisawa Masahide Naemura Ichiro Yamada

School curricula are organized by academic year. Because students have to study several subjects every year, related topics are placed in the curricula discretely. In this study, we propose a method to construct a dynamic learning path that enables us to learn related topics continuously. In this process, we define two kinds of similarity score, inheritance score a...

2017
Giannis Nikolentzos Polykarpos Meladianos François Rousseau Yannis Stavrakas Michalis Vazirgiannis

In this paper, we present a novel document similarity measure based on the definition of a graph kernel between pairs of documents. The proposed measure takes into account both the terms contained in the documents and the relationships between them. By representing each document as a graph-of-words, we are able to model these relationships and then determine how similar two documents are by usi...
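A rough Python sketch of the graph-of-words idea follows. The abstract is cut off before it names the kernel, so the normalized edge-overlap kernel used here is an assumption made for illustration only, not the authors' kernel. The helper names `graph_of_words` and `edge_overlap_kernel`, the undirected edges, and the default window size are likewise hypothetical.

```python
import math


def graph_of_words(text: str, window: int = 2) -> set:
    """Return the undirected edge set of a graph-of-words.

    Each term is linked to the terms that follow it within a sliding
    window of `window` tokens (window=2 links adjacent terms only).
    """
    tokens = text.lower().split()
    edges = set()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                edges.add(frozenset((term, other)))
    return edges


def edge_overlap_kernel(a: str, b: str, window: int = 2) -> float:
    """Shared-edge count, normalized to [0, 1] (assumed stand-in kernel)."""
    ea, eb = graph_of_words(a, window), graph_of_words(b, window)
    if not ea or not eb:
        return 0.0
    return len(ea & eb) / math.sqrt(len(ea) * len(eb))


if __name__ == "__main__":
    print(edge_overlap_kernel("the cat sat", "the cat ran"))
```

Because edges connect co-occurring terms, the score reflects both which terms appear and how they relate to each other, which a plain term-count vector does not capture.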

2003
Aurora Pons-Porrata Rafael Berlanga Llavori José Ruiz-Shulcloper

In this paper we propose an incremental hierarchical clustering algorithm for on-line event detection. This algorithm is applied to a set of newspaper articles in order to discover the structure of topics and events that they describe. In the first level, articles with a high temporal-semantic similarity are clustered together into events. In the next levels of the hierarchy, these events are s...

2004
Kiyonori Ohtake Youichi Sekiguchi Kazuhide Yamamoto

We propose a detection method for orthographic variants caused by transliteration in a large corpus. The method employs two similarities. One is string similarity based on edit distance. The other is contextual similarity by a vector space model. Experimental results show that the method performed a 0.889 F-measure in an open test.
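The two similarities named in this abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions: normalizing edit distance by the longer string's length, representing a word's context as raw counts of neighboring words, and the function names are all choices made here. The abstract does not say how the two scores are combined, so they are kept separate.

```python
import math
from collections import Counter


def levenshtein(s: str, t: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(
                min(
                    prev[j] + 1,               # delete cs
                    curr[j - 1] + 1,           # insert ct
                    prev[j - 1] + (cs != ct),  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def string_similarity(s: str, t: str) -> float:
    """1 - distance / max length (an assumed normalization)."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def context_similarity(ctx_a: list, ctx_b: list) -> float:
    """Cosine similarity between context-word count vectors."""
    va, vb = Counter(ctx_a), Counter(ctx_b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(string_similarity("color", "colour"))
    print(context_similarity(["red", "paint"], ["red", "paint", "bright"]))
```

A candidate pair of orthographic variants would be expected to score high on both measures: the strings differ by few edits, and the words appear in similar contexts.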

2013
Wei He Shuang Li Xiaoping Yang

Ontology is applied in various fields of computing as a conceptual modeling tool, and is used to organize information and manage knowledge. Ontology extension, which adds new concepts and relationships to an existing ontology, is a more complex task. In this paper, we propose a hybrid approach for ontology extension from text using semantic relatedness between words, which exploit...

2017
Mehreen Gillani Muhammad U. Ilyas Saad Saleh Jalal S. Alowibdi Naif R. Aljohani Fahad S. Alotaibi

Every day 645 million Twitter users generate approximately 58 million tweets. This motivates the question of whether it is possible to generate a summary of events from this rich set of tweets only. Key challenges in post summarization from microblog posts include circumnavigating spam and conversational posts. In this study, we present a novel technique called lexi-temporal clustering (LTC), which ide...

2016
Aditya Joshi Vaibhav Tripathi Kevin Patel Pushpak Bhattacharyya Mark James Carman

This paper makes a simple increment to the state-of-the-art in sarcasm detection research. Existing approaches are unable to capture subtle forms of context incongruity which lies at the heart of sarcasm. We explore if prior work can be enhanced using semantic similarity/discordance between word embeddings. We augment word embedding-based features to four feature sets reported in the past. We also e...

Journal: CoRR 2017
Luciano Barbosa Paulo Rodrigo Cavalin Victor Guimaraes Matthias Kormaksson

In this paper, we present the methodology and the results obtained by our teams, dubbed Blue Man Group, in the ASSIN (from the Portuguese Avaliação de Similaridade Semântica e Inferência Textual) competition, held at PROPOR 2016. Our team’s strategy consisted of evaluating methods based on semantic word vectors, following two distinct directions: 1) to make use of low-dimensional, compact, feat...

Journal: CoRR 2017
Esraa Ali Annalina Caputo Séamus Lawless

In this paper we describe our solution to the WSDM Cup 2017 Triple Scoring task. Our approach generates a relevance score based on the textual description of the triple’s subject and value (Object). It measures how similar (related) the text description of the subject is to the text description of its values. The generated similarity score can then be used to rank the multiple values associated...
