نتایج جستجو برای: n grams
تعداد نتایج: 982486 فیلتر نتایج به سال:
This paper describes our system, entitled Idiom Savant, for the 7th Task of the Semeval 2017 workshop, “Detection and interpretation of English Puns”. Our system consists of two probabilistic models for each type of puns using Google n-grams and Word2Vec. Our system achieved fscore of 0.84, 0.663, and 0.07 in homographic puns and 0.8439, 0.6631, and 0.0806 in heterographic puns in task 1, task ...
This short paper describes four submissions to the Symbolic Melodic Similarity task of the MIREX 2010 edition. All four submissions rely on a local-alignment approach between sequences of n-grams, and they differ mainly on the substitution score between two n-grams. This score is based on a geometric representation that shapes musical pieces as curves in the pitch-time plane. One of the systems...
We describe the method of Author Profiling task. The task deals with study of profile aspects like gender and language variety. We explore an approach of using high-order char n-grams as features and logistic regression as a classifier for all subtasks. This approach appears to be simple and effective for the task. We also investigated feature importances and low-dimensional embeddings of the d...
Statistical language models should improve as the size of the n-grams increases from 3 to 5 or higher. However, the number of parameters and calculations, and the storage requirement increase very rapidly if we attempt to store all possible combinations of n-grams. To avoid these problems, the reduced n-grams’ approach previously developed by O’Boyle (1993) can be applied. A reduced n-gram lang...
We compare the performance of character n-gram features (n = 3–8) and lexical features (unigrams and bigrams of words), as well as their combinations, on the tasks of authorship attribution, author profiling, and discriminating between similar languages. We developed a single multi-labeled corpus for the three aforementioned tasks, composed of news articles in different varieties of Spanish. We...
Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or differences in transliteration can defeat word-based systems. N-gram systems are more resistant to these problems. We present a retrieval system based on N-grams that uses a...
In this paper, we discuss the results of the IUCL system in the NLI Shared Task 2017. For our system, we explore a variety of phonetic algorithms to generate features for Native Language Identification. These features are contrasted with one of the most successful type of features in NLI, character n-grams. We find that although phonetic features do not perform as well as character n-grams alon...
In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) is not fixed a priori. Instead, first, large memories are accepted and afterwards, merging criteria are applied to reduce complexity and to ensure reliable estimations. The results show how the perplexity obtained with x-grams is smaller than that of n-grams. Furthermore, the complexity is smaller...
The n-gram analysis technique breaks up a text document into several n-character long unique grams, and produces a vector whose components are the counts of these grams. A typical corpus contains hundreds of thousands of such grams. Wavelet compression reduces the dimension of the n-gram vectors, and speeds up document query operations. Document vectors with their dimensions reduced to four com...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید