Search results for: n grams

Number of results: 982486

2006
Klaus Frieler

In this paper we propose three generalizations of well-known N-gram approaches for measuring similarity of single-line melodies. In a former paper we compared around 50 similarity measures for melodies with empirical data from music psychological experiments. Similarity measures based on edit distances and N-grams always showed the best results for different contexts. This paper aims at a gener...

2014
Magdalena Jankowska Vlado Keselj Evangelos E. Milios

We use ensembles of proximity-based one-class classifiers for the authorship verification task. The one-class classifiers compare, for each document of known authorship, the dissimilarity between this document and the most dissimilar other document of this authorship to the dissimilarity between this document and the questioned document. As the dissimilarity measure between documents we use Com...

2013
Mikhail Aleksandrovich Dykov Pavel Nikolaevich Vorobkalov

The problem considered in this paper relates to identification of trends in a given area based on analysis of Twitter messages. The approaches currently used for Twitter trends detection are based on n-grams. We propose another approach of trend detection based on identifying trend as grammatical relation and perform the identification of trending relations on the basis of their frequency chang...

2014
Liviu P. Dinu Alina Maria Ciobanu Ioana Chitoran Vlad Niculae

We address the task of stress prediction as a sequence tagging problem. We present sequential models with averaged perceptron training for learning primary stress in Romanian words. We use character n-grams and syllable n-grams as features and we account for the consonant-vowel structure of the words. We show in this paper that Romanian stress is predictable, though not deterministic, by using ...

Journal: CoRR 2014
Aditya Joshi Johan T. Halseth Pentti Kanerva

Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we demonstrate its use for identifying the language of text samples, based on a novel method of encoding letter n-grams into high-dimensional Language Vectors. Further, we show that the method is easil...

2015
Daniel Preotiuc-Pietro Johannes C. Eichstaedt Gregory J. Park Maarten Sap Laura Smith Victoria Tobolsky H. Andrew Schwartz Lyle H. Ungar

Mental illnesses, such as depression and post-traumatic stress disorder (PTSD), are highly underdiagnosed globally. Populations sharing similar demographics and personality traits are known to be more at risk than others. In this study, we characterise the language use of users disclosing their mental illness on Twitter. Language-derived personality and demographic estimates show surprisingly s...

2014
Jenna Kanerva Juhani Luotolahti Veronika Laippala Filip Ginter

In this paper, we report on the development of a large-scale Finnish Internet parsebank, currently consisting of 1.5 billion tokens in 116 million sentences. The data is fully morphologically and syntactically analyzed and it has been used to extract flat and syntactic n-gram collections, as well as verb-argument and noun-argument n-grams. Additionally, distributional vector space representation...

2017
Rodrigo Ribeiro Oliveira Rosalvo Ferreira Oliveira Neto

Author profiling is the problem of determining the characteristics of an author of an anonymous text. In this paper, we detail a method to determine the language variety and the gender of the authors of tweets, as a submission for the Author Profiling Task at PAN 2017. This method seeks to select the most significant character n-grams for each class considered, combining them with style feature...

Journal: Computer Speech & Language 2012
Koen Deschacht Jan De Belder Marie-Francine Moens

Statistical language models have found many applications in information retrieval since their introduction almost three decades ago. Currently the most popular models are n-gram models, which are known to suffer from serious sparseness issues, which is a result of the large vocabulary size |V| of any given corpus and of the exponential nature of n-grams, where potentially |V|^n n-grams can occu...

Journal: Natural Language Engineering 2006
Diane J. Litman Katherine Forbes-Riley

We examine correlations between dialogue behaviors and learning in tutoring, using two corpora of spoken tutoring dialogues: a human-human corpus and a human-computer corpus. To formalize the notion of dialogue behavior, we manually annotate our data using a tagset of student and tutor dialogue acts relative to the tutoring domain. A unigram analysis of our annotated data shows that student lea...
