نتایج جستجو برای: n grams

تعداد نتایج: 982486  

2007
Andrija Tomovic Predrag Janicic

Rapid classification of documents is of high-importance in many multilingual settings (such as international institutions or Internet search engines). This has been, for years, a well-known problem, addressed by different techniques, with excellent results. We address this problem by a simple n-grams based technique, a variation of techniques of this family. Our n-grams-based classification is ...

2012
Jelena Graovac

This paper presents the results of classifying Serbian text documents using the byte-level n-gram based frequency statistics technique, employing four different dissimilarity measures. Results show that the byte-level n-grams text categorization, although very simple and language independent, achieves very good accuracy.

2015
Elisavet Palogiannidi Elias Iosif Polychronis Koutsakis Alexandros Potamianos

We propose and evaluate the use of an affective-semantic model to expand the affective lexica of German, Greek, English, Spanish and Portuguese. Motivated by the assumption that semantic similarity implies affective similarity, we use word level semantic similarity scores as semantic features to estimate their corresponding affective scores. Various context-based semantic similarity metrics are...

2001
Paul McNamee

Our work with the Hopkins Automated Information Retriever for Combing Unstructured Text (HAIRCUT) system has made use of overlapping character n-grams in the indexing and retrieval of text. In previous experiments with Western European languages we have shown that longer length n-grams (e.g., n=6) are capable of providing an effective form of alinguistic term normalization. We have wanted to in...

Journal: :IJCLCLP 2003
Le Quan Ha Elvira I. Sicilia-Garcia Ji Ming Francis Jack Smith

It is shown that for a large corpus, Zipf 's law for both words in English and characters in Chinese does not hold for all ranks. The frequency falls below the frequency predicted by Zipf's law for English words for rank greater than about 5,000 and for Chinese characters for rank greater than about 1,000. However, when single words or characters are combined together with n-gram words or chara...

2007
Ani Nenkova Jason M. Brenier Anubha Kothari Sasha Calhoun Laura Whitton David Beaver Daniel Jurafsky

The immense prosodic variation of natural conversational speech makes it challenging to predict which words are prosodically prominent in this genre. In this paper, we examine a new feature, accent ratio, which captures how likely it is that a word will be realized as prominent or not. We compare this feature with traditional accent-prediction features (based on part of speech and N-grams) as w...

2017
Ergun Biçici

Referential translation machines achieve top performance in both bilingual and monolingual settings without accessing any task or domain specific information or resource. RTMs achieve the 3rd system results for German to English sentence-level prediction of translation quality and the 2nd system results according to root mean squared error. In addition to the new features about substring distan...

2004
David Holmes Samsum Kashfi Syed Uzair Aqeel

We address name search for transliterated Arabic given names. In previous work, we addressed similar problems with English and Arabic surnames. In each previous case, we used a variant of Soundex and n-grams to improve precision and recall of name matching compared against well known approaches such as the Russell Soundex algorithm. Unlike prior work, the proposed approach does not rely upon So...

2005
Manabu Sassano

We explore the use of a partially annotated corpus to build a dependency parser for Japanese. We examine two types of partially annotated corpora. It is found that a parser trained with a corpus that does not have any grammatical tags for words can demonstrate an accuracy of 87.38%, which is comparable to the current state-of-the-art accuracy on the Kyoto University Corpus. In contrast, a parse...

2013
Magdalena Jankowska Vlado Keselj Evangelos E. Milios

We describe our participation in the Author Profiling task of the PAN 2013 competition. The task objective is to determine the age and the gender of an author of a document. We applied the Common N-Gram (CNG) classifier (Kešelj et al., 2003) to this task. The CNG classifier uses a dissimilarity measure based on the differences in the frequencies of the character n-grams that are most common in ...

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید