Search results for: n grams

Number of results: 982486

2003
Paul McNamee James Mayfield

In the past we have conducted experiments that investigate the benefits and peculiarities attendant to alternative methods for tokenization, particularly overlapping character n-grams. This year we continued this line of work and report new findings reaffirming that the judicious use of n-grams can lead to performance surpassing that of word-based tokenization. In particular we examined: the re...

2014
Mike Kestemont

This position paper focuses on the use of function words in computational authorship attribution. Although recently there have been multiple successful applications of authorship attribution, the field is not particularly good at the explication of methods and theoretical issues, which might eventually compromise the acceptance of new research results in the traditional humanities community. I ...

2013
Petra Galuscáková Pavel Pecina

We describe our experiments for the Similar Segments in Social Speech Task at MediaEval 2013 Benchmark. We mainly focus on segmentation of the recordings into shorter passages on which we apply standard retrieval techniques. We experiment with machine-learning-based segmentation employing textual (word n-grams, tag n-grams, letter cases, lexical cohesion, etc.) and prosodic features (silence) a...

2008
Stephan Raaijmakers Wessel Kraaij

We present a shallow linguistic approach to subjectivity classification. Using multinomial kernel machines, we demonstrate that a data representation based on counting character n-grams is able to improve on results previously attained on the MPQA corpus using word-based n-grams and syntactic information. We compare two types of string-based representations: key substring groups and character n...

2007
Georgia Frantzeskou Efstathios Stamatatos Stefanos Gritzalis

Source code author identification deals with identifying the most likely author of a computer program, given a set of predefined author candidates. There are several scenarios where digital evidence of this kind plays a role in investigation and adjudication, such as code authorship disputes, intellectual property infringement, tracing the source of code left in the system after a cyber attack,...

2011
Tino Hartmann Sebastian Klenk Andre Burkovski Gunther Heidemann

Automatic detection of the sentiment of a given text is a difficult but highly relevant task. Application areas range from financial news, where information about sentiments can be used to predict stock movements, to social media, where user recommendations can determine success or failure of a product. We have developed a methodology, based on character ngrams, to detect sentiments encoded in ...

Journal: International Journal of Medical Informatics 2009

Journal: Open Computer Science 2021

Despite the modern boom in technology, we are still faced with the fact that people write texts without diacritics. There are two main reasons for this. The first, historical reason stems from the past, when the use of diacritics was troublesome and people would write text without them. The second one is speed - typing without diacritics is usually faster. Text without diacritics is easy for people to understand, but in some types of documents, missing diacritics can cause a problem. This a...

2016
Deepak Agnihotri Kesari Verma Priyanka Tripathi

The contiguous sequences of the terms (N-grams) in the documents are symmetrically distributed among different classes. The symmetrical distribution of the N-Grams raises uncertainty in the belongings of the N-Grams towards the class. In this paper, we focused on the selection of most discriminating N-Grams by reducing the effects of symmetrical distribution. In this context, a new text feature...
