n grams

Search results for: n grams

Number of results: 982486

JHU/APL Experiments in Tokenization and Non-Word Translation

2003

Paul McNamee, James Mayfield

In the past we have conducted experiments that investigate the benefits and peculiarities attendant to alternative methods for tokenization, particularly overlapping character n-grams. This year we continued this line of work and report new findings reaffirming that the judicious use of n-grams can lead to performance surpassing that of word-based tokenization. In particular we examined: the re...
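
Overlapping character n-gram tokenization, as discussed in this abstract, can be sketched in a few lines of Python. This is a minimal illustration, not the JHU/APL implementation: the lowercasing, the whitespace collapsing, the single-space padding and the default n = 4 are all assumptions. Because spaces are kept as ordinary characters, n-grams may span word boundaries.

```python
import re


def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of `text`.

    Assumed preprocessing: lowercase, collapse whitespace runs to one
    space, and pad with one space on each side so that word-initial and
    word-final n-grams are marked.
    """
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    if not normalized:
        return []
    padded = f" {normalized} "
    if len(padded) < n:
        return [padded]  # too short for a full window: keep it whole
    # Slide a window of width n one character at a time (overlapping).
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

For example, `char_ngrams("n grams", 3)` yields `[" n ", "n g", " gr", "gra", "ram", "ams", "ms "]`; the n-grams `"n g"` and `" gr"` cross the word boundary, which a word-based tokenizer would never produce.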

Function Words in Authorship Attribution. From Black Magic to Theory?

2014

Mike Kestemont

This position paper focuses on the use of function words in computational authorship attribution. Although recently there have been multiple successful applications of authorship attribution, the field is not particularly good at the explication of methods and theoretical issues, which might eventually compromise the acceptance of new research results in the traditional humanities community. I ...

CUNI at MediaEval 2013 Similar Segments in Social Speech Task

2013

Petra Galuscáková, Pavel Pecina

We describe our experiments for the Similar Segments in Social Speech Task at MediaEval 2013 Benchmark. We mainly focus on segmentation of the recordings into shorter passages on which we apply standard retrieval techniques. We experiment with machine-learning-based segmentation employing textual (word n-grams, tag n-grams, letter cases, lexical cohesion, etc.) and prosodic features (silence) a...

A Shallow Approach to Subjectivity Classification

2008

Stephan Raaijmakers, Wessel Kraaij

We present a shallow linguistic approach to subjectivity classification. Using multinomial kernel machines, we demonstrate that a data representation based on counting character n-grams is able to improve on results previously attained on the MPQA corpus using word-based n-grams and syntactic information. We compare two types of string-based representations: key substring groups and character n...

Identifying Authorship by Byte-Level N-Grams:

2007

Georgia Frantzeskou, Efstathios Stamatatos, Stefanos Gritzalis

Source code author identification deals with identifying the most likely author of a computer program, given a set of predefined author candidates. There are several scenarios where digital evidence of this kind plays a role in investigation and adjudication, such as code authorship disputes, intellectual property infringement, tracing the source of code left in the system after a cyber attack,...
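
Byte-level n-gram attribution can be illustrated with an author profile built from the most frequent byte n-grams, where similarity is the number of n-grams two profiles share. This profile-intersection scheme is assumed here for illustration; the truncated abstract does not state n, the profile size, or the similarity measure, so the defaults `n=3` and `size=1000` and the function names are hypothetical.

```python
from collections import Counter


def byte_ngram_profile(source: str, n: int = 3, size: int = 1000) -> set[bytes]:
    """Return the `size` most frequent byte n-grams of `source`.

    n and size are hypothetical defaults, not values from the paper.
    """
    data = source.encode("utf-8")  # work on raw bytes, not characters
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    # most_common() sorts by count; equal counts keep first-seen order.
    return {gram for gram, _ in counts.most_common(size)}


def profile_overlap(a: set[bytes], b: set[bytes]) -> int:
    """Similarity = number of n-grams the two profiles share."""
    return len(a & b)


def attribute(unknown: str, candidates: dict[str, list[str]],
              n: int = 3, size: int = 1000) -> str:
    """Return the candidate author whose profile overlaps most with `unknown`."""
    target = byte_ngram_profile(unknown, n, size)
    scores = {
        author: profile_overlap(
            target, byte_ngram_profile("\n".join(samples), n, size))
        for author, samples in candidates.items()
    }
    return max(scores, key=scores.get)
```

Working on bytes rather than tokens means the method needs no language-specific parser: spacing habits, brace placement and operator style all surface as byte sequences. For instance, a C-style loop such as `for (int j = 0; j < m; j++) { total += b[j]; }` shares many more byte 3-grams with another C-style loop than with a Python list comprehension.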

Sentiment Detection with Character n-Grams

2011

Tino Hartmann, Sebastian Klenk, Andre Burkovski, Gunther Heidemann

Automatic detection of the sentiment of a given text is a difficult but highly relevant task. Application areas range from financial news, where information about sentiments can be used to predict stock movements, to social media, where user recommendations can determine success or failure of a product. We have developed a methodology, based on character n-grams, to detect sentiments encoded in ...

Classifying disease outbreak reports using n-grams and semantic features

Journal: International Journal of Medical Informatics, 2009

Diacritics restoration based on word n-grams for Slovak texts

Journal: Open Computer Science, 2021

Despite the modern boom in technology, we are still faced with the fact that people write texts without diacritics. There are two main reasons for this. The first, historical reason stems from the past, when the use of diacritics was troublesome and people would omit them. The second one is speed: typing without diacritics is usually faster. Such text is easy for people to understand, but for some types of documents, missing diacritics can cause a problem. This a...
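
Diacritics restoration with word n-grams can be illustrated with a toy bigram model. Each word typed without diacritics is mapped to the diacritized forms seen in a training corpus, and the form that most often follows the previously restored word wins, with unigram frequency as a fallback. The greedy left-to-right decoder, the toy Slovak corpus and the function names are assumptions for illustration; the truncated abstract does not describe the paper's actual model.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. 'čaj' -> 'caj'."""
    decomposed = unicodedata.normalize("NFD", word)  # 'č' -> 'c' + caron
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences: list[str]):
    """Collect unigram/bigram counts and diacritized variants per stripped form."""
    unigrams, bigrams = Counter(), Counter()
    variants = defaultdict(set)  # e.g. 'rad' -> {'rad', 'rád'}
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
        for w in words:
            variants[strip_diacritics(w)].add(w)
    return unigrams, bigrams, variants


def restore(text: str, unigrams, bigrams, variants) -> str:
    """Greedy left-to-right restoration using bigram, then unigram, counts."""
    restored: list[str] = []
    for w in text.lower().split():
        candidates = variants.get(w, {w})  # unseen words are kept (lowercased)
        prev = restored[-1] if restored else None
        best = max(
            candidates,
            # bigram count with the previous word, then unigram count,
            # then the string itself as a deterministic tie-breaker
            key=lambda c: (bigrams[(prev, c)], unigrams[c], c),
        )
        restored.append(best)
    return " ".join(restored)
```

Usage: after `uni, bi, var = train(["mám rád čaj", "rad je dlhý"])`, the call `restore("mam rad caj", uni, bi, var)` returns `"mám rád čaj"`. The ambiguous form `rad` (both `rad` and `rád` are Slovak words) is resolved by the bigram `("mám", "rád")`, which a context-free word lookup could not do.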

Learning Chinese Word Embeddings With Words and Subcharacter N-Grams

Journal: IEEE Access, 2019

Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents

2016

Deepak Agnihotri, Kesari Verma, Priyanka Tripathi

The contiguous sequences of the terms (N-grams) in the documents are symmetrically distributed among different classes. The symmetrical distribution of the N-Grams raises uncertainty in the belongings of the N-Grams towards the class. In this paper, we focused on the selection of most discriminating N-Grams by reducing the effects of symmetrical distribution. In this context, a new text feature...
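
The idea of a symmetrically distributed N-gram can be made concrete with a small sketch. It extracts word n-grams (contiguous term sequences) and scores each by the largest share of its occurrences that falls in a single class. This "skew" score, the threshold and the function names are hypothetical stand-ins; the truncated abstract does not give the paper's actual measure or its two-pass filter.

```python
from collections import Counter, defaultdict


def word_ngrams(text: str, n: int = 2) -> list[tuple[str, ...]]:
    """Contiguous sequences of n lowercased, whitespace-separated terms."""
    terms = text.lower().split()
    return [tuple(terms[i:i + n]) for i in range(len(terms) - n + 1)]


def class_skew(docs: list[tuple[str, str]], n: int = 2) -> dict[tuple[str, ...], float]:
    """Hypothetical skew score per n-gram: max class count / total count.

    1.0 means the n-gram occurs in only one class; 1/k means its
    occurrences are split evenly across all k classes (symmetrical).
    """
    per_gram: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for text, label in docs:
        for gram in word_ngrams(text, n):
            per_gram[gram][label] += 1
    return {g: max(c.values()) / sum(c.values()) for g, c in per_gram.items()}


def select_discriminating(docs: list[tuple[str, str]], n: int = 2,
                          threshold: float = 0.8) -> list[tuple[str, ...]]:
    """Keep n-grams whose skew is at least `threshold` (hypothetical filter)."""
    skew = class_skew(docs, n)
    kept = [g for g, s in skew.items() if s >= threshold]
    return sorted(kept, key=lambda g: (-skew[g], g))  # most skewed first
```

With two balanced classes, a term such as "the" that occurs equally often in both scores 0.5 and carries no class information, while a term occurring in only one class scores 1.0 and survives the filter.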
