n grams

Search results for: n grams

Number of results: 982486

Syntactic N-grams as Features for the Author Profiling Task: Notebook for PAN at CLEF 2015

2015

Juan Pablo Posadas-Durán Helena Gómez-Adorno Ilia Markov Grigori Sidorov Ildar Z. Batyrshin Alexander F. Gelbukh Obdulia Pichardo-Lagunas

This paper describes our approach to the Author Profiling task at PAN 2015. Our method relies on syntactic features, such as syntactic n-grams of various types, to predict the age, gender and personality traits of the author of a given text. In this paper, we describe the features used, the classification algorithm employed, and other general ideas concerning the expe...

N-gram language models for document image decoding

2002

Gary E. Kopec Maya R. Said Kris Popat

This paper explores the problem of incorporating linguistic constraints into document image decoding, a communication theory approach to document recognition. Probabilistic character n-grams (n=2–5) are used in a two-pass strategy where the decoder first uses a very weak language model to generate a lattice of candidate output strings. These are then re-scored in the second pass using the full ...

Exploration of Contextual Constraints for Character Pre-Classification

2001

Tin Kam Ho George Nagy

We present strategies and results for identifying the symbol type (lower-case, upper-case, digit, and punctuation or special symbols) of every character in a text document by using various kinds of information from neighboring characters. In the expectation of reasonable word and character segmentation for shape clustering, we designed several type recognition methods that depend on cluster n-g...

A Textual-Based Similarity Approach for Efficient and Scalable External Plagiarism Analysis - Lab Report for PAN at CLEF 2010

2010

Daniel Micol Óscar Ferrández Fernando Llopis Rafael Muñoz

In this paper we present an approach to detect external plagiarism based on textual similarity. This is an efficient and precise method that can be applied over large sets of documents. The system that we have developed contains a first phase of document selection that uses a variant of tf-idf applied over the terms that appear in the two documents of the pair being compared. After this is don...

The Winning Approach to Cross-Genre Gender Identification in Russian at RUSProfiling 2017

2017

Ilia Markov Helena Gómez-Adorno Grigori Sidorov Alexander F. Gelbukh

We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems. One of them was based on a statistical approach using only lexical features, and the other four on machine-learning techniques using some combinations of gender-specific Russian grammatical features, word and character n-grams, and suffix n...

From digital library to n-grams: NB N-gram

2015

Magnus Breder Birkenes Lars G. Johnsen Arne Martinus Lindstad Johanne Ostad

At the National Library of Norway, we are currently developing a service comparable to the Google Ngram Viewer (Michel et al., 2010; Lin et al., 2012; Aiden and Michel, 2013) called NB Ngram. It is based on all books and newspapers digitized up to and including 2013, as part of the large-scale digitization project at the National Library of Norway. Uni-, bi- and trigrams have been generated on the...

r-grams: Relational Grams

2007

Niels Landwehr Luc De Raedt

We introduce relational grams (r-grams). They upgrade n-grams for modeling relational sequences of atoms. As n-grams, r-grams are based on smoothed n-th order Markov chains. Smoothed distributions can be obtained by decreasing the order of the Markov chain as well as by relational generalization of the r-gram. To avoid sampling object identifiers in sequences, r-grams are generative models at t...

DETEKSI PLAGIASI DOKUMEN SKRIPSI MAHASISWA MENGGUNAKAN METODE N-GRAMS DAN WINNOWING

Journal: Simetris: Jurnal Teknik Mesin, Elektro dan Ilmu Komputer (2018)

Using of n-grams from morphological tags for fake news classification

Journal: PeerJ (2021)

Research on techniques for effective fake news detection has become much needed and attractive. These techniques have a background in many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging have proven insufficient for classification. However, they did not present any empirical results that could confirm these statements...

Visual Character N-Grams for Classification and Retrieval of Radiological Images

Journal: The International Journal of Multimedia & Its Applications (2014)
