Search results for: n grams

Number of results: 982486

2015
Juan Pablo Posadas-Durán Helena Gómez-Adorno Ilia Markov Grigori Sidorov Ildar Z. Batyrshin Alexander F. Gelbukh Obdulia Pichardo-Lagunas

This paper describes our approach to the Author Profiling task at PAN 2015. Our method relies on syntactic features, such as syntactic-based n-grams of various types, to predict the age, gender and personality traits of the author of a given text. In this paper, we describe the features used, the classification algorithm employed, and other general ideas concerning the expe...

2002
Gary E. Kopec Maya R. Said Kris Popat

This paper explores the problem of incorporating linguistic constraints into document image decoding, a communication theory approach to document recognition. Probabilistic character n-grams (n=2–5) are used in a two-pass strategy where the decoder first uses a very weak language model to generate a lattice of candidate output strings. These are then re-scored in the second pass using the full ...

2001
Tin Kam Ho George Nagy

We present strategies and results for identifying the symbol type (lower-case, upper-case, digit, and punctuation or special symbols) of every character in a text document by using various kinds of information from neighboring characters. In the expectation of reasonable word and character segmentation for shape clustering, we designed several type recognition methods that depend on cluster n-g...

2010
Daniel Micol Óscar Ferrández Fernando Llopis Rafael Muñoz

In this paper we present an approach to detect external plagiarism based on textual similarity. This is an efficient and precise method that can be applied over large sets of documents. The system that we have developed contains a first phase of document selection that uses a variant of tf-idf applied over the terms that appear in the two documents of the pair being compared. After this is don...

2017
Ilia Markov Helena Gómez-Adorno Grigori Sidorov Alexander F. Gelbukh

We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems. One of them was based on a statistical approach using only lexical features, and the other four on machine-learning techniques using some combinations of gender-specific Russian grammatical features, word and character n-grams, and suffix n...

2015
Magnus Breder Birkenes Lars G. Johnsen Arne Martinus Lindstad Johanne Ostad

At the National Library of Norway, we are currently developing a service comparable to the Google Ngram Viewer (Michel et al., 2010; Lin et al., 2012; Aiden and Michel, 2013) called NB Ngram. It is based on all books and newspapers digitized up to and including 2013, as part of the large scale digitization project at the National Library of Norway. Uni-, bi- and trigrams have been generated on the...
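
As a rough illustration of what generating uni-, bi- and trigrams involves, here is a minimal Python sketch. It is not the NB Ngram pipeline: the whitespace tokenization, the function names `ngrams` and `count_ngrams`, and the sample sentence are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tokens, max_n=3):
    """Count n-grams for n = 1..max_n; returns a dict mapping n to a Counter."""
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}


# Hypothetical sample text; a real corpus would need proper tokenization.
tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens)
```

Here `counts[1]` holds unigram counts, `counts[2]` bigram counts and `counts[3]` trigram counts. A service that plots usage over time would presumably keep such counts per publication year, but the truncated abstract does not say how.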

2007
Niels Landwehr Luc De Raedt

We introduce relational grams (r-grams). They upgrade n-grams for modeling relational sequences of atoms. Like n-grams, r-grams are based on smoothed n-th order Markov chains. Smoothed distributions can be obtained by decreasing the order of the Markov chain as well as by relational generalization of the r-gram. To avoid sampling object identifiers in sequences, r-grams are generative models at t...
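
The idea of smoothing by decreasing the order of the Markov chain can be sketched for plain character n-grams. The code below is not the r-gram model and does no relational generalization. It uses linear interpolation across orders, which is one common smoothing scheme and an assumption here, as are the names `train` and `prob`, the weight `lam`, and the sample string.

```python
from collections import Counter


def train(text, order=2):
    """Count each context of length 0..order and each (context, next symbol) pair."""
    ctx_counts, full_counts = Counter(), Counter()
    for k in range(order + 1):
        for i in range(k, len(text)):
            ctx = text[i - k:i]
            ctx_counts[ctx] += 1
            full_counts[(ctx, text[i])] += 1
    return ctx_counts, full_counts


def prob(sym, history, model, vocab_size, order=2, lam=0.5):
    """Interpolate estimates from order 0 up to `order`, starting from a uniform base."""
    ctx_counts, full_counts = model
    p = 1.0 / vocab_size  # uniform fallback when nothing has been observed
    for k in range(order + 1):
        if k > len(history):
            break  # history too short for a longer context
        ctx = history[len(history) - k:]
        if ctx_counts[ctx]:
            ml = full_counts[(ctx, sym)] / ctx_counts[ctx]
            p = lam * ml + (1 - lam) * p  # mix this order with the lower-order estimate
    return p


# Hypothetical training string.
text = "abracadabra"
model = train(text)
vocab = sorted(set(text))
p_r = prob("r", "ab", model, len(vocab))
p_c = prob("c", "ab", model, len(vocab))
```

Because each level mixes a normalized maximum-likelihood estimate with a normalized lower-order estimate, the result stays a probability distribution over the vocabulary, and unseen continuations such as "c" after "ab" still receive non-zero probability.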

Journal: Simetris: Jurnal Teknik Mesin, Elektro dan Ilmu Komputer 2018

Journal: PeerJ 2021

Research into techniques for effective fake news detection has become much needed and attractive. These techniques have a background in many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging had been proven insufficient for classification. However, they did not present any empirical results which could confirm these statements...

Journal: The International journal of Multimedia & Its Applications 2014
