Search results for: n grams

Number of results: 982486

Journal: Indonesian Journal of Electrical Engineering and Computer Science 2023

The COVID-19 pandemic announced by the World Health Organization has disrupted human lives at different scales, including the economy, public health, and people's emotions. Social media databases record a huge amount of accumulated information concerning this pandemic. The Twitter platform is considered one of the most active social media platforms that enable users to tweet in conversations they are concerned about. The problem arises when...

2007
Jan Brunnert Omar Alonso Dirk Riehle

Understanding an enterprise’s workforce and skill-set can be seen as the key to understanding an organization’s capabilities. In today’s large organizations it has become increasingly difficult to find people that have specific skills or expertise or to explore and understand the overall picture of an organization’s portfolio of topic expertise. This article presents a case study of analyzing a...

Journal: IJMC 2013
Xi Chen Chenhui Guo Michael Chau Weihua Zhou

Short Message Service (SMS) is an important component of modern mobile services. Given the unique characteristics of the Chinese language, it is imperative to study the characteristics of language usage patterns in Chinese SMS so that important facts, such as why and how people in China use SMS, can be discovered. In this paper, we report an analysis of Chinese SMS logs from three different...

2014
Rosalee Wolfe John C. McDonald Larwan Berke Marie Stumbo

A new extension to ELAN offers expanded n-gram analysis tools including improved search capabilities and an extensive library of statistical measures of association for n-grams. This paper presents an overview of the new tools and a case study in American Sign Language synthesis that exploits these capabilities for computing more natural timing in generated sentences. The new extension provides...

2004
Vlado Kešelj

CNG Method for Authorship Attribution. The Common N-Grams (CNG) classification method for authorship attribution (AATT) was described in [2]. The method is based on extracting the most frequent byte n-grams of size n from the training data. The n-grams are sorted by their normalized frequency, and the first L most-frequent n-grams define an author profile. Given a test document, the test profil...
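The profile-building step described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `byte_ngram_profile` and the parameter `profile_size` (standing in for L) are hypothetical, and the comparison of test and author profiles is omitted because the abstract is cut off before it is described.

```python
from collections import Counter


def byte_ngram_profile(data: bytes, n: int, profile_size: int) -> dict:
    """Return the `profile_size` most frequent byte n-grams of `data`,
    mapped to their normalized frequencies (hypothetical helper)."""
    # Slide a window of n bytes over the training data.
    grams = [data[i:i + n] for i in range(len(data) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    # Sort by descending count; break ties by byte value so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Keep only the first L entries, normalized by the total number of n-grams.
    return {gram: count / total for gram, count in ranked[:profile_size]}


# Toy input, not data from the paper.
profile = byte_ngram_profile(b"abababab", n=2, profile_size=2)
```

Working on bytes rather than decoded characters keeps the sketch independent of language and encoding, which matches the abstract's reference to byte n-grams.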

2007
Steven Burrows

Plagiarism and copyright infringement are major problems in academic and corporate environments. Existing solutions for detecting infringements in structured text such as source code are restricted to textual similarity comparisons of two pieces of work. In this paper, we examine authorship attribution as a means for tackling plagiarism detection. Given several samples of work from several auth...

Journal: Int. Arab J. Inf. Technol. 2007
Abdellatif Rahmoun Zakaria Elberrichi

This paper deals with automatic supervised classification of documents. The approach suggested is based on a vector representation of the documents centred not on the words but on the n-grams of characters for varying n. The effects of this method are examined in several experiments using the multivariate chi-square to reduce the dimensionality, the cosine and Kullback-Leibler distances, and tw...
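A character n-gram vector representation with a cosine comparison can be sketched as below. This is an assumption-laden illustration rather than the paper's pipeline: the function names and the default `n_values` are hypothetical, raw counts are used as weights, and the chi-square dimensionality reduction and Kullback-Leibler distance mentioned in the abstract are not shown.

```python
import math
from collections import Counter


def char_ngram_vector(text: str, n_values=(2, 3)) -> Counter:
    """Count character n-grams of `text` for every n in `n_values` (hypothetical helper)."""
    vector = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            vector[text[i:i + n]] += 1
    return vector


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy strings: they share the bigram "ab" and differ in the second bigram.
similarity = cosine_similarity(
    char_ngram_vector("abc", n_values=(2,)),
    char_ngram_vector("abd", n_values=(2,)),
)
```

A cosine distance, if needed, can be taken as one minus this similarity.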

2003
Nils Klarlund Michael Riley

A cluster keyboard partitions the letters of the alphabet onto subset keys. On such keyboards most words are typed with no more key presses than on the standard keyboard, but a key sequence may stand for two or more words. In current practice, this ambiguity problem is addressed by hypothesizing words according to their unigram (occurrence) frequency. When the hypothesized word is not the inten...
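The ambiguity and the unigram-frequency baseline described here can be illustrated with a small sketch. Everything specific in it is an assumption: the phone-keypad-style `KEY_GROUPS` layout, the helper names, and the word counts are made up for illustration and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical cluster layout (phone-keypad style); the paper's layout may differ.
KEY_GROUPS = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]
LETTER_TO_KEY = {
    letter: str(key)
    for key, group in enumerate(KEY_GROUPS, start=2)
    for letter in group
}


def key_sequence(word: str) -> str:
    """Map a word to the sequence of keys that types it."""
    return "".join(LETTER_TO_KEY[letter] for letter in word.lower())


def build_index(unigram_counts: dict) -> dict:
    """Group words by key sequence, most frequent word first."""
    index = defaultdict(list)
    for word, count in unigram_counts.items():
        index[key_sequence(word)].append((count, word))
    for entries in index.values():
        # Descending count, then alphabetical order for ties.
        entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return index


def candidates(index: dict, sequence: str) -> list:
    """Return the words hypothesized for a key sequence, best guess first."""
    return [word for _, word in index.get(sequence, [])]


# Made-up unigram counts for four words that collide on one key sequence.
toy_counts = {"good": 50, "home": 40, "gone": 20, "hood": 5}
toy_index = build_index(toy_counts)
guesses = candidates(toy_index, key_sequence("home"))
```

In this toy setup a user typing "home" is first offered "good", which is the kind of wrong hypothesis that a frequency-only ranking produces.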

2017
Ayoub Abbassi Seifeddine Mechti Lamia Hadrich Belguith Rim Faiz

This paper presents an approach for author profiling of unknown users from the texts they produce on social media. In particular, we address the identification of two profile dimensions, gender and language variety, of Arabic Twitter users based on their tweets. Our approach focuses on applying a meta-classification technique to features extracted from tweet bodies. We explored two main sets of fe...

2001
Farhad Oroumchian Naghmeh Karimi Mina Zolfy

A series of experiments is being conducted on the Farsi language in the domain of laws at the University of Tehran. One of the goals of these experiments is to establish the performance of different weighting schemes and retrieval models. Given the lack of a Farsi stemmer and some characteristics of the language, it was decided to experiment with N-grams. With un-stemmed words and 2-grams, 3-gr...
