n grams

Summarizing twitter posts regarding COVID-19 based on n-grams

Journal: Indonesian Journal of Electrical Engineering and Computer Science 2023

The COVID-19 pandemic announced by the World Health Organization has disrupted human lives at different scales, including the economy, public health, and people's emotions. Social media databases record a huge amount of accumulated information concerning this pandemic. The Twitter platform is considered one of the most active social platforms that enable users to tweet in conversations they are concerned about. The problem arises when...


Enterprise People and Skill Discovery Using Tolerant Retrieval and Visualization

2007

Jan Brunnert Omar Alonso Dirk Riehle

Understanding an enterprise’s workforce and skill-set can be seen as the key to understanding an organization’s capabilities. In today’s large organizations it has become increasingly difficult to find people that have specific skills or expertise or to explore and understand the overall picture of an organization’s portfolio of topic expertise. This article presents a case study of analyzing a...


Character usage in Chinese short message service (SMS): a real-world study in Mainland China

Journal: IJMC 2013

Xi Chen Chenhui Guo Michael Chau Weihua Zhou

Short Message Service (SMS) is an important component of modern mobile services. Given the unique characteristics of the Chinese language, it is imperative to conduct a study of language usage patterns in Chinese SMS so that important facts, such as why and how people in China use SMS, can be discovered. In this paper, we report an analysis of Chinese SMS logs from three different...


Expanding n-gram analytics in ELAN and a case study for sign synthesis

2014

Rosalee Wolfe John C. McDonald Larwan Berke Marie Stumbo

A new extension to ELAN offers expanded n-gram analysis tools including improved search capabilities and an extensive library of statistical measures of association for n-grams. This paper presents an overview of the new tools and a case study in American Sign Language synthesis that exploits these capabilities for computing more natural timing in generated sentences. The new extension provides...


CNG Method with Weighted Voting

2004

Vlado Kešelj

CNG Method for Authorship Attribution. The Common N-Grams (CNG) classification method for authorship attribution (AATT) was described in [2]. The method is based on extracting the most frequent byte n-grams of size n from the training data. The n-grams are sorted by their normalized frequency, and the first L most-frequent n-grams define an author profile. Given a test document, the test profil...
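The profile-building step described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `cng_profile`, the tie-breaking rule, and the sample input are assumptions, and the comparison of a test profile against author profiles is left out because the abstract is cut off before describing it.

```python
from collections import Counter


def cng_profile(data: bytes, n: int, L: int) -> dict[bytes, float]:
    """Return the L most frequent byte n-grams with their normalized frequencies."""
    # Slide a window of n bytes over the training data.
    grams = [data[i:i + n] for i in range(len(data) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    # Sort by descending count; ties are broken by byte order (an assumption,
    # chosen here only to make the result deterministic).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Keep the first L n-grams, normalizing each count by the total number of n-grams.
    return {gram: count / total for gram, count in ranked[:L]}


# Hypothetical training data: 7 bigrams in total, "ab" and "ba" occur 3 times each.
profile = cng_profile(b"abababac", n=2, L=2)
```

Here `profile` holds the two bigrams `b"ab"` and `b"ba"`, each with normalized frequency 3/7.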


Source Code Authorship Attribution using n-grams

2007

Steven Burrows

Plagiarism and copyright infringement are major problems in academic and corporate environments. Existing solutions for detecting infringements in structured text such as source code are restricted to textual similarity comparisons of two pieces of work. In this paper, we examine authorship attribution as a means for tackling plagiarism detection. Given several samples of work from several auth...


Experimenting N-Grams in Text Categorization

Journal: Int. Arab J. Inf. Technol. 2007

Abdellatif Rahmoun Zakaria Elberrichi

This paper deals with automatic supervised classification of documents. The approach suggested is based on a vector representation of the documents centred not on the words but on the n-grams of characters for varying n. The effects of this method are examined in several experiments using the multivariate chi-square to reduce the dimensionality, the cosine and Kullback-Leibler distances, and tw...
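A character n-gram representation with a cosine distance, two of the ingredients named in this abstract, might look like the sketch below. The function names and the choice of n in {2, 3} are assumptions made for illustration; the chi-square dimensionality reduction and the Kullback-Leibler distance are not shown.

```python
import math
from collections import Counter


def char_ngrams(text: str, n_values=(2, 3)) -> Counter:
    """Count character n-grams of every size in n_values (sizes are an assumption)."""
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 minus the cosine similarity of two sparse count vectors."""
    # Only n-grams present in both documents contribute to the dot product.
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # An empty vector is treated as maximally distant.
    return 1.0 - dot / (norm_a * norm_b)


# Documents sharing no character n-grams are at distance 1.
far = cosine_distance(char_ngrams("abc"), char_ngrams("xyz"))
# A document compared with itself is at distance 0 (up to rounding).
near = cosine_distance(char_ngrams("n-gram"), char_ngrams("n-gram"))
```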


Word n-grams for cluster keyboards

2003

Nils Klarlund Michael Riley

A cluster keyboard partitions the letters of the alphabet onto subset keys. On such keyboards most words are typed with no more key presses than on the standard keyboard, but a key sequence may stand for two or more words. In current practice, this ambiguity problem is addressed by hypothesizing words according to their unigram (occurrence) frequency. When the hypothesized word is not the inten...
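The unigram baseline described in this abstract can be illustrated with a small sketch. Everything concrete here is an assumption: the phone-style cluster layout, the function names, and the toy word counts are invented for illustration and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical cluster layout (a phone-style keypad); not taken from the paper.
CLUSTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in CLUSTERS.items() for ch in letters}


def key_sequence(word: str) -> str:
    """Map a word to the sequence of cluster keys that types it."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(unigram_counts: dict[str, int]) -> dict[str, list[str]]:
    """Group words by key sequence, most frequent word first."""
    index = defaultdict(list)
    for word in unigram_counts:
        index[key_sequence(word)].append(word)
    for words in index.values():
        # Rank by descending unigram count, then alphabetically for determinism.
        words.sort(key=lambda w: (-unigram_counts[w], w))
    return dict(index)


# Toy counts, invented for illustration only.
toy_counts = {"good": 50, "home": 40, "gone": 30, "hood": 5}
index = build_index(toy_counts)

# All four words share the key sequence "4663", so typing "home" is ambiguous.
candidates = index[key_sequence("home")]
first_guess = candidates[0]
```

With these toy counts the first hypothesis for the keys `4663` is `good`, even when the user meant `home`, which is the ambiguity the abstract refers to.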


Author Profiling for Arabic Tweets based on n-grams

2017

Ayoub Abbassi Seifeddine Mechti Lamia Hadrich Belguith Rim Faiz

This paper presents an approach for author profiling of unknown users from their texts produced in social media. In particular, we address the identification of two profile dimensions, gender and language variety, of Arabic Twitter users based on their tweets. Our approach focused on applying a meta-classification technique to features extracted from the tweet body. We explored two main sets of fe...


Experiments in Farsi Text Retrieval

2001

Farhad Oroumchian Naghmeh Karimi Mina Zolfy

A series of experiments is being conducted on the Farsi language in the domain of laws at the University of Tehran. One of the goals of these experiments is to establish the performance of different weighting schemes and retrieval models. Owing to the lack of a Farsi stemmer and some characteristics of the language, it was decided to experiment with N-grams. With un-stemmed words and 2-grams, 3-gr...
