N-grams

NLEL-MAAT at CLEF-ResPubliQA

2009

Santiago Correa, Davide Buscaldi, Paolo Rosso

This report presents the work carried out at NLE Lab for the QA@CLEF-2009 competition. We used the JIRS passage retrieval system, which is based on redundancy, under the assumption that it is possible to find the answer to a question in a large enough document collection. The retrieved passages are ranked depending on the number, length and position of the question n-gram structures found in ...

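The ranking idea in the JIRS abstract can be illustrated with a simplified scorer that rewards a passage for every question n-gram it contains, weighting longer n-grams more heavily. This is a minimal sketch under stated assumptions: the length weighting and the helper names `ngram_overlap_score` and `rank_passages` are hypothetical, and the sketch does not reproduce JIRS's actual n-gram distance model or its use of n-gram position.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap_score(question: str, passage: str) -> float:
    """Share of (length-weighted) question n-grams that also occur in the passage.

    Hypothetical weighting: an n-gram of length n contributes n points,
    so a passage that repeats a long question phrase beats one that only
    contains the same words scattered around.
    """
    q = question.lower().split()
    p = passage.lower().split()
    matched, possible = 0.0, 0.0
    for n in range(1, len(q) + 1):
        q_grams = set(ngrams(q, n))
        p_grams = set(ngrams(p, n))
        matched += n * len(q_grams & p_grams)
        possible += n * len(q_grams)
    return matched / possible if possible else 0.0


def rank_passages(question: str, passages: List[str]) -> List[str]:
    """Sort passages from best to worst n-gram overlap with the question."""
    return sorted(passages, key=lambda p: ngram_overlap_score(question, p), reverse=True)
```

For the question "who wrote hamlet", the passage "shakespeare wrote hamlet" shares the bigram "wrote hamlet" and therefore outranks "the play hamlet was written", which only shares a single word.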

Detecting Word Substitution in Adversarial Communication

2006

D. B. Skillicorn, D. Roussinov, W. P. Carey

When messages may be intercepted because they contain certain words, terrorists and criminals may replace such words by other words or locutions. If the replacement words have different frequencies from the original words, techniques to detect the substitution are known. In this paper, we consider ways to detect replacements that have similar frequencies to the original words. We consider the f...


Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality

2004

Bogdan Babych, Anthony Hartley

Automatic methods for MT evaluation are often based on the assumption that MT quality is related to some kind of distance between the evaluated text and a professional human translation (e.g., an edit distance or the precision of matched N-grams). However, independently produced human translations are necessarily different, conveying the same content by dissimilar means. Such legitimate transla...

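The "precision of matched N-grams" mentioned above is usually computed as clipped (modified) n-gram precision, as in BLEU: each candidate n-gram is credited at most as many times as it appears in any single reference. The sketch below shows that standard measure, not Babych and Hartley's own method; the function names are illustrative. Supplying several references is the simplest way such metrics account for legitimate translation variation.

```python
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, references: List[str], n: int) -> float:
    """BLEU-style modified n-gram precision of a candidate against references."""
    cand = ngram_counts(candidate.split(), n)
    if not cand:
        return 0.0
    # For each n-gram, the largest count observed in any one reference
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each candidate count so repeated words cannot inflate the score
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())
```

Against the single reference "the cat is on the mat", the candidate "there is a cat" gets a unigram precision of 0.5; adding the alternative reference "there is a cat on the mat" raises it to 1.0, because the second human translation expresses the same content with different words.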

Convolutional Neural Networks for Authorship Attribution of Short Texts

2017

Thamar Solorio, Paolo Rosso, Manuel Montes-y-Gómez, Prasha Shrestha, Sebastián Sierra, Fabio A. González

We present a model to perform authorship attribution of tweets using Convolutional Neural Networks (CNNs) over character n-grams. We also present a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification. The experimental evaluation shows that text CNNs perform competitively and are able to outperform previous methods.

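A CNN over character n-grams first needs each tweet turned into a fixed-length sequence of n-gram indices that an embedding layer can look up. The sketch below shows only that input-encoding step under stated assumptions: the reserved indices `PAD = 0` and `UNK = 1`, the first-seen vocabulary order, and truncation to `max_len` are hypothetical choices, and the convolution and classification layers from the abstract are not shown.

```python
from typing import Dict, List

PAD, UNK = 0, 1  # hypothetical reserved indices for padding and unseen n-grams


def char_ngrams(text: str, n: int) -> List[str]:
    """Overlapping character n-grams, including spaces and punctuation."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocab(texts: List[str], n: int) -> Dict[str, int]:
    """Assign each distinct training n-gram an index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for gram in char_ngrams(text, n):
            if gram not in vocab:
                vocab[gram] = len(vocab) + 2  # indices 0 and 1 are reserved
    return vocab


def encode(text: str, vocab: Dict[str, int], n: int, max_len: int) -> List[int]:
    """Map a text to n-gram indices, truncated or padded to max_len."""
    ids = [vocab.get(gram, UNK) for gram in char_ngrams(text, n)][:max_len]
    return ids + [PAD] * (max_len - len(ids))
```

Padding every tweet to the same length lets a batch be stacked into one matrix; n-grams never seen in training all share the `UNK` index.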

Generating example contexts to help children learn word meaning

Journal: Natural Language Engineering, 2013

Liu Liu, Jack Mostow, Gregory Aist

This article addresses the problem of generating good example contexts to help children learn vocabulary. We describe VEGEMATIC, a system that constructs such contexts by concatenating overlapping five-grams from Google's N-gram corpus. We propose and operationalize a set of constraints to identify good contexts. VEGEMATIC uses these constraints to filter, cluster, score, and select example con...

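Concatenating overlapping five-grams means extending a context one word at a time with an n-gram whose first n−1 words equal the context's last n−1 words. The sketch below is a greedy, first-match version of that chaining, assuming a small hand-written list of five-grams in place of Google's N-gram corpus; VEGEMATIC's constraints, clustering and scoring are not reproduced, and `extend` and `chain` are hypothetical names.

```python
from typing import List, Optional, Tuple

Gram = Tuple[str, ...]


def extend(context: List[str], grams: List[Gram]) -> Optional[List[str]]:
    """Append the last word of the first n-gram that overlaps the context's tail."""
    n = len(grams[0])
    tail = tuple(context[len(context) - (n - 1):])  # last n-1 words
    for gram in grams:
        if gram[:n - 1] == tail:
            return context + [gram[-1]]
    return None  # no n-gram continues this context


def chain(start: Gram, grams: List[Gram], max_words: int) -> List[str]:
    """Greedily grow a context from a starting n-gram up to max_words words."""
    context = list(start)
    while len(context) < max_words:
        extended = extend(context, grams)
        if extended is None:
            break
        context = extended
    return context


five_grams = [
    ("the", "dog", "ran", "into", "the"),
    ("dog", "ran", "into", "the", "yard"),
    ("ran", "into", "the", "yard", "again"),
]
print(" ".join(chain(five_grams[0], five_grams, 10)))
```

Because each step only checks a four-word overlap, every window of five consecutive words in the output is itself an attested five-gram, which is what keeps the generated context locally fluent.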

A Keyphrase Extraction Approach for Social Tagging Systems

2012

Felice Ferrara, Carlo Tasso

Social tagging systems allow people to classify resources by using a set of freely chosen terms called tags. However, by shifting the classification task from a set of experts to a larger, untrained set of people, the results of the classification are not accurate. The lack of control and guidelines generates noisy tags (i.e., tags without clear semantics) which deteriorate the precision o...


The Users Who Say 'Ni': Audience Identification in Chinese-language Restaurant Reviews

2015

Rob Voigt, Daniel Jurafsky

We give an algorithm for disambiguating generic versus referential uses of second-person pronouns in Chinese restaurant reviews. Reviews in this domain use the ‘you’ pronoun 你 either generically or to refer to shopkeepers, readers, or for self-reference in reported conversation. We first show that linguistic features of the local context (drawn from prior literature) help in disambiguation. We ...


A Basic Character N-gram Approach to Authorship Verification Notebook for PAN at CLEF 2013

2013

Michiel van Dam

This paper describes our approach to the Author Identification task in the PAN 2013 evaluation lab. We use a profile-based approach with the common n-grams (CNG) method, which employs a normalized distance measure for short and unbalanced texts introduced by Stamatatos [6]. We achieved 9th place with an overall F1 score of 0.6.

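The common n-grams (CNG) method represents each text by a profile of its most frequent character n-grams with normalized frequencies and compares two profiles with a relative-difference dissimilarity. The sketch below implements the original CNG dissimilarity, summing (2(f₁ − f₂)/(f₁ + f₂))² over the union of both profiles; it does not reproduce the Stamatatos normalized variant cited above, and the defaults `n=3` and `size=500` are hypothetical.

```python
from collections import Counter
from typing import Dict


def profile(text: str, n: int = 3, size: int = 500) -> Dict[str, float]:
    """Top-`size` character n-grams of a text with relative frequencies."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.most_common(size)}


def cng_dissimilarity(p1: Dict[str, float], p2: Dict[str, float]) -> float:
    """Original CNG dissimilarity: sum of squared relative frequency differences.

    Every n-gram in the union has a positive frequency in at least one
    profile, so the denominator f1 + f2 is never zero.
    """
    d = 0.0
    for gram in set(p1) | set(p2):
        f1, f2 = p1.get(gram, 0.0), p2.get(gram, 0.0)
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return d
```

Identical profiles score 0, and each n-gram present in only one profile adds the maximum term of 4. For verification, the unknown text's distance to the candidate author's profile would still have to be compared with some decision threshold, which is not specified here.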

Predicting therapist empathy in motivational interviews using language features inspired by psycholinguistic norms

2015

James Gibson, Nikos Malandrakis, Francisco Romero, David C. Atkins, Shrikanth S. Narayanan

Therapist language plays a critical role in influencing the overall quality of psychotherapy. Notably, it is a major contributor to the perceived level of empathy expressed by therapists, a primary measure for judging their efficacy. We explore psycholinguistically inspired features for predicting therapist empathy. These features model language which conveys information about affective and cognit...


Mining Progress Notes for Prediction of Activities of Daily Living

2013

Talha Oz, Che Ngufor, Janusz Wojtusiak

Replacement and removal filtering with help of subject-matter experts; contiguous word n-gram generation; normalization of numeric values; calculation of the likelihood ratios of the n-grams that appear in the test case; deciding on the class based on the informative n-grams.

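The later steps of this pipeline (numeric normalization, contiguous word n-grams, per-n-gram likelihood ratios, and a decision based on informative n-grams) can be sketched as a small binary classifier. This is a hypothetical illustration, not the authors' implementation: the expert-driven filtering step is omitted, and the add-one smoothed document-frequency estimates, the `<num>` placeholder, `n_max=2`, and the informativeness `threshold=1.0` on |log LR| are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Sequence, Tuple

Gram = Tuple[str, ...]


def normalize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and map every numeric value to '<num>'."""
    return ["<num>" if re.fullmatch(r"\d+(?:\.\d+)?", w) else w
            for w in text.lower().split()]


def word_ngrams(tokens: List[str], n_max: int) -> List[Gram]:
    """All contiguous word n-grams with 1 <= n <= n_max."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class LikelihoodRatioClassifier:
    """Binary classifier that sums log-likelihood ratios of informative n-grams."""

    def __init__(self, n_max: int = 2, threshold: float = 1.0) -> None:
        self.n_max = n_max          # hypothetical: longest n-gram considered
        self.threshold = threshold  # hypothetical: minimum |log LR| to count as informative
        self.pos: Counter = Counter()
        self.neg: Counter = Counter()
        self.n_pos = 0
        self.n_neg = 0

    def fit(self, docs: Sequence[str], labels: Sequence[bool]) -> None:
        for doc, label in zip(docs, labels):
            # Count each n-gram at most once per note (document frequency)
            grams = set(word_ngrams(normalize(doc), self.n_max))
            if label:
                self.pos.update(grams)
                self.n_pos += 1
            else:
                self.neg.update(grams)
                self.n_neg += 1

    def log_lr(self, gram: Gram) -> float:
        # Add-one smoothed P(gram | class); an assumed estimator
        p_pos = (self.pos[gram] + 1) / (self.n_pos + 2)
        p_neg = (self.neg[gram] + 1) / (self.n_neg + 2)
        return math.log(p_pos / p_neg)

    def predict(self, doc: str) -> bool:
        score = 0.0
        for gram in set(word_ngrams(normalize(doc), self.n_max)):
            llr = self.log_lr(gram)
            if abs(llr) >= self.threshold:  # only informative n-grams vote
                score += llr
        return score > 0
```

N-grams that appear equally often in both classes get a log ratio near zero and are ignored, so the decision rests only on phrases that actually separate the classes.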