Language Identification using Classifier Ensembles

Authors

  • Shervin Malmasi
  • Mark Dras
Abstract

In this paper we describe the language identification system we developed for the Discriminating Similar Languages (DSL) 2015 shared task. We constructed a classifier ensemble composed of several Support Vector Machine (SVM) base classifiers, each trained on a single feature type. Our feature types include character 1–6 grams and word unigrams and bigrams. Using this system we were able to outperform the other entries in the closed training track of the DSL 2015 shared task, achieving the best accuracy of 95.54%.
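The feature setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`char_ngrams`, `word_ngrams`, `plurality_vote`) and the `FEATURE_TYPES` table are hypothetical, whitespace tokenization is assumed, and SVM training itself is left out. The abstract does not say how the base classifiers' outputs are combined, so the plurality vote shown here is only one plausible fusion rule.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams (whitespace tokenization is an assumption)."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# One extractor per feature type named in the abstract:
# character 1-6 grams plus word unigrams and bigrams.
# Each would feed its own base classifier (an SVM in the paper).
FEATURE_TYPES = {f"char{n}": (lambda t, n=n: char_ngrams(t, n)) for n in range(1, 7)}
FEATURE_TYPES["word1"] = lambda t: word_ngrams(t, 1)
FEATURE_TYPES["word2"] = lambda t: word_ngrams(t, 2)


def plurality_vote(predictions):
    """Combine base-classifier labels by plurality vote (assumed fusion rule)."""
    return Counter(predictions).most_common(1)[0][0]
```

In use, each entry of `FEATURE_TYPES` would produce the input for one base classifier, and the eight predicted labels for a test sentence would be passed to `plurality_vote` to obtain the ensemble's decision.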

Similar resources

Classifier ensembles for image identification using multi-objective Pareto features

In this paper we propose classifier ensembles that use multiple Pareto image features for invariant image identification. Different from traditional ensembles that focus on enhancing diversity by generating diverse base classifiers, the proposed method takes advantage of the diversity inherent in the Pareto features extracted using a multi-objective evolutionary Trace Transform algorithm. Two v...

LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles

We present the description of the LTG entry in the SemEval-2016 Complex Word Identification (CWI) task, which aimed to develop systems for identifying complex words in English sentences. Our entry focused on the use of contextual language model features and the application of ensemble classification methods. Both of our systems achieved good performance, ranking in 2nd and 3rd place overall in ...

OPT: Oslo-Potsdam-Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing

The OPT submission to the Shared Task of the 2016 Conference on Natural Language Learning (CoNLL) implements a ‘classic’ pipeline architecture, combining binary classification of (candidate) explicit connectives, heuristic rules for non-explicit discourse relations, ranking and ‘editing’ of syntactic constituents for argument identification, and an ensemble of classifiers to assign discourse se...

Multilingual native language identification

We present the first study of Native Language Identification (NLI) applied to text written in languages other than English, using data from six languages. NLI is the task of predicting an author’s first language (L1) using only their writings in a second language (L2), with applications in Second Language Acquisition and forensic linguistics. Most research to date has focused on English but the...

A genetic approach for training diverse classifier ensembles

Classification is an active topic of Machine Learning. The most recent achievements in this domain suggest using ensembles of learners instead of a single classifier to improve classification accuracy. Comparisons between Bagging and Boosting show that classifier ensembles perform better when their members exhibit diversity, that is commit different errors. This paper proposes a genetic algorit...


Publication date: 2015