MLgsc: A Maximum-Likelihood General Sequence Classifier

Authors

  • Thomas Junier
  • Vincent Hervé
  • Tina Wunderlin
  • Pilar Junier
  • I. King Jordan
Abstract

We present a software package for classifying protein or nucleotide sequences against user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The tree is used to guide model construction and serves as a decision tree to speed up classification. The software was evaluated on all the 16S rRNA gene sequences of the reference dataset found in the GreenGenes database. On this dataset, it achieved an error rate of around 1% at genus level. Examples of applications based on the nitrogenase subunit NifH gene and a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple, straightforward command-line interface for the Unix shell, and are free and open-source. The package has minimal dependencies and can thus be easily integrated into command-line based classification pipelines.
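
The idea of using the reference tree as a decision tree can be illustrated with a minimal Python sketch. This is not the MLgsc implementation or its command-line interface: the per-column frequency profiles, the pseudocount, the nested-tuple tree, the toy sequences and every function name below are assumptions made for illustration, and the query is assumed to be already aligned to the reference alignment.

```python
import math
from collections import Counter

ALPHABET = "ACGT-"  # assumed nucleotide alphabet with a gap symbol


def build_profile(seqs, pseudocount=1.0):
    """Per-column log-probabilities of each residue, smoothed with a pseudocount."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs)
        profile.append(
            {r: math.log((counts[r] + pseudocount) / total) for r in ALPHABET}
        )
    return profile


def log_likelihood(profile, query):
    """Sum of per-column log-probabilities of an aligned query."""
    return sum(col[res] for col, res in zip(profile, query))


def leaves(tree):
    """Names of all taxa below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def train(tree, alignment):
    """Attach to every node a profile built from all sequences below it."""
    seqs = [s for taxon in leaves(tree) for s in alignment[taxon]]
    children = [] if isinstance(tree, str) else [train(c, alignment) for c in tree]
    label = tree if isinstance(tree, str) else None
    return {"label": label, "profile": build_profile(seqs), "children": children}


def classify(model, query):
    """Walk down from the root, always following the most likely child."""
    node = model
    while node["children"]:
        node = max(
            node["children"],
            key=lambda c: log_likelihood(c["profile"], query),
        )
    return node["label"]


# Hypothetical toy data: aligned reference sequences grouped by genus,
# and a reference tree written as nested tuples.
alignment = {
    "GenusA": ["ACGTAC", "ACGTAA"],
    "GenusB": ["ACGGTC", "ACGGTT"],
    "GenusC": ["TTGCAC", "TTGCAA"],
}
tree = (("GenusA", "GenusB"), "GenusC")

model = train(tree, alignment)
print(classify(model, "ACGTAC"))
```

Because only the children of the current node are scored at each step, the number of likelihood evaluations grows with the depth of the tree rather than with the number of reference taxa, which is the speed-up the abstract attributes to the decision-tree use of the phylogeny.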

Similar resources

Comparing pixel-based and object-based algorithms for classifying land use of arid basins (Case study: Mokhtaran Basin, Iran)

In this research, pixel-based and object-based image analysis techniques were investigated and compared for producing a land use map of the arid Mokhtaran basin, Birjand. Using Landsat satellite imagery from 2015, land use was classified with three object-based algorithms: supervised fuzzy-maximum likelihood, maximum likelihood, and K-nearest neighbor. Nine combinations...

Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task

In this paper, we attempt to predict earthquake events in a cluster of seismic data from the Pacific Ring of Fire, using multivariate adaptive regression splines (MARS). The model is employed either as a predictor for a sequence prediction task, or as a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...

A general maximum likelihood framework for modulation classification

This paper deals with modulation classification. First, a state of the art is given, which is separated into two classes: the pattern recognition approach and the Maximum Likelihood (ML) approach. Then we propose a new classifier called the General Maximum Likelihood Classifier (GMLC), based on an approximation of the likelihood function. We derive equations of this classifier in the case of line...

Comparison of maximum likelihood and minimum distance to mean classifiers for land cover mapping (Case study: Isfahan Province)

Land cover maps derived from satellite images play a key role in regional and national land cover assessments. In order to compare maximum likelihood and minimum distance to mean classifiers, LISS-III images from IRS-P6 satellite were acquired in August 2008 from the western part of Isfahan. First, the LISS-III image was georeferenced. The Root Mean Square error of less than one pixel was the r...

Improving the Performance of Bayesian Estimation Methods in Estimations of Shift Point and Comparison with MLE Approach

A Bayesian analysis is used to detect a change point in a sequence of independent random variables from exponential distributions. In this paper, we try to estimate the change point that occurs in any sequence of independent exponential observations. The Bayes estimators are derived for the change point, the rate of the exponential distribution before the shift and the rate of the exponential distribution after s...

Journal title:

Volume 10  Issue

Pages  -

Publication date 2015