random forest bagging and machine learning

Boosting Dictionary Learning with Error Codes

Journal: CoRR 2017

Yigit Oktar, Mehmet Türkan

In conventional dictionary learning algorithms based on sparse representations, initial dictionaries are generally assumed to be proper representatives of the system at hand. However, this may not be the case, especially in some systems restricted to random initializations. Therefore, a supposedly optimal state-update based on such an improper model might lead to undesired effects that will be con...

Fast and Flexible Monotonic Functions with Ensembles of Lattices

2016

Mahdi Milani Fard, Kevin Robert Canini, Andrew Cotter, Jan Pfeifer, Maya R. Gupta

For many machine learning problems, some inputs are known to be positively (or negatively) related to the output; in such cases, training the model to respect that monotonic relationship can provide regularization and make the model more interpretable. However, flexible monotonic functions are computationally challenging to learn beyond a few features. We break through this ...
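
To make the monotonicity idea concrete, here is a minimal sketch of a single lattice as plain multilinear interpolation of vertex values over the unit hypercube, plus an ensemble that averages small lattices over feature subsets. It is a simplification under stated assumptions, not the authors' implementation: the names `lattice_eval`, `is_monotonic_in` and `ensemble_eval` are hypothetical, inputs are assumed already scaled to [0, 1], and learned input calibration, training and feature-subset selection are omitted. For this multilinear form, a lattice is non-decreasing in a feature exactly when every lattice edge along that feature is non-decreasing, which is what the check tests.

```python
from itertools import product

# Hypothetical helper names; a simplified sketch, not the paper's implementation.


def lattice_eval(params, x):
    """Multilinear interpolation of vertex values over the unit hypercube [0, 1]^D.

    params: dict mapping each vertex (a tuple of 0s and 1s) to its value.
    x: sequence of D inputs, assumed already scaled to [0, 1].
    """
    total = 0.0
    for vertex in product((0, 1), repeat=len(x)):
        # Vertex weight = product over axes of x_i (if vertex_i == 1) or 1 - x_i.
        weight = 1.0
        for xi, vi in zip(x, vertex):
            weight *= xi if vi == 1 else 1.0 - xi
        total += weight * params[vertex]
    return total


def is_monotonic_in(params, dim):
    """True if the lattice is non-decreasing in feature `dim`.

    Checks every lattice edge along `dim`: the value at the upper end
    must be at least the value at the lower end.
    """
    for vertex, value in params.items():
        if vertex[dim] == 0:
            upper = vertex[:dim] + (1,) + vertex[dim + 1:]
            if params[upper] < value:
                return False
    return True


def ensemble_eval(lattices, x):
    """Average of several small lattices, each reading its own feature subset.

    lattices: list of (feature_indices, params) pairs (hypothetical layout).
    """
    outputs = [lattice_eval(params, [x[i] for i in feats]) for feats, params in lattices]
    return sum(outputs) / len(outputs)


# Toy 2-D lattice whose vertex values increase along both axes.
toy = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 0.5, (1, 1): 2.0}
print(lattice_eval(toy, [0.5, 0.5]))  # 0.875
print(is_monotonic_in(toy, 0), is_monotonic_in(toy, 1))  # True True
```

Because an average of functions that are each non-decreasing in a feature is itself non-decreasing in that feature, the ensemble inherits monotonicity from its member lattices.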

The aid of machine learning to overcome the classification of real health discharge reports written in Spanish (Aportaciones de las técnicas de aprendizaje automático a la clasificación de partes de alta hospitalarios reales en castellano)

2014

Alicia Pérez, Arantza Casillas, Koldo Gojenola, Maite Oronoz, Nerea Aguirre, Estibaliz Amillano

Hospitals attached to the Spanish Ministry of Health currently use the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) to classify health discharge records. At present, this coding is done manually by experts. This paper tackles the automatic classification of real discharge records in Spanish following the ICD9-CM standard. The challenge is that the Discharge Recor...

Fast Conditional Density Estimation for Quantitative Structure-Activity Relationships

2010

Fabian Buchwald, Tobias Girschick, Eibe Frank, Stefan Kramer

Many methods for quantitative structure-activity relationships (QSARs) deliver only point estimates, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive pred...
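
If the conditional density happens to be Gaussian, deriving a prediction interval from it reduces to reading off two quantiles. The sketch below assumes that Gaussian form purely for illustration (the estimator in the paper need not be parametric); `prediction_interval` is a hypothetical helper built on Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def prediction_interval(mean, std, coverage=0.95):
    """Central prediction interval from an ASSUMED Gaussian conditional density.

    mean, std: parameters of the predicted density p(activity | structure).
    coverage: probability mass the interval should contain.
    Hypothetical helper for illustration only.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    if std <= 0:
        raise ValueError("std must be positive")
    density = NormalDist(mu=mean, sigma=std)
    tail = (1.0 - coverage) / 2.0
    # Cut off equal probability mass in each tail of the density.
    return density.inv_cdf(tail), density.inv_cdf(1.0 - tail)


low, high = prediction_interval(6.2, 0.4)
print(f"95% interval: [{low:.2f}, {high:.2f}]")  # 95% interval: [5.42, 6.98]
```

A point-estimate model would return only the 6.2; the density additionally yields an interval whose width reflects how uncertain the prediction is.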

A First Machine Learning Approach to Pronominal Anaphora Resolution in Basque

2010

Olatz Arregi Uriarte, Klara Ceberio, Arantza Díaz de Ilarraza, Iakes Goenaga, Basilio Sierra, Ana Zelaia Jauregi

In this paper we present the first machine learning approach to resolving pronominal anaphora in the Basque language. We consider different classifiers in order to find the system that best fits the characteristics of the language under examination. We do not restrict our study to the classifiers typically used for this task; we have also considered others, such as Random Forest or VFI...

A Case Study of Random Forest in Predictive Data Mining

2009

Sebastian Schüller, Stefan Lessmann, Stefan Voß

The paper examines the potential of a novel data mining method, the random forest classifier, to support managerial decision making in complex forecasting applications. A modelling paradigm is proposed that embraces a learning curve analysis and grid-search to analyse the model’s sensitivity towards the number of training examples and parameter settings, respectively, and, eventually, produce a...
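
The grid-search and learning-curve analysis mentioned above can be sketched generically. This is an illustration under stated assumptions, not the authors' procedure: `grid_search`, `learning_curve`, `toy_score` and the parameter names `n_trees` and `max_features` are hypothetical, and the toy scoring functions stand in for actually training and validating a random forest.

```python
from itertools import product

# Hypothetical helpers; a generic sketch, not the paper's modelling paradigm.


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return (best_params, best_score).

    param_grid: dict mapping a parameter name to the list of values to try.
    score_fn: callable taking a params dict and returning a score (higher is better).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def learning_curve(train_sizes, fit_and_score):
    """Return (n, score) pairs showing how performance changes with training-set size n."""
    return [(n, fit_and_score(n)) for n in train_sizes]


# Toy stand-in: in practice this would train and validate a random forest.
def toy_score(params):
    return 1.0 - ((params["n_trees"] - 200) / 100) ** 2 - (params["max_features"] - 3) ** 2


grid = {"n_trees": [50, 200, 500], "max_features": [1, 3, 5]}
print(grid_search(grid, toy_score))
print(learning_curve([100, 200, 400], lambda n: n / (n + 100)))
```

Exhaustive search over the Cartesian product grows multiplicatively with each added parameter, so such grids are usually kept small.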

Modelling the Spatial Distribution of Culicoides imicola: Climatic versus Remote Sensing Data

Journal: Remote Sensing 2014

Jasper Van doninck, Bernard De Baets, Jan Peters, Guy Hendrickx, Els I. Ducheyne, Niko E. C. Verhoest

Culicoides imicola is the main vector of the bluetongue virus in the Mediterranean Basin. Spatial distribution models for this species traditionally employ either climatic data or remotely sensed data, or a combination of both. Until now, however, no studies have compared the accuracies of C. imicola distribution models based on climatic versus remote sensing data, even though remotely sensed dataset...

Unraveling the English-Bengali Code-Mixing Phenomenon

2016

Arunavha Chanda, Dipankar Das, Chandan Mazumdar

Code-mixing is a prevalent phenomenon in modern-day communication. Although several systems succeed at identifying a single language, identifying the language of each word in code-mixed text is a herculean task, even more so in a social media context. This paper explores the English-Bengali code-mixing phenomenon and presents algorithms capable of identifying the language of every word to a reasonable...

Semi-supervised Random Forest for Intrusion Detection Network

2017

Ningxin Shi, Xiaohong Yuan, William Nick

To protect valuable computer systems, network data needs to be analyzed and classified so that possible network intrusions can be detected. Machine learning techniques have been used to classify network data. Supervised machine learning methods can achieve high accuracy at classifying network data as normal or malicious, but they require the availability of fully labeled data...

Algorithmic Songwriting with ALYSIA

2017

Margareta Ackerman, David Loker

This paper introduces ALYSIA: Automated LYrical SongwrIting Application. ALYSIA is based on a machine learning model using Random Forests, and we discuss its success at pitch and rhythm prediction. Next, we show how ALYSIA was used to create original pop songs that were subsequently recorded and produced. Finally, we discuss our vision for the future of Automated Songwriting for both co-creativ...
