pre semiclosed set

Heterogeneous Measurements and Multiple Classifiers for Speech Recognition

1998

Andrew K. Halberstadt, James R. Glass

This paper addresses the problem of acoustic phonetic modeling. First, heterogeneous acoustic measurements are chosen in order to maximize the acoustic-phonetic information extracted from the speech signal in preprocessing. Second, classifier systems are presented for successfully utilizing high-dimensional acoustic measurement spaces. The techniques used for achieving these two goals can be br...

Training Set Construction Methods

2013

Tomas Borovicka

In order to build a classification or regression model, learning algorithms use datasets to set their parameters and estimate model performance. Training set construction is a part of data preparation. This important phase is often underestimated in the data mining process. However, choosing appropriate preprocessing algorithms is often as important as choosing a suitable learning algorithm. Goa...

SWASH: A Naive Bayes Classifier for Tweet Sentiment Identification

2015

Ruth Talbot, Chloe Acheampong, Richard Wicentowski

This paper describes a sentiment classification system designed for SemEval-2015, Task 10, Subtask B. The system employs a constrained, supervised text categorization approach. Firstly, since thorough preprocessing of tweet data was shown to be effective in previous SemEval sentiment classification tasks, various preprocessing steps were introduced to enhance the quality of lexical informati...

Cost-Sensitive Feature Reduction Applied to a Hybrid Genetic Algorithm

1996

Nada Lavrac, Dragan Gamberger, Peter D. Turney

This study is concerned with whether it is possible to detect what information contained in the training data and background knowledge is relevant for solving the learning problem, and whether irrelevant information can be eliminated in preprocessing before starting the learning process. A case study of data preprocessing for a hybrid genetic algorithm shows that the elimination of irrelevant f...

Preprocessing by a Cost-sensitive Literal Reduction Algorithm: Reduce 1

1996

Dragan Gamberger

This study is concerned with whether it is possible to detect what information contained in the training data and background knowledge is relevant for solving the learning problem, and whether irrelevant information can be eliminated in preprocessing before starting the learning process. A case study of data preprocessing for a hybrid genetic algorithm shows that the elimination of irrelevant f...

Efficient Similarity Join Method Using Unsupervised Learning

2012

Bilal Hawashin, Farshad Fotouhi, William Grosky

This paper proposes an efficient similarity join method using unsupervised learning, when no labeled data is available. In our previous work, we showed that the performance of similarity join could improve when long string attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, are used under supervised learning, where a training set exists. In this work, ...

Achieving non-discrimination in prediction

Journal: CoRR 2017

Lu Zhang, Yongkai Wu, Xintao Wu

Discrimination-aware classification is receiving increasing attention in the data mining and machine learning fields. The data preprocessing methods for constructing a discrimination-free classifier remove discrimination from the training data and learn the classifier from the cleaned data. However, these methods lack a theoretical guarantee of performance. In this paper, ...

PAN 2017: Author Profiling - Gender and Language Variety Prediction

2017

Matej Martinc, Iza Skrjanec, Katja Zupan, Senja Pollak

We present the results of gender and language variety identification performed on the tweet corpus prepared for the PAN 2017 Author profiling shared task. Our approach consists of tweet preprocessing, feature construction, feature weighting and classification model construction. We propose a Logistic regression classifier, where the main features are different types of character and word n-gram...

Cones and foci: A mechanical framework for protocol verification

Journal: Formal Methods in System Design 2006

Wan Fokkink, Jun Pang, Jaco van de Pol

We define a cones and foci proof method, which rephrases the question of whether two system specifications are branching bisimilar in terms of proof obligations on relations between data objects. Compared to the original cones and foci method from Groote and Springintveld, our method is more generally applicable, because it does not require a preprocessing step to eliminate τ-loops. We prove soun...

Application of the Intuitionistic Fuzzy InterCriteria Analysis Method to a Neural Network Preprocessing Procedure

2015

Sotir Sotirov, Vassia Atanassova, Evdokia Sotirova, Veselina Bureva, Deyan Mavrov

Artificial neural networks (ANNs) are a tool that can be used for object recognition and identification. However, there are limits on when ANNs can be used, and the number of neurons is one of the major parameters in implementing an ANN. At the same time, a larger number of neurons slows down the learning process. In our paper, we propose a method for removing the num...
