A Co-Ranking Algorithm for Learning Listwise Ranking Functions from Unlabeled Data

Journal: JCP 2011

Hai-jiang He

In this paper, we propose a co-ranking algorithm that trains listwise ranking functions using unlabeled data together with a small amount of labeled data. The co-ranking algorithm is based on the co-training paradigm, a common scheme in semi-supervised classification. First, we use two listwise ranking methods to construct a base ranker and an assistant ranker, respect...

Scaling Up Semi-supervised Learning: An Efficient and Effective LLGC Variant

2007

Bernhard Pfahringer, Claire Leschi, Peter Reutemann

Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms that have been described in recent years are cubic or worse in the total number of both labeled and unla...
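
For orientation, here is a minimal pure-Python sketch of the standard LLGC baseline of Zhou et al. (learning with local and global consistency, also called label spreading). It is not the efficient variant this paper proposes, whose details are cut off above. It iterates $F \leftarrow \alpha S F + (1-\alpha) Y$ with $S = D^{-1/2} W D^{-1/2}$; solving for the fixed point directly means inverting an $n \times n$ matrix, an $O(n^3)$ step that matches the cubic scaling the abstract refers to. The Gaussian affinity and the `alpha`, `sigma` and `iters` defaults are assumptions chosen for a toy example.

```python
import math


def llgc(points, labels, num_classes, alpha=0.99, sigma=1.0, iters=200):
    """LLGC-style label spreading; labels[i] is a class index, or None if unlabeled."""
    n = len(points)
    # Dense Gaussian affinity matrix W with a zero diagonal (assumed kernel).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; assumes every degree is > 0.
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Y holds one-hot rows for labeled points and all-zero rows for unlabeled ones.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        # F <- alpha * S F + (1 - alpha) * Y, avoiding the explicit matrix inverse.
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
              for c in range(num_classes)] for i in range(n)]
    # Each point takes the class with the largest spread score.
    return [max(range(num_classes), key=lambda c: f[i][c]) for i in range(n)]
```

For example, `llgc([(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)], [0, None, None, None, None, 1], 2)` spreads each of the two seed labels through its own cluster. Even the iterative form costs $O(n^2)$ per step on a dense graph, so scaling up generally requires a sparser or otherwise cheaper affinity structure.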

Compact Acoustic Models for Embedded Speech Recognition

Journal: EURASIP J. Audio, Speech and Music Processing 2009

Christophe Lévy, Georges Linarès, Jean-François Bonastre

Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition allows only a few KB of memory, a few MIPS, and a small amount of training data. To fit the resource constraints of embedded applications, an approach based on a semi-continuous HMM system using state-independent acoustic modelling is proposed. A transformation is ...

Improved Training for Self-Training

Journal: CoRR 2017

Gal Hyams, Daniel Greenfeld, Dor Bank

It is well known that for some tasks, labeled data sets may be hard to gather. Self-training, or pseudo-labeling, tackles the problem of having insufficient training data. In the self-training scheme, the classifier is first trained on a limited, labeled dataset, and after that, it is trained on an additional, unlabeled dataset, using its own predictions as labels, provided those predictions ar...
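
The self-training loop described above (train on the labeled set, pseudo-label unlabeled points with the model's own predictions, retrain) can be sketched in pure Python as follows. This is a generic illustration, not the improved procedure of this paper: the nearest-centroid classifier, the softmax-over-distances confidence score and the `threshold=0.9` acceptance rule are all assumptions, since the abstract's acceptance criterion is cut off.

```python
import math


def centroids(xs, ys):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(cents, x):
    """Softmax over negative squared distances to the centroids -> (label, probability)."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in cents.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    best = max(exps, key=exps.get)
    return best, exps[best] / sum(exps.values())


def self_train(xs, ys, unlabeled, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and retrain on them."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(xs, ys)
        confident, rest = [], []
        for x in pool:
            label, p = predict_with_confidence(cents, x)
            if p >= threshold:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:
            break  # nothing left that the current model is sure about
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = rest
    return centroids(xs, ys), pool
```

With `self_train([(0.0,), (10.0,)], ["a", "b"], [(1.0,), (2.0,), (9.0,), (5.0,)])`, the point at 5.0 is a coin flip in the first round and is skipped, but once the centroids move it is pseudo-labeled `"a"`. This shows how self-training reinforces its own earlier decisions, which is why the choice of acceptance criterion matters.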

Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

2011

Shasha Liao, Ralph Grishman

Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular character...

Semi-Supervised Spectral Mapping for Enhancing Separation between Classes

2009

Weiwei Du, Kiichi Urahama

We present a spectral mapping technique for semi-supervised pattern classification. Importance scores of features are first evaluated with a semi-supervised feature selection algorithm by Zhao et al. Training data are then embedded into a low-dimensional space with a spectral mapping derived from the selected and weighted feature vectors with which test data are classified by the nearest neigh...

Tri-Training for Authorship Attribution with Limited Training Data

2014

Tieyun Qian, Bing Liu, Li Chen, Zhiyong Peng

Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. However, in real life, it is often difficult or expensive to collect a large set of labeled data. For example, in the online review domain, most reviewers (authors) only write a few reviews, whic...
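
The title refers to tri-training, in which three classifiers teach each other: an unlabeled example is added, with the agreed label, to one learner's training set when the other two learners agree on it. The sketch below is a simplified, generic version of that scheme (after Zhou and Li), not the authorship-attribution system of this paper. Assumptions: three 1-nearest-neighbour learners, diversity obtained by dropping a different labeled example per learner instead of bootstrap sampling, and no error-rate check before accepting pseudo-labels (the original scheme includes one).

```python
from collections import Counter


def nn_predict(train, x):
    """1-nearest-neighbour prediction; train is a list of (features, label) pairs."""
    nearest = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
    return nearest[1]


def tri_train(labeled, unlabeled, rounds=3):
    """Simplified tri-training with three 1-NN learners (needs >= 3 labeled examples)."""
    # Assumption: learner i starts without labeled example i, a deterministic
    # stand-in for the bootstrap samples used in standard tri-training.
    base = [labeled[:i] + labeled[i + 1:] for i in range(3)]
    sets = [list(b) for b in base]
    for _ in range(rounds):
        new_sets = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            extra = []
            for x in unlabeled:
                yj, yk = nn_predict(sets[j], x), nn_predict(sets[k], x)
                if yj == yk:  # the other two learners agree: teach learner i
                    extra.append((x, yj))
            # Pseudo-labels are recomputed each round rather than accumulated.
            new_sets.append(base[i] + extra)
        sets = new_sets
    return sets


def tri_predict(sets, x):
    """Final prediction by majority vote of the three learners."""
    return Counter(nn_predict(s, x) for s in sets).most_common(1)[0][0]
```

For instance, with labeled points at 0, 1 (`"a"`) and 9, 10 (`"b"`) and unlabeled points at 2, 3 and 8, every learner ends up with the unlabeled point at 8 pseudo-labeled `"b"`.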

Lacking Labels in the Stream: Classifying Evolving Stream Data with Few Labels

2009

Clay Woolam, Mohammad M. Masud, Latifur Khan

This paper outlines a data stream classification technique that addresses the problem of insufficient and biased labeled data. It is practical to assume that only a small fraction of instances in the stream are labeled. A more practical assumption would be that the labeled data may not be independently distributed among all training documents. How can we ensure that a good classification model ...

Combining Self-reducibility and Partial Information Algorithms

2005

André Hernich, Arfst Nickelsen

A partial information algorithm for a language $A$ computes, for some fixed $m$ and input words $x_1, \ldots, x_m$, a set of bitstrings containing $\chi_A(x_1, \ldots, x_m)$. For example, p-selective, approximable, and easily countable languages are defined by the existence of polynomial-time partial information algorithms of a specific type. Self-reducible languages, for different types of self-reductions, form subcla...

CoConut: Co-Classification with Output Space Regularization

2014

Sameh Khamis, Christoph H. Lampert

In this work we introduce a new approach to co-classification, i.e. the task of jointly classifying multiple, otherwise independent, data samples. The method we present, named CoConut, is based on the idea of adding a regularizer in the label space to encode certain priors on the resulting labelings. A regularizer that encourages labelings that are smooth across the test set, for instance, can ...
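
The core idea, adding a label-space regularizer to otherwise independent per-sample decisions, can be illustrated with a brute-force sketch. Here each test sample has a per-label cost (for example, from an independent classifier), and a smoothness penalty `lam` is charged for every neighbouring pair that receives different labels. The cost matrix, the neighbour pairs, this Potts-style penalty and the exhaustive search are all assumptions for a tiny example; this is not CoConut's actual regularizer or optimizer.

```python
from itertools import product


def co_classify(unary, neighbors, lam):
    """Jointly label all samples.

    unary[i][y] is the cost of giving sample i label y; neighbors lists (i, j) pairs.
    Exhaustive search, so only usable for a handful of samples.
    """
    n, k = len(unary), len(unary[0])
    best, best_cost = None, float("inf")
    for labels in product(range(k), repeat=n):
        cost = sum(unary[i][y] for i, y in enumerate(labels))
        # Output-space regularizer: penalize neighbouring samples with different labels.
        cost += lam * sum(labels[i] != labels[j] for i, j in neighbors)
        if cost < best_cost:
            best, best_cost = list(labels), cost
    return best
```

For a chain of three samples with costs `[[0.0, 1.0], [0.6, 0.4], [0.0, 1.0]]` and neighbours `[(0, 1), (1, 2)]`, `lam=0.0` reproduces the independent decisions `[0, 1, 0]`, while `lam=0.5` smooths the middle sample to give `[0, 0, 0]`.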
