OnlineCM: Real-time Consensus Classification with Missing Values
Abstract
Combining predictions from multiple sources or models has been shown to be a useful technique in data mining. For example, in network anomaly detection, the outputs of multiple detectors have to be combined to obtain a diagnostic decision. Unfortunately, as data are generated at an increasingly high speed, existing prediction aggregation methods face new challenges. First, the high velocity and huge volume of the data render existing batch-mode prediction aggregation algorithms infeasible. Second, because of heterogeneity, predictions from multiple models or data sources might not be perfectly synchronized, leading to abundant missing values in the prediction stream. We propose OnlineCM, short for Online Consensus Maximization, to address these challenges. OnlineCM keeps only a minimal yet sufficient footprint for both consensus prediction and missing value imputation over the prediction stream. In particular, we show that the correlations among base models or data sources are sufficient for effective consensus prediction, require little storage, and can be updated in an online fashion. Further, we identify a mutually reinforcing relationship between missing value imputation and consensus prediction, leading to a novel consensus-based missing value imputation method, which in turn makes model correlation estimation more accurate. Experiments demonstrate that OnlineCM achieves aggregated predictions whose performance is close to that of the batch-mode consensus maximization algorithm, and that it significantly outperforms baseline methods on four large real-world datasets.
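The abstract gives no formulas, so the following Python sketch only illustrates the general idea it describes: pairwise agreement statistics among base models are kept in constant memory, updated one record at a time, and used both to weight votes and to fill in missing predictions with the current consensus. It is not the OnlineCM algorithm itself. The class name `OnlineConsensusSketch`, the agreement-rate weighting, and the Laplace smoothing are all assumptions made for illustration.

```python
import numpy as np


class OnlineConsensusSketch:
    """Illustrative sketch only; NOT the OnlineCM algorithm from the paper.

    State is two M x M matrices (M = number of base models), so memory
    does not grow with the length of the prediction stream.
    """

    def __init__(self, n_models, n_classes):
        self.n_models = n_models
        self.n_classes = n_classes
        # agree[i, j]: number of records on which models i and j agreed
        self.agree = np.zeros((n_models, n_models))
        # seen[i, j]: number of records on which the pair (i, j) was counted
        self.seen = np.zeros((n_models, n_models))

    def weights(self):
        # Assumed weighting: a model's weight is its smoothed average
        # agreement rate with the other models.
        rate = (self.agree + 1.0) / (self.seen + 2.0)
        np.fill_diagonal(rate, 0.0)
        return rate.sum(axis=1) / max(self.n_models - 1, 1)

    def update(self, preds):
        """Process one record.

        preds: list with one entry per model, a class index or None if
        that model's prediction is missing for this record.
        Returns (consensus_class, imputed_predictions).
        """
        if all(p is None for p in preds):
            raise ValueError("at least one prediction must be observed")
        w = self.weights()
        votes = np.zeros(self.n_classes)
        for i, p in enumerate(preds):
            if p is not None:
                votes[p] += w[i]
        consensus = int(np.argmax(votes))
        # Consensus-based imputation: missing entries take the consensus label.
        imputed = [consensus if p is None else p for p in preds]
        # Online update of the pairwise agreement statistics.
        a = np.array(imputed)
        self.agree += (a[:, None] == a[None, :]).astype(float)
        self.seen += 1.0
        return consensus, imputed


model = OnlineConsensusSketch(n_models=3, n_classes=2)
print(model.update([1, 1, None]))  # third model's prediction is missing
print(model.update([0, 1, 1]))
```

Feeding the imputed values back into the agreement counts is what mirrors the reinforcing loop mentioned in the abstract; a more careful variant might down-weight pairs that involve an imputed entry.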
Similar resources
A New Algorithm to Impute the Missing Values in the Multivariate Case
There are several methods to make inferences about the parameters of the sampling distribution when we encounter the missing values and the censored data. In this paper, through the order statistics and the projection theorem, a novel algorithm is proposed to impute the missing values in the multivariate case. Then, the performance of this method is investigated through the simulation studies. ...
Recurrent Neural Networks for Multivariate Time Series with Missing Values
Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missi...
Zheng Classification with Missing Feature Values Using Local-Validity Approach
Zheng classification is a very important step in the diagnosis of traditional Chinese medicine (TCM). In the clinical practice of TCM, feature values are often missing and cases are often incomplete. The performance of Zheng classification is closely related to the rate of missing feature values. Based on the pattern of the missing feature values, a new approach named local-validity is proposed to classify zhe...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed using external credit bureau data and finally used for estimating PD in banks. There is also a continuing interest among banks in using rule-based classifiers to b...
Machine Learning Techniques for Solving Classification Problems with Missing Input Data
Missing input data is a common drawback in many real-life pattern classification scenarios. The ability to handle missing data has become a fundamental requirement for pattern classification, because inappropriate treatment may cause large errors or false classification results. The absence of certain values for relevant data attributes can seriously affect the accuracy of classification...