Reliable Classifications with Machine Learning
Authors
Abstract
In the past decades, Machine Learning algorithms have been successfully applied to several classification problems. Although they often significantly outperform domain experts (in classification accuracy or other measures), they are mostly not used in practice. A plausible reason is that it is difficult to obtain an unbiased estimate of the reliability of a single classification. While most Machine Learning algorithms can provide a quantitative assessment of individual classifications (e.g. rule coverage, leaf class distribution), these assessments may be heavily biased, and therefore unreliable, because of sparse training data and model limitations. In this paper we propose a general transductive method for estimating the reliability of individual classifications that is independent of the applied Machine Learning algorithm. We compare our method with existing approaches and discuss its advantages. We perform extensive testing on 14 domains and 6 Machine Learning algorithms and show that our approach frequently yields more than 100% improvement in reliability estimation performance.
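The abstract does not spell out the procedure, so the following is only a rough sketch of what a transductive, algorithm-independent reliability estimate could look like. It classifies a new example, adds that example to the training set with its predicted label, retrains, and measures how far the predicted class distribution moves, using the symmetric Kullback-Leibler divergence. The `Learner` interface, the toy Parzen-window classifier (`make_parzen`), and the `exp(-d)` mapping to a score are illustrative assumptions not taken from the paper; any learner that returns class probabilities could be plugged in.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical learner interface (an assumption): train on (X, y) and return
# a function that maps one example to a list of class probabilities.
PredictProba = Callable[[Vector], List[float]]
Learner = Callable[[List[Vector], List[int]], PredictProba]


def symmetric_kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """Symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p)."""
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def transductive_reliability(learner: Learner, X: List[Vector], y: List[int],
                             x_new: Vector) -> Tuple[int, float]:
    """Return (predicted label, reliability score in (0, 1]) for x_new."""
    # Inductive step: train on the original data and classify x_new.
    before = learner(X, y)(x_new)
    label = max(range(len(before)), key=before.__getitem__)
    # Transductive step: add x_new with its predicted label, retrain, reclassify.
    after = learner(list(X) + [x_new], list(y) + [label])(x_new)
    # A small shift in the class distribution suggests a more reliable
    # classification; exp(-d) is an arbitrary (assumed) mapping to (0, 1].
    return label, math.exp(-symmetric_kl(before, after))


def make_parzen(bandwidth: float, n_classes: int, prior_mass: float = 0.5) -> Learner:
    """Toy Parzen-window classifier (hypothetical), used only to exercise the sketch."""
    def learner(X: List[Vector], y: List[int]) -> PredictProba:
        def predict_proba(x: Vector) -> List[float]:
            # Every class starts with a small prior mass so sparse regions stay uncertain.
            scores = [prior_mass] * n_classes
            for xi, yi in zip(X, y):
                d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
                scores[yi] += math.exp(-d2 / (2 * bandwidth ** 2))
            total = sum(scores)
            return [s / total for s in scores]
        return predict_proba
    return learner


X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
parzen = make_parzen(bandwidth=0.2, n_classes=2)
for point in ([0.1], [0.6], [5.0]):
    print(point, transductive_reliability(parzen, X, y, point))
```

With this toy learner, a point inside a dense single-class cluster (`[0.1]`) barely changes the class distribution when added, so it scores higher than a point between the clusters (`[0.6]`) or one far from all training data (`[5.0]`), mirroring the intuition that sparse training data makes individual classifications less reliable.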
Similar resources
Comparison of classic regression methods with neural network and support vector machine in classifying groundwater resources
Data classification is currently one of the most important problems across the sciences for detecting and predicting events. In statistics, such classification has traditionally relied on classic methods and statistical models such as logistic regression. In the present era, known as the era of information explosion, we are in most cases faced with data that c...
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may post offensive tweets aimed at others, a behavior known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in social media text by analyzing the "tweets" via one of the machine l...
Estimating Confidence Values of Individual Predictions by their Typicalness and Reliability
Although machine learning algorithms have been successfully used in many problems, and are emerging as valuable data analysis tools, their serious practical use is affected by the fact that often they cannot produce reliable and unbiased assessments of their predictions’ quality. There exist several approaches for estimating reliability or confidence for individual classifications, and many of ...
Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2
In this paper, the naive credal classifier, which is a set-valued counterpart of naive Bayes, is extended to a general and flexible treatment of incomplete data, yielding a new classifier called naive credal classifier 2 (NCC2). The new classifier delivers classifications that are reliable even in the presence of small sample sizes and missing values. Extensive empirical evaluations show that, ...
Automatic Lexical Classification -- Balancing between Machine Learning and Linguistics
Verb classifications have been used to support a number of practical tasks and applications, such as parsing, information extraction, question-answering, and machine translation. However, large-scale exploitation of verb classes in real-world or domain-sensitive tasks has not been possible because existing manually built classifications are not comprehensive. This paper describes recent and on-go...