Reliable Classifications with Machine Learning

Authors

  • Matjaz Kukar
  • Igor Kononenko

Abstract

Over the past decades, Machine Learning algorithms have been successfully applied to many classification problems. Although they often significantly outperform domain experts (in terms of classification accuracy or otherwise), they are mostly not used in practice. A plausible reason is that it is difficult to obtain an unbiased estimate of the reliability of a single classification. Most Machine Learning algorithms can provide a quantitative assessment of individual classifications (e.g. rule coverage, leaf class distribution), but these assessments may be heavily biased, and therefore unreliable, because of sparse training data and model limitations. In this paper we propose a general transductive method for estimating the reliability of a classification that is independent of the applied Machine Learning algorithm. We compare our method with existing approaches and discuss its advantages. We perform extensive testing on 14 domains and 6 Machine Learning algorithms and show that our approach can frequently yield more than 100% improvement in reliability estimation performance.
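
The abstract does not spell out the procedure, so the following is only a hedged illustration of what a transductive, learner-independent reliability estimate can look like, not the authors' exact method. The sketch classifies a new example, adds it to the training set labelled with its own prediction, retrains, and measures how far the predicted class distribution moves, using a symmetric Kullback-Leibler (J-) divergence. The stand-in learner (`train_naive_bayes`), the divergence, the `exp(-d)` mapping to a score, and the toy weather data are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Sequence, Tuple

# Hypothetical types: an example is (attribute values, class label).
Features = Tuple[Hashable, ...]
Example = Tuple[Features, Hashable]
Dist = Dict[Hashable, float]
Learner = Callable[[Sequence[Example]], Callable[[Features], Dist]]


def train_naive_bayes(data: Sequence[Example]) -> Callable[[Features], Dist]:
    """Laplace-smoothed categorical naive Bayes, a stand-in for any learner.

    Assumes `data` is non-empty and all examples have the same number of attributes.
    """
    class_counts = Counter(y for _, y in data)
    n_attrs = len(data[0][0])
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    seen_values = [set() for _ in range(n_attrs)]
    for x, y in data:
        for i, v in enumerate(x):
            value_counts[(i, y)][v] += 1
            seen_values[i].add(v)

    def predict_proba(x: Features) -> Dist:
        log_scores = {}
        for c, n_c in class_counts.items():
            # Smoothed class prior plus smoothed per-attribute likelihoods, in log space.
            s = math.log((n_c + 1) / (len(data) + len(class_counts)))
            for i, v in enumerate(x):
                s += math.log((value_counts[(i, c)][v] + 1) / (n_c + len(seen_values[i])))
            log_scores[c] = s
        top = max(log_scores.values())
        unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}

    return predict_proba


def j_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """Symmetric KL (J-) divergence: sum over classes of (p - q) * log(p / q)."""
    labels = set(p) | set(q)
    return sum(
        (p.get(c, 0.0) - q.get(c, 0.0))
        * math.log((p.get(c, 0.0) + eps) / (q.get(c, 0.0) + eps))
        for c in labels
    )


def transductive_reliability(
    train: Learner, data: Sequence[Example], x: Features
) -> Tuple[Hashable, float]:
    """Return (predicted class, reliability score in (0, 1])."""
    before = train(data)(x)
    predicted = max(before, key=before.get)
    # Transductive step: label the new example with its own prediction and retrain.
    after = train(list(data) + [(x, predicted)])(x)
    # Assumed mapping: a small shift in the class distribution gives a score near 1.
    return predicted, math.exp(-j_divergence(before, after))


# Hypothetical toy data: (outlook, temperature) -> play?
toy_data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]
label, score = transductive_reliability(train_naive_bayes, toy_data, ("sunny", "cool"))
print(label, round(score, 3))
```

Because `transductive_reliability` only calls the `train` function it is handed, any learner that returns class probabilities could be plugged in; this is the sense in which such an estimate does not depend on the particular Machine Learning algorithm.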

Similar resources

Comparison of classic regression methods with neural network and support vector machine in classifying groundwater resources

In the present era, data classification is one of the most important issues in various sciences for detecting and predicting events. In statistics, the traditional approach to such classification is based on classic methods and statistical models such as logistic regression. In the present era, known as the era of information explosion, in most cases, we are faced with data that c...

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...

Estimating Confidence Values of Individual Predictions by their Typicalness and Reliability

Although machine learning algorithms have been successfully used in many problems, and are emerging as valuable data analysis tools, their serious practical use is affected by the fact that often they cannot produce reliable and unbiased assessments of their predictions’ quality. There exist several approaches for estimating reliability or confidence for individual classifications, and many of ...

Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2

In this paper, the naive credal classifier, which is a set-valued counterpart of naive Bayes, is extended to a general and flexible treatment of incomplete data, yielding a new classifier called naive credal classifier 2 (NCC2). The new classifier delivers classifications that are reliable even in the presence of small sample sizes and missing values. Extensive empirical evaluations show that, ...

Automatic Lexical Classification -- Balancing between Machine Learning and Linguistics

Verb classifications have been used to support a number of practical tasks and applications, such as parsing, information extraction, question answering, and machine translation. However, large-scale exploitation of verb classes in real-world or domain-sensitive tasks has not been possible because existing manually built classifications are not comprehensive. This paper describes recent and on-go...


Publication date: 2002