Measuring Performance when Positives Are Rare: Relative Advantage versus Predictive Accuracy - A Biological Case Study

Authors

  • Stephen Muggleton
  • Christopher H. Bryant
  • Ashwin Srinivasan

Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than … times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of the rare positives, all the models are awarded a similar high score by predictive accuracy because they all exclude most of the abundant negatives.
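The abstract's point that predictive accuracy barely separates models when positives are rare can be checked with a little arithmetic. The sketch below is a minimal illustration under stated assumptions: all counts are invented (not taken from the paper), and `lift` (precision divided by the base rate of positives) is a stand-in chosen here for Relative Advantage, assuming every synthesised-and-tested protein costs the same; the paper's exact RA cost function may include further terms. The names `Confusion`, `accuracy`, `lift` and `MODELS` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one predictor on one test set."""
    tp: int  # positives the predictor flags
    fp: int  # negatives the predictor flags
    fn: int  # positives the predictor misses
    tn: int  # negatives the predictor rejects

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: Confusion) -> float:
    """Fraction of all examples classified correctly."""
    return (c.tp + c.tn) / c.total


def lift(c: Confusion) -> float:
    """Precision divided by the base rate of positives.

    ASSUMPTION: a stand-in for Relative Advantage that treats every
    tested candidate as equally costly. It says how many times more
    positives per test a filtered search finds than random selection.
    """
    flagged = c.tp + c.fp
    positives = c.tp + c.fn
    if flagged == 0 or positives == 0:
        return 0.0
    # (tp / flagged) / (positives / total), rearranged to divide once.
    return (c.tp * c.total) / (flagged * positives)


# Hypothetical counts (not from the paper): 10,000 sequences, 10 positives.
MODELS = {
    "reject_all": Confusion(tp=0, fp=0, fn=10, tn=9990),
    "model_b": Confusion(tp=5, fp=45, fn=5, tn=9945),
    "model_c": Confusion(tp=8, fp=152, fn=2, tn=9838),
}

if __name__ == "__main__":
    for name, c in MODELS.items():
        print(f"{name:10s} accuracy={accuracy(c):.4f} lift={lift(c):.1f}")
```

With these made-up counts the script reports accuracies of 0.9990, 0.9950 and 0.9846 for `reject_all`, `model_b` and `model_c`, against lifts of 0.0, 100.0 and 50.0. Accuracy ranks the useless reject-everything model first because it is dominated by the abundant negatives, whereas the lift-style measure ranks it last.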

Similar resources

Measuring Performance when Positives are Rare: Relative Advantage versus Predictive

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class o...

Measuring Performance when Positives are Rare

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of p...

Learning Chomsky-like Grammars for Biological Sequence Families

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of ...

Are Grammatical Representations Useful for Learning from Biological Sequence Data? - A Case Study

This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, ...

Credit Risk Predictive Ability of G-ZPP Model Versus V-ZPP Model

Credit risk management is becoming more and more important in recent years. When a company deals with a financial problem, it may not be able to fulfill its financial obligations, which can cause direct and indirect financial losses to shareholders, creditors, investors and other people in the community. Advanced credit risk models that are based on market value include improving credit quality...

Publication date: 2000