Precision and Recall Without Ground Truth

Authors

  • Bart Lamiroy
  • Tao Sun
Abstract

In this paper we present a way to use precision and recall measures in total absence of ground truth.

1 Precision and Recall

1.1 General Definitions and Notation

Precision $Pr$ and Recall $Rc$ (and the often associated F-measure or ROC curves) are standard metrics expressing the quality of Information Retrieval methods [8]. They are usually expressed with respect to a query $q$ (or averaged over a series of queries) over a data set $\Delta$ such that:

$$Pr^{\Delta}_q = \frac{\left|P^{\Delta}_q \cap R^{\Delta}_q\right|}{\left|R^{\Delta}_q\right|} \tag{1}$$

$$Rc^{\Delta}_q = \frac{\left|P^{\Delta}_q \cap R^{\Delta}_q\right|}{\left|P^{\Delta}_q\right|} \tag{2}$$

where $P^{\Delta}_q$ is the set of all documents in $\Delta$ relevant to query $q$, and $R^{\Delta}_q$ is the set of documents actually retrieved by $q$. Although we can safely assume that $R^{\Delta}_q$ is known (i.e. the query $q$ can actually be executed, and returns a known, manageable set of results), the same assumption does not always hold for $P^{\Delta}_q$, as will be shown later. For ease of reading we will refer to $Pr$, $Rc$, $P$, and $R$, respectively, when there is no ambiguity on $\Delta$ and $q$. Often both measures are combined in the $F_\beta$ measure, where

$$F_\beta = \frac{\left(1 + \beta^2\right) Pr \, Rc}{\beta^2 Pr + Rc} \tag{3}$$

* Bart Lamiroy was a visiting scientist at Lehigh University in 2010-2011. This work was conducted at the Computer Science and Engineering Department at Lehigh University and was supported in part by a DARPA IPTO grant administered by Raytheon BBN Technologies.

Author manuscript, published in "Ninth IAPR International Workshop on Graphics RECognition GREC 2011 (2011)". inria-00617314, version 1, 26 Aug 2011.
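As an illustration of equations (1) to (3), the following minimal Python sketch computes precision, recall and the F-measure from two sets of document identifiers. It is not taken from the paper: the function names, the document identifiers and the convention of returning 0.0 for empty sets are assumptions made for this example. It also assumes that the relevant set P is known, which is precisely the assumption the paper sets out to relax.

```python
def precision_recall(relevant, retrieved):
    """Return (Pr, Rc) following equations (1) and (2).

    relevant  -- P, the documents relevant to the query (assumed known here)
    retrieved -- R, the documents actually returned by the query
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # |P ∩ R|
    # Assumption: an empty denominator yields 0.0 rather than an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """Return F_beta following equation (3)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # assumption: F is taken as 0 when both measures are 0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical example data: four relevant documents, three retrieved.
P = {"d1", "d2", "d3", "d4"}
R = {"d3", "d4", "d5"}

pr, rc = precision_recall(P, R)
print(f"Pr = {pr:.3f}, Rc = {rc:.3f}, F1 = {f_beta(pr, rc):.3f}")
```

With this example data, two of the three retrieved documents are relevant (Pr = 2/3) and two of the four relevant documents are retrieved (Rc = 1/2).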

Similar resources

Measures for ranking cell trackers without manual validation

Cell tracking is often implemented as cell detection and data association steps. For a particular detection output it is a challenge to automatically select the best association algorithm. We approach this challenge by developing novel measures for ranking the association algorithms according to their performance without the need for a ground truth. We formulate tracking as a binary classificati...

Computing Precision and Recall with Missing or Uncertain Ground Truth

In this paper we present a way to use precision and recall measures in total absence of ground truth. We develop a probabilistic interpretation of both measures and show that, provided a sufficient number of data sources are available, it offers a viable performance measure to compare methods if no ground truth is available. This paper also shows the limitations of the approach, in case a syste...

Aide à la gestion des processus de numérisation en vue de l'OCRisation des ouvrages

In this paper, we investigate how to improve the digitization process at the French National Library. In the first part, we study the relationship between the bibliographic data of a document and the decision to select it, in order to support this task. In the second part, we present an existing approach to estimate precision and recall without ground truth that could be used...

Ground Truth Energies for Hierarchies of Segmentations

In evaluating a hierarchy of segmentations H of an image by ground truth G, which can be partitions of the space or sets, we look for the optimal partition in H that "fits" G best. Two energies on partial partitions express the proximity from H to G, and G to H. They derive from a local version of the Hausdorff distance. Then the problem amounts to finding the cut of the hierarchy which minimiz...

Master Thesis: Data-Driven De-Anonymization in Bitcoin

We analyse the performance of several clustering algorithms in the digital peer-to-peer currency Bitcoin. Clustering in Bitcoin refers to the task of finding addresses that belong to the same wallet as a given address. In order to assess the effectiveness of clustering strategies we exploit a vulnerability in the implementation of Connection Bloom Filtering to capture ground truth data about 37...

Publication date: 2011