Validating Distance-Based Record Linkage with Probabilistic Record Linkage

نویسندگان

  • Josep Domingo-Ferrer
  • Vicenç Torra
چکیده

This work compares two alternative methods for record linkage: distance based and probabilistic record linkage. It compares the performance of both approaches when data is categorical. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic-based record linkage lead to similar results in relation to the number of re-identified records. As a consequence, the distance proposed for ordinal and nominal scales is implicitly validated.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...

متن کامل

Distance-based and probabilistic record linkage for re-identification of records with categorical variables

Record linkage methods are methods for identifying the presence of the same individual in different data files (re-identification). This paper studies and compares the two main existing approaches for record linkage: probabilistic and distance-based. The performance of both approaches is compared when data are categorical. To that end, a distance over ordinal and nominal scales is defined. The ...

متن کامل

Accuracy of probabilistic record linkage applied to health databases: systematic review.

OBJECTIVE To analyze both national and international literature on validity of record linkage procedure of health databases focusing on quality assessment of results. METHODS A systematic review of cohort, case-control, and cross-sectional studies that evaluated quality of probabilistic record linkage of health databases was conducted. Cochrane methodology of systematic reviews was used. The ...

متن کامل

Probabilistic record linkage

Studies involving the use of probabilistic record linkage are becoming increasingly common. However, the methods underpinning probabilistic record linkage are not widely taught or understood, and therefore these studies can appear to be a 'black box' research tool. In this article, we aim to describe the process of probabilistic record linkage through a simple exemplar. We first introduce the c...

متن کامل

A Survey of Probabilistic Record Matching Models, Techniques and Tools

Probabilistic record linkage regards the use of stochastic decision models to solve the problem of record linkage (also known as record matching). Data quality has became a key aspect in many institutions and the demand for novel, effective techniques is increasing. Record linkage in general has been studied in the last three decades and a solid probabilistic decision framework has been propose...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002