Comparing record linkage software programs and algorithms using real-world data
Similar resources
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records....
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to the loss of information available in the other data sets. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates in a data set. The i...
Summarization Algorithms for Record Linkage
Record linkage has received significant attention in recent years due to the plethora of data sources that have to be integrated to facilitate data analyses. In several cases, such an integration involves disparate data sources containing huge volumes of records and must be performed in near real-time in order to support critical applications. In this paper, we propose the first summarization a...
A comparison of neural ICA algorithms using real-world data
In this paper, we compare the performance of five prominent neural or semi-neural algorithms designed for Independent Component Analysis (ICA) using three different real-world data sets. The task is either to find interesting directions in the data for visualisation purposes or blind source separation. We develop criteria for selecting the most meaningful basis vectors of ICA or measuring the goodn...
Data Fusion with Record Linkage
Assume that there are two sources (e.g. files) that consist of records with different information about some units, such as people. We want to fuse the information (data) that belongs to the same units. Very often in practice no identification numbers, like the Social Security Number (SSN), are available in both files, which is why there is some uncertainty about which records belong together. Anyway, we...
Journal
Journal title: PLOS ONE
Year: 2019
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0221459