Search results for: data cleaning

Number of results: 2424654

Journal: International Journal of Smart Electrical Engineering 2014
Ramin Rahnamoun

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of approximate record matching is to find multiple records that belong to one entity. In this paper, an algorithm for approximate record matching is proposed that can be adapted automatically with input error...
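
To make the idea of approximate record matching concrete, here is a minimal sketch based on normalized edit distance. It is not the algorithm proposed in the paper (the abstract is truncated before any details are given); the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def same_entity(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: two values refer to one entity above the threshold."""
    return similarity(a.lower().strip(), b.lower().strip()) >= threshold
```

With these definitions, `same_entity("Jonathan Smith", "Jonathon Smith")` is true because the two strings differ by a single substitution. A fixed threshold is the simplest possible choice; the abstract suggests the paper adapts to the input error rate instead.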

2014
Jiannan Wang

With the increasing amount of available data, turning raw data into actionable information is a requirement in every field. However, one bottleneck that impedes the process is data cleaning. Data analysts usually spend over half of their time cleaning data that is dirty (inconsistent, inaccurate, missing, and so on) before they even begin to do any real analysis. It is a time-consuming and co...

Journal: Journal of Statistical Software 2004

Journal: CFA Institute Magazine 2016

Journal: The Programming Historian 2013

Journal: CoRR 2017
El Kindi Rezig, Mourad Ouzzani, Walid G. Aref, Ahmed K. Elmagarmid, Ahmed R. Mahmood

Data is inherently dirty and there has been a sustained effort to come up with different approaches to clean it. A large class of data repair algorithms relies on data-quality rules and integrity constraints to detect and repair the data. A well-studied class of integrity constraints is Functional Dependencies (FDs, for short) that specify dependencies among attributes in a relation. In this pape...
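
As a small illustration of what a functional dependency check looks like, the sketch below detects violations of a dependency `lhs -> rhs` in a list of rows. It covers detection only and does not reproduce the paper's approach, which the truncated abstract does not describe; `fd_violations` and the sample rows are hypothetical.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the LHS values that map to more than one RHS value.

    A functional dependency lhs -> rhs holds when rows that agree on the
    lhs attributes also agree on the rhs attributes.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        seen[key].add(tuple(row[attr] for attr in rhs))
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical sample data: the dependency zip -> city should hold.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]
violations = fd_violations(rows, lhs=["zip"], rhs=["city"])
```

Here `violations` contains only the key `("10001",)`, because that ZIP code appears with two different city values. Deciding which value to keep is the repair step, which this sketch leaves out.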

2003
William E. Winkler

Data cleaning methods are used for finding duplicates within a file or across sets of files. This overview provides background on the Fellegi-Sunter model of record linkage. The Fellegi-Sunter model provides an optimal theoretical classification rule. Fellegi and Sunter introduced methods for automatically estimating optimal parameters without training data that we extend to many real world sit...
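
The Fellegi-Sunter classification rule can be sketched as follows, assuming conditional independence between fields. For each field, `m` is the probability that the field agrees when two records truly match and `u` is the probability that it agrees when they do not. The probabilities and thresholds used here are made-up illustrative values, not estimates from the paper, and the parameter estimation step mentioned in the abstract is not shown.

```python
import math


def match_weight(agreements, m, u):
    """Sum of per-field log-likelihood ratios for one record pair.

    agreements maps field name -> True if the two records agree on it.
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


def classify(weight, lower, upper):
    """Three-way decision: link, non-link, or send to clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Hypothetical parameters for two comparison fields.
m = {"surname": 0.9, "birth_year": 0.8}
u = {"surname": 0.1, "birth_year": 0.2}

weight = match_weight({"surname": True, "birth_year": True}, m, u)
decision = classify(weight, lower=-3.0, upper=3.0)
```

Agreement on a field with a high `m` and a low `u` pushes the weight up, and disagreement pushes it down. Pairs that fall between the two thresholds are neither linked nor rejected automatically.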

Journal: PVLDB 2016
Xu Chu, Ihab F. Ilyas

Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. Data cleaning exercises often consist of two phases: error detection and error repairing. Error detection techniques can be either quantitative or qualitative, and error repairing is performed by applying data transformation script...
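
One common form of quantitative error detection is statistical outlier detection. The sketch below flags values whose modified z-score, based on the median absolute deviation (MAD), exceeds a cutoff. It is an illustrative assumption about what a quantitative detector might look like, not a technique taken from this abstract; the function name, the 3.5 cutoff, and the sample values are hypothetical.

```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return indexes of values whose modified z-score exceeds the cutoff."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half the values equal the median; the score is undefined.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > cutoff
    ]


# Hypothetical column with one suspicious entry at index 6.
ages = [10, 12, 11, 13, 12, 11, 250]
suspects = mad_outliers(ages)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely moves them. Flagging a value is only the detection phase; whether to correct, drop, or keep it belongs to the repair phase.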

Journal: Genetic Epidemiology 1999
K. W. Broman

The identification of genes contributing to variation in complex phenotypes requires genetic data of high fidelity. Thus, the identification of pedigree and genotyping errors is a crucial prerequisite to the analysis of data from a genome scan for disease genes. The problem has been given little attention in most gene hunting papers; the focus has often been on eliminating Mendelian inconsisten...
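
For readers unfamiliar with the term, a Mendelian inconsistency at an autosomal marker means that a child's genotype cannot be formed by taking one allele from each parent. The check below is a minimal sketch of that definition only; it is not the error-detection method of the paper, and the function name and allele labels are hypothetical.

```python
def mendelian_consistent(father, mother, child):
    """True if the child could inherit one allele from each parent.

    Each genotype is a pair of allele labels at one autosomal marker.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


# Hypothetical genotypes at a single marker.
father = (1, 2)
mother = (3, 4)
ok = mendelian_consistent(father, mother, (1, 3))
bad = mendelian_consistent(father, mother, (1, 2))
```

In this example `ok` is true and `bad` is false, since the genotype `(1, 2)` would require both alleles to come from the father. A check like this catches only errors that produce an impossible genotype; errors that happen to be compatible with the parents pass unnoticed.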
