Uncertain Data Integration with Probabilities

Authors

  • Gayatri Tallur
  • Fereidoon Sadri
  • Jing Deng
Abstract

Real-world applications that deal with information extraction, such as business intelligence software or sensor data management, must often process data provided with varying degrees of uncertainty. Uncertainty can result from multiple or inconsistent sources, as well as from approximate schema mappings. Modeling, managing, and integrating uncertain data from multiple sources has been an active area of research in recent years [6][7][1][2]. In particular, data integration systems provide a uniform query interface to the integrated information, freeing the user from the tedious tasks of finding relevant data sources, interacting with each source in isolation through its own interface, and combining data from multiple sources [5]. Previous work has integrated uncertain data using representation models such as possible worlds and probabilistic relations [12][1][2]. We extend this work by determining the probabilities of the possible worlds of an extended probabilistic relation. We also present an algorithm to determine when a given extended probabilistic relation can be obtained by integrating two probabilistic relations, and give the decomposed pairs of probabilistic relations.

Acknowledgements

I sincerely thank my advisor, Dr. Fereidoon Sadri, for his abundant guidance and support throughout the course of this research, without which this thesis would not have been possible. I am very thankful to him for giving me the opportunity to work with him and for believing in me. I thoroughly enjoyed working under him. I would also like to thank Dr. Jing Deng and Dr. Nancy Green for their valuable guidance and feedback. I am indebted to my husband for his constant encouragement, support, and love. I am very grateful to my mother and my family members for their unconditional support and care. I want to thank Nina Revankar and her family for their ample love and concern during my study. Nina's spontaneous gestures of help on those busy days really made a big difference, and I am deeply indebted to her for them. I would like to express my gratitude to my friends at school for cheering me up and encouraging me throughout. I am highly grateful to all my dear friends in Greensboro for keeping my life outside school fun at all times, and for their endless concern and support.


Similar resources

Probabilistic Data Integration Systems

Current data integration techniques are successful at managing well-defined and well-understood data integration tasks, but do not cope well with uncertainty. However, the amount of uncertain data is growing with the number and variety of data sources being integrated, both in traditional data integration tasks, such as enterprise data integration, and in next-generation integration problems, such as c...


Identifying Interesting Instances for Probabilistic Skylines

Uncertain data arises from various applications such as sensor networks, scientific data management, data integration, and location-based applications. While significant research efforts have been dedicated to modeling, managing, and querying uncertain data, advanced analysis of uncertain data is still in its early stages. In this paper, we focus on skyline analysis of uncertain data, modeled as...


Indexing Probabilistic Nearest-Neighbor Threshold Queries

Data uncertainty is inherent in many applications, including sensor networks, scientific data management, data integration, and location-based applications. One of the common queries for uncertain data is the probabilistic nearest neighbor (PNN) query, which returns all uncertain objects with a non-zero probability of being the nearest neighbor (NN). In this paper we study the PNN query with a probability threshold (PNNT), wh...


Probabilistic Local Features in Uncertain Vector Fields with Spatial Correlation

In this paper, methods for extracting local features in crisp vector fields are extended to uncertain fields. While in a crisp field local features are either present or absent at some location, in an uncertain field they are present with some probability. We model sampled uncertain vector fields by discrete Gaussian random fields with empirically estimated spatial correlations. The variabili...


Optimizing Probabilistic Query Processing on Continuous Uncertain Data

Uncertain data management is becoming increasingly important in many applications, in particular, in scientific databases and data stream systems. Uncertain data in these new environments is naturally modeled by continuous random variables. An important class of queries uses complex selection and join predicates and requires query answers to be returned if their existence probabilities pass a t...




Publication date: 2013