Learning Classifiers from Distributed Data Sources

Authors

  • Doina Caragea
  • Vasant Honavar
Abstract

Recent developments in high-throughput data acquisition technologies in a number of domains (e.g., biological sciences, atmospheric sciences, space sciences, commerce), together with advances in digital storage, computing, and communications technologies, have resulted in the proliferation of physically distributed data repositories created and maintained by autonomous entities (e.g., scientists, organizations). The resulting increasingly data-rich domains offer unprecedented opportunities for computer-assisted, data-driven knowledge acquisition in a number of applications, including, in particular, data-driven scientific discovery, data-driven decision-making in business and commerce, monitoring and control of complex systems, and security informatics. Machine learning (Duda, Hart & Stork, 2000; Mitchell, 1997) offers one of the most cost-effective approaches to analyzing, exploring, and extracting knowledge (i.e., features, correlations, and other complex relationships and hypotheses that describe potentially interesting regularities) from data. However, the applicability of current machine learning approaches in emerging data-rich applications is severely limited by a number of factors:

Similar resources

Learning Link-Based Naïve Bayes Classifiers from Ontology-Extended Distributed Data

We address the problem of learning predictive models from multiple large, distributed, autonomous, and hence almost invariably semantically disparate, relational data sources from a user’s point of view. We show under fairly general assumptions, how to exploit data sources annotated with relevant meta data in building predictive models (e.g., classifiers) from a collection of distributed relati...

Learning Relational Bayesian Classifiers on the Semantic Web

With the advent of the Semantic Web, there is an increased availability of meta data (ontologies) that make explicit the semantic commitments associated with data and an urgent need for machine learning algorithms for building predictive models from such data. Usually, there is no unique global interpretation of data from semantically disparate, autonomous sources. Furthermore, it is neither fe...

Learning Classifiers from Distributed, Ontology-Extended Data Sources

There is an urgent need for sound approaches to integrative and collaborative analysis of large, autonomous (and hence, inevitably semantically heterogeneous) data sources in several increasingly data-rich application domains. In this paper, we precisely formulate and solve the problem of learning classifiers from such data sources, in a setting where each data source has a hierarchical ontolog...

Learning Support Vector Machines from Distributed Data Sources

In this paper we address the problem of learning Support Vector Machine (SVM) classifiers from distributed data sources. We identify sufficient statistics for learning SVMs and present an algorithm that learns SVMs from distributed data by iteratively computing the set of sufficient statistics. We prove that our algorithm is exact with respect to its centralized counterpart and efficient in ter...

Learning classifiers from distributed, semantically heterogeneous, autonomous data sources

Recent advances in computing, communications, and digital storage technologies, together with development of high throughput data acquisition technologies have made it possible to gather and store large volumes of data in digital form. These developments have resulted in unprecedented opportunities for large-scale data-driven knowledge acquisition with the potential for fundamental gains in sci...

Learning Classifiers from Semantically Heterogeneous Data

Semantically heterogeneous and distributed data sources are quite common in several application domains such as bioinformatics and security informatics. In such a setting, each data source has an associated ontology. Different users or applications need to be able to query such data sources for statistics of interest (e.g., statistics needed to learn a predictive model from data). Because no si...

Publication date: 2009