Learning from Semantically Heterogeneous Data
Abstract
Advances in Semantic Web technologies present unprecedented opportunities for exploiting multiple related data sources to discover useful knowledge in many application domains. We have precisely formulated the problem of learning classifiers from a collection of several related ontology-extended data sources, which make explicit the typically implicit ontologies associated with the data sources of interest, and have presented a solution to this problem. User-specific mappings between a user ontology and the data source ontologies are used to answer statistical queries that provide the sufficient statistics needed for learning classifiers from semantically heterogeneous data.
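The idea of answering statistical queries through user-specific mappings can be illustrated with a minimal sketch. It is not the authors' implementation: the source names, mappings, records, and function names below are all hypothetical, and the ontologies are reduced to flat term-to-term dictionaries. Each source answers a count query in the user's vocabulary, and the per-source counts are summed into the sufficient statistics that a count-based classifier such as Naive Bayes would need.

```python
from collections import Counter

# Hypothetical user-specific mappings: local term at each source -> user ontology term.
MAPPINGS = {
    "source_a": {"grad": "Graduate", "undergrad": "Undergraduate"},
    "source_b": {"MS": "Graduate", "PhD": "Graduate", "BS": "Undergraduate"},
}

# Hypothetical records at each source: (local attribute value, class label).
SOURCES = {
    "source_a": [("grad", "yes"), ("undergrad", "no"), ("grad", "yes")],
    "source_b": [("PhD", "yes"), ("BS", "no"), ("MS", "no")],
}


def local_counts(records, mapping):
    """Answer a count query at one source, expressed in user ontology terms."""
    counts = Counter()
    for value, label in records:
        counts[(mapping[value], label)] += 1
    return counts


def joint_counts(sources, mappings):
    """Sum the per-source counts; raw records never leave their source."""
    total = Counter()
    for name, records in sources.items():
        total += local_counts(records, mappings[name])
    return total


def conditional_probability(counts, value, label):
    """Estimate P(value | label) from the aggregated counts (no smoothing)."""
    label_total = sum(n for (_, lab), n in counts.items() if lab == label)
    return counts[(value, label)] / label_total


counts = joint_counts(SOURCES, MAPPINGS)
p_grad_given_yes = conditional_probability(counts, "Graduate", "yes")
```

Only counts cross source boundaries in this sketch, which is why the same learner can be reused when a different user supplies a different set of mappings.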
Similar resources
Learning Classifiers from Semantically Heterogeneous Data
Semantically heterogeneous and distributed data sources are quite common in several application domains such as bioinformatics and security informatics. In such a setting, each data source has an associated ontology. Different users or applications need to be able to query such data sources for statistics of interest (e.g., statistics needed to learn a predictive model from data). Because no si...
Learning Relational Bayesian Classifiers on the Semantic Web
With the advent of the Semantic Web, there is an increased availability of meta data (ontologies) that make explicit the semantic commitments associated with data and an urgent need for machine learning algorithms for building predictive models from such data. Usually, there is no unique global interpretation of data from semantically disparate, autonomous sources. Furthermore, it is neither fe...
Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources
We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal...
Knowledge Discovery from Disparate Earth Data Sources
Advances in data collection and data storage technologies have made it possible to acquire massive Earth science data sets. In principle, these data sets could be transformed into great scientific discoveries. However, due to the heterogeneous nature and to the scale of the available Earth science data, traditional analysis methods are challenged and much of these data remain largely unexplored...
Semi-supervised Learning over Heterogeneous Information Networks by Ensemble of Meta-graph Guided Random Walks
A heterogeneous information network (HIN) is a general representation of many kinds of real-world data. The difference between a HIN and a traditional homogeneous network is that the nodes and edges in a HIN are typed. In many applications, we need to consider the types to make the decision more semantically meaningful. For annotation-expensive applications, a natural way is to consider semi-supervised lear...