A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data
Authors
Abstract
Linked Open Data has been recognized as a valuable source for background information in data mining. However, most data mining tools require features in propositional form, i.e., binary, nominal or numerical features associated with an instance, while Linked Open Data sources are usually graphs by nature. In this paper, we compare different strategies for creating propositional features from Linked Open Data (a process called propositionalization), and present experiments on different tasks, i.e., classification, regression, and outlier detection. We show that the choice of the strategy can have a strong influence on the results.
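To make the idea of propositionalization concrete, the following is a minimal sketch rather than the procedure evaluated in the paper. It turns a tiny graph of triples into one feature vector per instance, using two simple encodings chosen here for illustration: a binary flag and a count. The triples, the `ex:` names, and the function `propositionalize` are hypothetical.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); not data from the paper.
TRIPLES = [
    ("ex:Berlin", "rdf:type", "ex:City"),
    ("ex:Berlin", "rdf:type", "ex:Capital"),
    ("ex:Berlin", "ex:twinnedWith", "ex:Paris"),
    ("ex:Berlin", "ex:twinnedWith", "ex:Madrid"),
    ("ex:Hamburg", "rdf:type", "ex:City"),
    ("ex:Hamburg", "ex:twinnedWith", "ex:Marseille"),
]


def propositionalize(triples, strategy="binary"):
    """Flatten graph triples into a table with one row per subject.

    Returns (feature_names, {subject: feature_vector}).
    strategy="binary": 1 if the feature occurs for the subject, else 0.
    strategy="count":  number of times the feature occurs for the subject.
    """
    counts = {}
    for subject, predicate, obj in triples:
        # Assumption for this sketch: type statements yield one feature per
        # class, all other relations yield one feature per predicate.
        feature = f"{predicate}={obj}" if predicate == "rdf:type" else predicate
        counts.setdefault(subject, Counter())[feature] += 1

    # A fixed, sorted column order so every instance gets the same layout.
    names = sorted({f for c in counts.values() for f in c})

    table = {}
    for subject, c in counts.items():
        if strategy == "binary":
            table[subject] = [1 if c[f] > 0 else 0 for f in names]
        elif strategy == "count":
            table[subject] = [c[f] for f in names]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return names, table


if __name__ == "__main__":
    for strategy in ("binary", "count"):
        names, table = propositionalize(TRIPLES, strategy)
        print(strategy, names)
        for subject, row in table.items():
            print("  ", subject, row)
```

The two encodings already disagree on the same graph: the binary vector only records that ex:Berlin has some twinned city, while the count vector records how many. A downstream learner therefore sees different inputs depending on the encoding.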
Similar resources
Towards Linked Open Data Enabled Data Mining - Strategies for Feature Generation, Propositionalization, Selection, and Consolidation
Background knowledge from Linked Open Data sources can be used to improve the results of a data mining problem at hand: predictive models can become more accurate, and descriptive models can reveal more interesting findings. However, collecting and integrating background knowledge is a tedious manual work. In this paper we propose a set of desiderata, and identify the challenges for developing ...
Binary Vector based Propositionalization Strategy for Multivalued Relations in Linked Data
Machine learning on linked data is strongly dependent on the selection of high quality data features to achieve good results and build reusable and generalizable models. In this work, we explore the problem of representing multivalued relations in a suitable form for machine learning while keeping the human comprehensibility of the resulting model. Specifically, we propose the use of a binary v...
RDF2Vec: RDF Graph Embeddings for Data Mining
Linked Open Data has been recognized as a valuable source for background information in data mining. However, most data mining tools require features in propositional form, i.e., a vector of nominal or numerical features associated with an instance, while Linked Open Data sources are graphs by nature. In this paper, we present RDF2Vec, an approach that uses language modeling approaches for unsu...
Design Strategies for Boys’ Preschools in Isfahan with the Aim of Creating Place Attachment
One of the first and most important public spaces with which children deal is the educational space and the most basic spaces are preschools. Children's belonging to preschoolers affects their future to enhance their education. Many studies have been conducted in Iran on the creation and upgrading of educational spaces for children, but studies on children as the main source of qualitat...
On propositionalization for knowledge discovery in relational databases
Propositionalization is a process that leads from relational data and background knowledge to a single-table representation thereof, which serves as the input to widespread systems for knowledge discovery in databases. Systems for propositionalization thus support the analyst during the usually costly phase of data preparation for data mining. Such systems have been applied for more than 15 yea...
Publication date: 2014