Re-using Data Mining Workflows
Abstract
Setting up and reusing data mining processes is a complex task. Based on our experience from a project on the analysis of clinicogenomic data, we argue that supporting setup and reuse through large workflow repositories may not be realistic in practice. We describe an approach for automatically collecting workflow information and metadata, and introduce data mining patterns as a way to formally describe the information needed for workflow reuse.
Similar resources
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Third-Generation Data Mining: Towards Service-Oriented Knowledge Discovery SoKD’10
Knowledge Discovery in Databases (KDD) has grown a lot during the last years. But providing user support for constructing workflows is still problematic. The large number of operators available in current KDD systems makes it difficult for a user to successfully solve her task. Also, workflows can easily reach a huge number of operators (hundreds) and parts of the workflows are applied several ...
Data Mining Workflow Templates for Intelligent Discovery Assistance and Auto-Experimentation
Knowledge Discovery in Databases (KDD) has grown a lot during the last years. But providing user support for constructing workflows is still problematic. The large number of operators available in current KDD systems makes it difficult for a user to successfully solve her task. Also, workflows can easily reach a huge number of operators (hundreds) and parts of the workflows are applied several t...
Using Meta-mining to Support Data Mining Workflow Planning and Optimization
Knowledge Discovery in Databases is a complex process that involves many different data processing and learning operators. Today’s Knowledge Discovery Support Systems can contain several hundred operators. A major challenge is to assist the user in designing workflows which are not only valid but also – ideally – optimize some performance measure associated with the user goal. In this paper we ...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Analysis and Design of Service-Oriented Framework for Executing Data Mining Services on Grids
Data mining services on grids are a present-day need. Workflow environments are widely used in data mining systems to manage data and execution flows associated with complex applications. Weka, one of the most used open-source data mining systems, includes the Knowledge-Flow environment which provides a drag-and-drop interface to compose and execute data mining workflows. It allows users to ...