Modeling ETL activities as graphs
Abstract
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. In this paper, we focus on the logical design of the ETL scenario of a data warehouse. Based on a formal logical model that includes the data stores, activities and their constituent parts, we model an ETL scenario as a graph, which we call the Architecture Graph. We model all the aforementioned entities as nodes and four different kinds of relationships (instance-of, part-of, regulator and provider relationships) as edges. In addition, we provide simple graph transformations that reduce the complexity of the graph. Finally, in order to support the engineering of the design and the evolution of the warehouse, we introduce specific importance metrics, namely dependence and responsibility, to measure the degree to which entities are bound to each other.
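The abstract describes the Architecture Graph only in prose. The following Python fragment is a minimal sketch under stated assumptions and is not the paper's formal model. Entities are plain string nodes, and each edge carries one of the four relationship kinds. An edge `(a, b, PART_OF)` is read as "a is part of b", which is an assumed direction. Dependence and responsibility are *assumed* here to be transitive counts over provider (data-flow) edges: how many nodes feed a node, and how many nodes it feeds. `zoom_out` is a hypothetical stand-in for the paper's "simple graph transformations". It collapses the direct parts of an entity into that entity. All class, method, and node names are illustrative.

```python
from collections import defaultdict

# Assumed labels for the four relationship kinds named in the abstract.
INSTANCE_OF, PART_OF, REGULATOR, PROVIDER = (
    "instance-of", "part-of", "regulator", "provider")
EDGE_KINDS = {INSTANCE_OF, PART_OF, REGULATOR, PROVIDER}


class ArchitectureGraph:
    """Sketch of a typed graph; an edge (a, b, PART_OF) reads "a is part of b"."""

    def __init__(self):
        self.nodes = set()
        self.edges = []  # (source, target, kind) triples, in insertion order

    def add_edge(self, src, dst, kind):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self.nodes.update((src, dst))
        self.edges.append((src, dst, kind))

    def _provider_adjacency(self, reverse=False):
        # Only provider edges (data flow) are followed by the metrics below.
        adj = defaultdict(set)
        for src, dst, kind in self.edges:
            if kind == PROVIDER:
                if reverse:
                    adj[dst].add(src)
                else:
                    adj[src].add(dst)
        return adj

    @staticmethod
    def _reachable(adj, start):
        # Iterative depth-first search; `start` itself is never counted.
        seen, stack = set(), [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(start)
        return seen

    def dependence(self, node):
        """Assumed metric: number of nodes that transitively feed `node`."""
        return len(self._reachable(self._provider_adjacency(reverse=True), node))

    def responsibility(self, node):
        """Assumed metric: number of nodes that `node` transitively feeds."""
        return len(self._reachable(self._provider_adjacency(), node))

    def zoom_out(self, whole):
        """Hypothetical simplification: collapse the direct parts of `whole` into it."""
        parts = {s for s, d, k in self.edges if k == PART_OF and d == whole}
        merged = []
        for s, d, k in self.edges:
            if k == PART_OF and d == whole:
                continue  # the part-of edges being collapsed disappear
            s, d = (whole if s in parts else s), (whole if d in parts else d)
            if s != d:  # drop edges that became self-loops
                merged.append((s, d, k))
        self.edges = list(dict.fromkeys(merged))  # de-duplicate, keep order
        self.nodes -= parts


# Hypothetical scenario: source attribute -> activity A (input/output) -> warehouse.
g = ArchitectureGraph()
g.add_edge("S.id", "Integer", INSTANCE_OF)
for part, whole in [("S.id", "SRC"), ("A.in.id", "A"), ("A.out.id", "A"), ("T.id", "DW")]:
    g.add_edge(part, whole, PART_OF)
for src, dst in [("S.id", "A.in.id"), ("A.in.id", "A.out.id"), ("A.out.id", "T.id")]:
    g.add_edge(src, dst, PROVIDER)

print(g.dependence("T.id"), g.responsibility("S.id"))  # 3 3
g.zoom_out("A")
print(g.dependence("T.id"), g.responsibility("S.id"))  # 2 2
```

In this sketch, only provider edges drive the metrics. Instance-of and regulator edges are stored but ignored, which is a deliberate simplification. Collapsing the activity's input and output attributes into `A` shrinks the graph and lowers both counts, because the intermediate attributes no longer appear as separate nodes.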
Similar resources
Graph-Based Modeling of ETL Activities with Multi-level Transformations and Updates
Extract-Transform-Load (ETL) workflows are data-centric workflows responsible for transferring, cleaning, and loading data from their respective sources to the warehouse. In this paper, we build upon existing graph-based modeling techniques that treat ETL workflows as graphs by (a) extending the activity semantics to incorporate negation, aggregation and self-joins, (b) complementing querying se...
The Conceptual Modeling of ETL Processes
An ETL process includes various ETL activities, such as filtering, aggregating and checking for null values, which can be represented by the constraint functions and transforming operations defined in the previous section. However, the activities cannot exist in an ETL process independently; they must be organized in a certain order that is specified in an ETL task of the ETL process. We think tha...
Modeling and managing ETL processes
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. The design, development and deployment of ETL processes, which are currently performed in an ad hoc, in-house fashion, need modeling, design and methodological foundations. Unfortunately, the resear...
Quality measures for ETL processes: from goals to implementation
ETL processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of Business Process Management, mainly focusing on functional requirements an...
A Framework for ETL Systems Development
There are many commercial Extract-Transform-Load (ETL) tools, most of which do not offer an integrated platform for modeling processes and extending functionality. This drawback complicates customization and integration with other applications; consequently, many companies develop their ETL systems in-house. A possible solution is to create a framework to provide ex...