Multidimensionality in Statistical, OLAP, and Scientific Databases

Author

  • Arie Shoshani
Abstract

Copyright © 2003, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

INTRODUCTION AND BACKGROUND

A great deal of data can be viewed as multidimensional data. The term multidimensional databases typically refers to a collection of objects, each represented as a point in a multidimensional space. Even data represented in tabular form, such as relations, can be thought of as multidimensional if each row (tuple) is treated as an object and the columns (attributes) are treated as the dimensions. For example, consider the table employee (personID, age, sex, salary) shown in Figure 1a. If each person is represented as a point in the multidimensional space of (age, sex, salary), then the table can be represented as in Figure 1b.

The utility of representing data in a multidimensional space is that certain features of the data are more natural to view this way. For example, clusters are natural to observe in a multidimensional space. In Figure 1b, one can easily see a small cluster of highly paid people (perhaps managers, who are generally older) and a larger cluster of lower-paid people. We can also see “outliers”, as in the case of the younger person with a high salary. Of course, these concepts extend to data with more than three dimensions, although such data cannot be viewed as easily. The problem of viewing high-dimensional data to identify clusters, outliers, and various patterns has been the subject of several research projects. An extensive review of such methods is provided in Keim & Kriegel (1996) and will not be discussed further here.

Some data is naturally multidimensional, such as two-dimensional or three-dimensional spatial data.
For example, climate modelers prefer to view their observed or simulated data in a multidimensional structure representing space (two or three dimensions), time, and the variables being measured (temperature, wind velocity, etc.). In this case, certain operations, such as selecting spatial regions or ...
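The two views described in this excerpt, a relation whose tuples become points in a multidimensional space and a gridded scientific dataset indexed by space, time, and variable, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's model: the `Employee` record, the numeric encoding of `sex`, the dictionary-based cube layout, and the helper names `to_point`, `to_space`, `select_region`, and `region_mean` are all hypothetical. Averaging over a selected region is included only as one example of an operation that might follow a spatial selection, and the sample values are invented.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

# --- View 1: a relation whose tuples are points in (age, sex, salary) space ---

@dataclass(frozen=True)
class Employee:
    person_id: int  # identifies the object; not used as a dimension
    age: int
    sex: str
    salary: float

# Hypothetical numeric encoding so the categorical attribute 'sex' can serve as an axis.
SEX_AXIS = {"F": 0.0, "M": 1.0}

def to_point(e: Employee) -> tuple[float, float, float]:
    """Map one employee tuple to its coordinates in (age, sex, salary) space."""
    return (float(e.age), SEX_AXIS[e.sex], e.salary)

def to_space(rows: list[Employee]) -> dict[int, tuple[float, float, float]]:
    """Represent the whole table as a set of points keyed by personID."""
    return {e.person_id: to_point(e) for e in rows}

# --- View 2: gridded scientific data indexed by space, time, and variable ---

# Hypothetical sparse layout: (lat index, lon index, time step, variable) -> value.
Cube = dict[tuple[int, int, int, str], float]

def select_region(cube: Cube, lat: range, lon: range, time: range, variable: str) -> Cube:
    """Return the cells of one variable inside a rectangular spatial region and time window."""
    return {
        key: cube[key]
        for key in product(lat, lon, time, [variable])
        if key in cube  # skip coordinates with no stored value
    }

def region_mean(cube: Cube, lat: range, lon: range, time: range, variable: str) -> float:
    """Average the selected cells; raises statistics.StatisticsError if the selection is empty."""
    return mean(select_region(cube, lat, lon, time, variable).values())

if __name__ == "__main__":
    staff = [Employee(1, 52, "M", 120000.0), Employee(2, 29, "F", 45000.0)]
    print(to_space(staff))  # {1: (52.0, 1.0, 120000.0), 2: (29.0, 0.0, 45000.0)}

    # A 3 x 3 grid over 2 time steps for one variable, with invented values.
    grid = {(i, j, t, "temperature"): float(i + j + t)
            for i in range(3) for j in range(3) for t in range(2)}
    print(region_mean(grid, range(2), range(2), range(1), "temperature"))  # 1.0
```

A dictionary keyed by coordinate tuples is chosen here only because it makes the "object as a point" idea explicit and tolerates sparse data; a dense array library would be the more usual choice for large gridded datasets.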

Similar resources

Data-Driven Multidimensional Design for OLAP

OLAP is a popular technology to query scientific and statistical databases, but their success heavily depends on a proper design of the underlying multidimensional (MD) databases (i.e., based on the fact / dimension paradigm). Relevantly, different approaches to automatically identify facts are nowadays available, but all MD design methods rely on discovering functional dependencies (FDs) to id...

PARSIMONY: An Infrastructure for Parallel Multidimensional Analysis and Data Mining

Multidimensional analysis and online analytical processing (OLAP) operations require summary information on multidimensional data sets. Most common are aggregate operations along one or more dimensions of numerical data values. Simultaneous calculation of multidimensional aggregates are provided by the Data Cube operator, used to calculate and store summary information on a number of dimensions...

Expressing OLAP Preferences

Multidimensional databases play a relevant role in statistical and scientific applications, as well as in business intelligence systems. Their users express complex OLAP queries, often returning huge volumes of facts, sometimes providing little or no information. Thus, expressing preferences could be highly valuable in this domain. The OLAP domain is representative of an unexplored class of pre...

High Performance Data Mining Using Data Cubes on Parallel Computers

On-Line Analytical Processing techniques are used for data analysis and decision support systems. The multidimensionality of the underlying data is well represented by multidimensional databases. For data mining in knowledge discovery, OLAP calculations can be effectively used. For these, high performance parallel systems are required to provide interactive analysis. Precomputed aggregate calcu...

SISYPHUS: A Chunk-Based Storage Manager for OLAP Cubes

In this paper, we present SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing OLAP operations. On-Line Analytical Processing (OLAP) poses new requirements to the physical storage layer of a database management system. Special characteristics of OLAP cubes such as multidimensionality, hierarchical structure of dimensions, data sparseness, etc., are ...

Publication date: 2003