PROC VISUALS: Experimental SAS Software for Dynamic Hyperdimensional Graphics
Abstract
PROC VISUALS is designed to help you visually discover and formulate hypotheses about structure in multivariate data. It does this by constructing a window into your multivariate data. Through the window you see a three-dimensional (3D) cloud of objects that represent your multivariate observations. You can smoothly spin this cloud to better understand its 3D structure. You can also move the window throughout the high-dimensional (hD) space, gaining insight into your data's hD structure. You do not use numbers, equations, or programming statements to control PROC VISUALS; instead, you use single-keystroke, highly interactive commands. In addition to introducing PROC VISUALS, we review previous visual exploratory data analysis techniques and suggest future directions.

1. VISUAL EXPLORATORY DATA ANALYSIS

The course of scientific investigation can be classified into two broad stages: the exploratory stage, where the investigator forms hypotheses, and the confirmatory stage, where the investigator tests them. For many years statisticians focused on developing and improving inferential methods for hypothesis testing, to the neglect of exploratory methods for hypothesis formation. In recent years, however, there have been many new developments in exploratory data analysis (EDA) methods. Tukey, in his landmark book Exploratory Data Analysis (1977, p. v), states that EDA "is about looking at data to see what it seems to say. It concentrates on simple arithmetic and easy-to-draw pictures. It regards whatever appearances we have recognized as partial descriptions, and tries to look beneath them for new insights. Its concern is with appearance, not with confirmation." The work reported here falls under the general rubric of "exploratory data analysis" and is guided by the philosophy of "looking at data to see what it seems to say."
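The window described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than SAS and is not the PROC VISUALS implementation; the function names (`project_to_window`, `spin`, `to_screen`) and the example data are hypothetical. It assumes the window is spanned by three orthonormal directions in the p-dimensional variable space: each observation is projected onto those directions to give a 3D point, spinning is a rotation about the vertical axis, and the screen image simply drops the depth coordinate.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_to_window(points, frame):
    """Give each p-dimensional observation 3D coordinates in the window.

    Assumes `frame` holds three orthonormal p-dimensional vectors that
    span the window.
    """
    return [[dot(x, axis) for axis in frame] for x in points]


def spin(cloud, angle):
    """Rotate a 3D cloud about the vertical (y) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x + s * z, y, -s * x + c * z] for x, y, z in cloud]


def to_screen(cloud):
    """Orthographic projection: drop the depth (z) coordinate."""
    return [(x, y) for x, y, _ in cloud]


# Hypothetical data: four observations on five variables.
data = [[1, 0, 0, 2, 5], [0, 1, 0, 3, 5], [0, 0, 1, 4, 5], [1, 1, 1, 0, 5]]
# A window spanned by the first three variables.
frame = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
cloud = project_to_window(data, frame)

# Smooth spinning: redraw the cloud at a sequence of small angles (5 degrees apart).
animation = [to_screen(spin(cloud, math.radians(d))) for d in range(0, 360, 5)]
```

Redrawing at small angle increments is what makes the spin look smooth; the variables outside the window (here the fourth and fifth) are invisible until the window itself is moved.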
In this paper we present a set of ideas, which we call VEDA (Visual Exploratory Data Analysis), and PROC VISUALS, a SAS procedure that implements some of these ideas. Our work differs from Tukey's EDA in two ways. First, our work is entirely graphic; it emphasizes describing data visually. Second, our work is primarily multivariate; it emphasizes presenting multidimensional information on the two-dimensional computer screen. VEDA capitalizes on the pattern-recognition power of human vision and the computational power of graphics workstations to help data analysts look for structure (form hypotheses) that may be hidden in their multivariate data.

The goal of VEDA is to aid in forming hypotheses about the data's hyperdimensional (hD) structure, even though we can only see in 3D. To do this, the graphic representation must

> respect the data's hyperdimensional geometry,
> respect the user's three-dimensional perception, and
> respect the workstation's computational limits.

Of course, while we see in 3D, we can only draw in 2D, either on paper or on the computer screen. The problem that all VEDA methods tackle is presenting hD information in a 2D plane such that our 3D perception can understand the hD geometry.

Presenting 3D in a 2D plane is not new, of course: artists have done it for centuries, statisticians for over a century, and computers for two decades. Nor are techniques for presenting hD in a 2D plane new. Tufte (1983, p. 40) presents a marvelous example, dating from 1861, of a 2D statistical graphic that incorporates six dimensions of information concerning the fate of Napoleon's army in Russia. There are sophisticated computer techniques that create images that appear to genuinely occupy 3D volume; these techniques do a very convincing job of tricking us into "seeing 3D." Ordinary computer-generated 2D printer plots can have additional "dimensions" added by labeling the points in the 3D space.
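One way to move the window through hD space while respecting the data's geometry is to keep the three window directions orthonormal, so that every view is an orthogonal projection (which never exaggerates distances between observations), and to change them by a small rotation in a plane. The Python sketch below is an assumption for illustration, not the method PROC VISUALS uses; `rotate_in_plane`, `window_path`, and `is_orthonormal` are hypothetical names. It tilts one window axis toward a unit direction outside the window; because the rotation acts only in the plane spanned by those two vectors, the frame stays orthonormal at every step.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthonormal(frame, tol=1e-9):
    """True if the vectors in `frame` have unit length and are mutually orthogonal."""
    return all(
        abs(dot(u, v) - (1.0 if i == j else 0.0)) < tol
        for i, u in enumerate(frame)
        for j, v in enumerate(frame)
    )


def rotate_in_plane(frame, i, direction, angle):
    """Tilt window axis `i` toward `direction` by `angle` radians.

    Assumes `direction` is a unit vector orthogonal to every axis in `frame`.
    Only axis i changes, and it stays in the plane spanned by the old axis i
    and `direction`, so the frame remains orthonormal.
    """
    c, s = math.cos(angle), math.sin(angle)
    new_frame = [list(axis) for axis in frame]
    new_frame[i] = [c * a + s * d for a, d in zip(frame[i], direction)]
    return new_frame


def window_path(frame, i, direction, steps, total_angle=math.pi / 2):
    """Frames for a smooth move: axis i turns gradually toward `direction`.

    Each frame is computed from the starting frame, so rounding errors
    do not accumulate from step to step.
    """
    return [
        rotate_in_plane(frame, i, direction, total_angle * k / steps)
        for k in range(steps + 1)
    ]


# Example: a window on the first three of five variables slides, in ten
# steps, until its first axis lies along the fourth variable.
start = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
fourth = [0, 0, 0, 1, 0]
path = window_path(start, 0, fourth, steps=10)
```

Drawing the cloud through each frame of `path` in turn shows the fourth variable gradually entering the view while the first leaves it.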
Kuhfeld (1986) has developed PROC IDPLOT, a SAS procedure that incorporates several methods for maximizing the amount of information that can be presented in point labels. Also, with common computer-generated color graphics (such as those in SAS/GRAPH), more dimensions can be added by using various colors and shapes to distinguish the points on discrete dimensions. In addition, there are a variety of techniques for communicating 3D and hD on a 2D plane, including perspective and stereo projections, movement, and dynamically changing object shapes.
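The perspective and stereo projections mentioned above can be sketched briefly. This Python sketch is illustrative only: it assumes a simple pinhole viewer on the z axis looking at a screen in the plane z = 0, and the names `perspective`, `stereo_pair`, `viewer_distance`, and `eye_separation` are hypothetical rather than part of any SAS product.

```python
def perspective(cloud, viewer_distance):
    """Perspective projection of 3D points onto the screen plane z = 0.

    The viewer sits on the z axis at z = viewer_distance, looking toward
    the origin. Points farther from the viewer (smaller z) are scaled
    toward the center of the screen, which gives a cue to depth.
    """
    screen = []
    for x, y, z in cloud:
        depth = viewer_distance - z
        if depth <= 0:
            raise ValueError("point is at or behind the viewer")
        scale = viewer_distance / depth
        screen.append((x * scale, y * scale))
    return screen


def stereo_pair(cloud, viewer_distance, eye_separation):
    """Left- and right-eye perspective images for a stereo display.

    The eyes sit at x = -eye_separation/2 and x = +eye_separation/2 in
    the plane z = viewer_distance. Points on the screen plane land at the
    same place in both images; points behind it are offset horizontally.
    """
    half = eye_separation / 2
    images = []
    for eye_x in (-half, half):
        # Shift so this eye is on the z axis, project, then shift back.
        shifted = [(x - eye_x, y, z) for x, y, z in cloud]
        images.append([(u + eye_x, v) for u, v in perspective(shifted, viewer_distance)])
    return images[0], images[1]


# A point on the screen plane and the same (x, y) pushed farther away.
print(perspective([(1, 1, 0), (1, 1, -4)], 4))  # → [(1.0, 1.0), (0.5, 0.5)]
```

The horizontal offset between the two stereo images grows with distance behind the screen, which is the cue a stereo viewer fuses into depth.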