Using PROC GENMOD for Loglinear Smoothing
ABSTRACT AND INTRODUCTION

The goal of smoothing is to replace an observed frequency distribution with a distribution that preserves some features of the observed data without the irregularities that are attributable to sampling. The type of smoothing covered in this paper involves the fitting of loglinear, Poisson-based models to discrete distributions. Loglinear smoothing can preserve a variety of different features in observed data with a relatively small number of parameters.

In this paper we use SAS/STAT® PROC GENMOD (SAS, 2002) to demonstrate the smoothing of univariate (one variable) and bivariate (two variables and one variable for separate subgroups) frequency distributions. For univariate distributions, we will produce smoothed distributions that preserve 1) the mean, 2) the mean and variance, 3) the mean, variance and skewness, and finally 4) the mean, variance, skewness and kurtosis in the observed distribution of one variable, X. For bivariate distributions, we will produce smoothed distributions that preserve three univariate moments in each of the marginal distributions of two variables, X and Y, as well as the correlation between X and Y. Finally, indicator functions are incorporated to model overall and subset-specific features of distributions within the same overall model.

LOGLINEAR SMOOTHING MODELS

Assume we have a discrete random variable X with possible values $x_0, \ldots, x_J$ (that is, $x_j$ with $j = 0, \ldots, J$) and a corresponding vector of observed frequencies $\mathbf{n} = (n_0, \ldots, n_J)^t$ that sum to the total sample size, N.
Under multinomial or Poisson distributional assumptions about $\mathbf{n}$, the vector of population probabilities $\mathbf{p} = (p_0, \ldots, p_J)^t$ is said to satisfy the following loglinear model:

$$\log_e(p_j) = \alpha + u_j + \mathbf{b}_j \boldsymbol{\beta}$$

where the $\{p_j\}$ are assumed to be positive and sum to 1, $\mathbf{b}_j$ is a row vector of known constants, $\boldsymbol{\beta}$ is a vector of free parameters, $u_j$ is a known constant that specifies the distribution of the $\{p_j\}$ when $\boldsymbol{\beta}$ is set to zero, and $\alpha$ is a normalizing constant that ensures that the probabilities sum to one (Holland & Thayer, 1987; 2000). Throughout this paper $u_j$ will be set to 0, so that the "null" model is a uniform distribution in which the frequencies for all J + 1 score values are equal to N/(J + 1). For the modeling of test score distributions, we write the loglinear model as:

$$\log_e(p_j) = \alpha + \sum_{i=1}^{I} \beta_i (x_j)^i$$
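As a minimal sketch of how the polynomial model above might be fit with PROC GENMOD, the following step fits the I = 3 case as a Poisson regression with a log link. The data set names (scores, smoothed), the variable names (x, freq, x2, x3, fitted), and the frequencies themselves are hypothetical and are not taken from the paper. Because the response is the frequency $n_j$ rather than the probability $p_j$, the intercept reported by PROC GENMOD corresponds to $\alpha + \log_e N$ rather than to $\alpha$ alone.

```sas
/* Hypothetical data: scores 0-10 with made-up frequencies (illustration only). */
data scores;
  input x freq @@;
  /* Powers of the score form the known constants in the row vector b_j. */
  x2 = x**2;
  x3 = x**3;
  datalines;
0 2  1 5  2 9  3 14  4 20  5 23  6 19  7 15  8 8  9 4  10 1
;
run;

/* Poisson loglinear model with I = 3:                              */
/* log(expected freq) = intercept + beta1*x + beta2*x2 + beta3*x3   */
proc genmod data=scores;
  model freq = x x2 x3 / dist=poisson link=log;
  /* fitted holds the smoothed (model-predicted) frequency for each score */
  output out=smoothed pred=fitted;
run;

/* Compare observed and smoothed frequencies side by side. */
proc print data=smoothed noobs;
  var x freq fitted;
run;
```

With maximum likelihood estimation and the log link, the fitted frequencies in this sketch should reproduce the observed total and the first three moments of X (mean, variance and skewness), which is the sense in which the smoothed distribution preserves those features. Dropping x3, or adding a fourth power, would give the two-moment or four-moment variants described in the introduction.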