Weighting Imputation for Categorical Data

Authors

  • Liang-Ting Tsai
  • Chih-Chien Yang
Abstract

Learning Vector Quantization (LVQ) has been used to impute missing group membership and stratum weights in confirmatory factor analysis (CFA) models with continuous indicators (Chen, Tsai, & Yang, 2010; Tsai & Yang, 2012). Categorical questionnaires (e.g., binary and Likert-type items) are now widely used in educational, business, economic, and psychological tests as well as in large-scale international surveys (e.g., Trends in International Mathematics and Science Study, TIMSS; Progress in International Reading Literacy Study, PIRLS; Programme for International Student Assessment, PISA; German Survey of Income and Expenditure, SIE; British Labour Force Survey, LFS). This article adapts the LVQ approach and, through a series of simulations, assesses the accuracy of parameter estimates in a CFA model when background information is missing in binary and Likert-type questionnaires.

Questionnaires utilizing categorical and binary items are widely used in business tests and large-scale international surveys. In addition to the responses to the questionnaire items, the databases used to analyze questionnaire results often provide weighting factors to compensate for non-response bias. This information can be used to produce population-level estimates. However, the weighting factors in such surveys cannot account for all the background variables that may affect population-level estimates. For example, in the LFS, the weight allocated to each individual to better ensure that the respondents were representative of the population was calculated based on age, sex, and region of residence alone (Office for National Statistics, 2011). However, while the researchers conducting the LFS were interested in the relationship between income and economic activity, the survey database did not provide a weighting factor for participant income.
Without this weighting factor, a bias would have been introduced on account of the large number of subjects with missing incomes. This type of non-response bias is frequently encountered in the analysis of large-scale questionnaire data; however, to the best of our knowledge, no method has been proposed in the literature to account for it. Therefore, to better compensate for this bias and provide more accurate population-level estimates, the current study applied the LVQ method to calculate weighting factors for variables of interest. The concept of sampling weights and the practical applications of survey data have gradually gained importance in advanced statistical models (e.g., CFA, structural equation modeling, multilevel modeling, latent class analysis, and latent growth modeling) (Asparouhov, 2005, 2006; Grilli & Pratesi, 2004; Kaplan & Ferguson, 1999; Patterson, Dayton, & Graubard, 2002; Stapleton, 2002, 2006, 2008; Sonnenschein, Stapleton, & Benson, 2010; Tsai & Yang, 2008; Yang & Tsai, 2006, 2008). To achieve effective results from the analysis of survey data, the analyst needs to adopt proper sampling weights for calculating param...

Liang-Ting Tsai, National Taichung University of Education, Taiwan
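
The abstract does not spell out the imputation procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: a basic LVQ1 classifier is trained on respondents whose stratum is known, and a respondent with a missing stratum receives the label, and hence the weight, of the nearest prototype. The function names, the toy Likert responses, the stratum labels, and the weight values are all hypothetical, and treating Likert codes as numeric distances is a simplification rather than the authors' stated method.

```python
import random


def nearest_stratum(prototypes, x):
    """Return the stratum whose prototype is closest to x (squared Euclidean)."""
    return min(
        prototypes,
        key=lambda s: sum((p - xi) ** 2 for p, xi in zip(prototypes[s], x)),
    )


def train_lvq1(responses, strata, n_epochs=20, lr=0.3, seed=0):
    """Fit one prototype per stratum with the basic LVQ1 update rule."""
    rng = random.Random(seed)
    prototypes = {}
    for s in sorted(set(strata)):
        members = [x for x, y in zip(responses, strata) if y == s]
        # Start each prototype at the item-wise mean of its stratum.
        prototypes[s] = [sum(col) / len(members) for col in zip(*members)]
    order = list(range(len(responses)))
    for epoch in range(n_epochs):
        rate = lr * (1 - epoch / n_epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x, y = responses[i], strata[i]
            winner = nearest_stratum(prototypes, x)
            # Pull the winning prototype toward x if its label matches,
            # push it away otherwise.
            sign = 1.0 if winner == y else -1.0
            prototypes[winner] = [
                p + sign * rate * (xi - p) for p, xi in zip(prototypes[winner], x)
            ]
    return prototypes


def impute_weight(prototypes, stratum_weights, x):
    """Impute the stratum of respondent x, then look up that stratum's weight."""
    stratum = nearest_stratum(prototypes, x)
    return stratum, stratum_weights[stratum]


# Toy data (hypothetical): four 5-point Likert items, two known strata.
complete_responses = [
    [1, 2, 1, 1], [2, 1, 2, 1], [1, 1, 2, 2],
    [5, 4, 5, 4], [4, 5, 4, 5], [5, 5, 4, 4],
]
complete_strata = ["A", "A", "A", "B", "B", "B"]
stratum_weights = {"A": 1.5, "B": 0.75}  # illustrative values only

prototypes = train_lvq1(complete_responses, complete_strata)
# Respondents whose stratum (and therefore weight) is missing.
imputed = [
    impute_weight(prototypes, stratum_weights, x)
    for x in ([1, 2, 1, 2], [5, 4, 5, 5])
]
```

In this toy setting the two strata are well separated, so each incomplete respondent is assigned to the stratum with the matching response pattern. With real survey data the strata would overlap, and the number of prototypes per stratum, the learning rate, and the distance measure would all need to be chosen and validated.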

Similar resources

Nearest Neighbor Imputation for Categorical Data by Weighting of Attributes

Missing values are a common phenomenon in all areas of applied research. While various imputation methods are available for metrically scaled variables, methods for categorical data are scarce. An imputation method that has been shown to work well for high dimensional metrically scaled variables is the imputation by nearest neighbor methods. In this paper, we extend the weighted nearest neighbo...

Missing-Values Adjustment for Mixed-Type Data

We propose a new method of single imputation, reconstruction, and estimation of nonreported, incorrect, implausible, or excluded values in more than one field of the record. In particular, we will be concerned with data sets involving a mixture of numeric, ordinal, binary, and categorical variables. Our technique is a variation of the popular nearest neighbor hot deck imputation NNHDI where “ne...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...

On the Returns to Occupational Qualification in Terms of Subjective and Objective Variables: A GEE-type Approach to the Estimation of Two-Equation Panel Models

This article proposes an estimation approach for panel models with mixed continuous and ordered categorical outcomes based on generalized estimating equations for the mean and pseudo-score equations for the covariance parameters. A numerical study suggests that efficiency can be gained as concerns the mean parameter estimators by using individual covariance matrices in the estimating equations ...

A nonparametric multiple imputation approach for missing categorical data

BACKGROUND: Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. METHODS: We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each catego...

Publication date: 2016