Misclassified group-tested current status data
Abstract
Group testing, introduced by Dorfman (1943), has been used to reduce costs when estimating the prevalence of a binary characteristic based on a screening test of [Formula: see text] groups that include [Formula: see text] independent individuals in total. If the unknown prevalence is low and the screening test suffers from misclassification, it is also possible to obtain more precise prevalence estimates than those obtained from testing all [Formula: see text] samples separately (Tu et al., 1994). In some applications, the individual binary response corresponds to whether an underlying time-to-event variable [Formula: see text] is less than an observed screening time [Formula: see text], a data structure known as current status data. Given sufficient variation in the observed [Formula: see text] values, it is possible to estimate the distribution function [Formula: see text] of [Formula: see text] nonparametrically, at least at some points in its support, using the pool-adjacent-violators algorithm (Ayer et al., 1955). Here, we consider nonparametric estimation of [Formula: see text] based on group-tested current status data for groups of size [Formula: see text] where the group tests positive if and only if any individual's unobserved [Formula: see text] is less than the corresponding observed [Formula: see text]. We investigate the performance of the group-based estimator as compared to the individual test nonparametric maximum likelihood estimator, and show that the former can be more precise in the presence of misclassification for low values of [Formula: see text]. Potential applications include testing for the presence of various diseases in pooled samples where interest focuses on the age-at-incidence distribution rather than overall prevalence. We apply this estimator to the age-at-incidence curve for hepatitis C infection in a sample of U.S. women who gave birth to a child in 2014, where group assignment is done at random and based on maternal age. 
We discuss connections to other work in the literature, as well as potential extensions.
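The two ingredients named in the abstract, the pool-adjacent-violators algorithm and the rule that a group tests positive if and only if at least one member is positive, can be illustrated with a minimal Python sketch. It is not the authors' estimator. It assumes that all members of a group share one screening time, that sensitivity and specificity are known, and that groups are already sorted by screening time. The names `pava`, `group_to_individual`, `sens`, `spec`, and `k` are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def pava(y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weighted non-decreasing least-squares fit (pool-adjacent-violators).

    y must already be ordered by screening time.
    """
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fit: List[float] = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def group_to_individual(p_group: float, k: int,
                        sens: float = 1.0, spec: float = 1.0) -> float:
    """Convert P(group tests positive) into an individual-level probability.

    Assumption (not from the source): all k members share one screening time
    and the test has known sensitivity `sens` and specificity `spec`.
    """
    # Undo misclassification: p_obs = sens * p_true + (1 - spec) * (1 - p_true).
    p_true = (p_group - (1.0 - spec)) / (sens + spec - 1.0)
    p_true = min(max(p_true, 0.0), 1.0)  # keep the result a valid probability
    # A group is truly positive iff at least one of its k members is positive.
    return 1.0 - (1.0 - p_true) ** (1.0 / k)


# Toy group test results ordered by screening time (illustrative, not real data).
results = [0, 1, 0, 1]
p_hat = pava(results, [1, 1, 1, 1])  # [0.0, 0.5, 0.5, 1.0]
f_hat = [group_to_individual(p, k=2) for p in p_hat]
```

Because the back-transformation is increasing whenever `sens + spec > 1`, the monotone fit for the group-level probability stays monotone after conversion to the individual scale.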
Similar resources
Regression analysis with a misclassified covariate from a current status observation scheme.
Naive use of misclassified covariates leads to inconsistent estimators of covariate effects in regression models. A variety of methods have been proposed to address this problem including likelihood, pseudo-likelihood, estimating equation methods, and Bayesian methods, with all of these methods typically requiring either internal or external validation samples or replication studies. We conside...
Binary Regression With a Misclassified Response Variable in Diabetes Data
Objectives: Categorical data analysis is very important in statistics and the medical sciences. When the binary response variable is misclassified, fitting the model yields biased estimates of the adjusted odds ratios. The present study aimed to use a method to detect and correct misclassification error in the response variable of Type 2 Diabetes Mellitus (T2DM), applying binary ...
Parameter Identifiability Issues in a Latent Markov Model for Misclassified Binary Responses
Medical researchers may be interested in disease processes that are not directly observable. Imperfect diagnostic tests may be used repeatedly to monitor the condition of a patient in the absence of a gold standard. We consider parameter identifiability and estimability in a Markov model for alternating binary longitudinal responses that may be misclassified. Exactly ...
The Effects of Initially Misclassified Data on the Effectiveness of Discriminant Function Analysis and Finite Mixture Modeling
Classification procedures are common and useful in behavioral, educational, social, and managerial research. Supervised classification techniques such as discriminant function analysis assume training data are perfectly classified when estimating parameters or classifying. In contrast, unsupervised classification techniques such as finite mixture models (FMM) do not require, or even use if avai...
Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator
The purpose of this paper is to identify the points that most affect the performance of an important data mining algorithm, the support vector machine. The final classification decision is based on a small portion of the data called support vectors, so atypical observations among these points will cause deviation from the correct decision. Thus...