208-2012: How Test Length and Sample Size Have an Impact on the Standard Errors for IRT True Score Equating: Integrating SAS® and Other Software
Abstract
The standard error (SE) of equating is a useful index for quantifying the amount of equating error. It is the standard deviation of equated scores over replications of an equating procedure in samples from a population or populations of examinees. The current study uses simulations to estimate the SE of item response theory (IRT) true score equating in the Nonequivalent Groups with Anchor Test (NEAT) design. Specifically, the length of the internal anchor and the sample size are of interest. Some specialized programs, such as BILOG-MG 3.0 for item calibration, ST for IRT scale transformations, and PIE for IRT true score equating, are incorporated to carry out the equating. The purpose of the paper is to demonstrate how such a complicated and repetitive procedure can be run through SAS®.
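The definition above (the SE as the standard deviation of equated scores over replications) can be illustrated with a minimal Python sketch. This is not the paper's SAS workflow. The function name `standard_error_of_equating`, the list-of-lists layout, the toy numbers, and the use of the sample standard deviation (divisor n - 1) are all assumptions made for illustration.

```python
from statistics import stdev


def standard_error_of_equating(replications):
    """Return the SE of equating at each raw-score point.

    replications[r][x] is the equated score for raw score x obtained in
    replication r (hypothetical layout). The SE at x is taken here as the
    sample standard deviation (divisor n - 1) over replications, which is
    an assumption; a population divisor could be used instead.
    """
    if len(replications) < 2:
        raise ValueError("at least two replications are required")
    n_points = len(replications[0])
    if any(len(rep) != n_points for rep in replications):
        raise ValueError("all replications must cover the same score points")
    # zip(*...) regroups the data so each tuple holds one score point's
    # equated scores across all replications.
    return [stdev(column) for column in zip(*replications)]


# Hypothetical toy data: 3 replications, 2 raw-score points.
toy = [
    [10.0, 20.0],
    [12.0, 20.0],
    [14.0, 20.0],
]
se = standard_error_of_equating(toy)
```

In a simulation like the one described, each row would come from one full pass of calibration, scale transformation, and equating on a freshly generated sample, and the resulting SE would be compared across anchor lengths and sample sizes.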
Similar resources
Standard Error Estimation of 3PL IRT True Score Equating With an MCMC Method
A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of Basic Skills. For parallel and congeneric test form...
Examining the Impact of Drifted Polytomous Anchor Items on Test Characteristic Curve (TCC) Linking and IRT True Score Equating
In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often ...
Effectiveness of the hybrid Levine equipercentile and modified frequency estimation equating methods under the common-item nonequivalent groups design
The purpose of this study was to evaluate the effectiveness of the hybrid Levine equipercentile (Hybrid LE) and modified frequency estimation (MFE) equating methods in improving accuracy of equating as compared to the percentile rank frequency estimation (FE), kernel frequency estimation (Kernel FE) and percentile rank chained equipercentile (CE) equating methods under the common-item nonequiva...
A comparison of Van der Linden's conditional equipercentile equating method with other equating methods under the random groups design
To ensure test security and fairness, alternative forms of the same test are administered in practice. However, alternative forms of the same test generally do not have the same test difficulty level, even though alternative test forms are designed to be as parallel as possible. Equating adjusts for differences in difficulties among forms of the test. Six traditional equating methods are consid...
Sampling of Common Items: An Unrecognized Source of Error in Test Equating
There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a similar way of reasoning, the common items that are emb...