Individual Data Protected Integrative Regression Analysis of High-Dimensional Heterogeneous Data

Abstract

Evidence-based decision making often relies on meta-analyzing multiple studies, which enables more precise estimation and investigation of generalizability. Integrative analysis of multiple heterogeneous studies is, however, highly challenging in the ultra-high-dimensional setting. The challenge is even more pronounced when individual-level data cannot be shared across studies, known as the DataSHIELD constraint (Wolfson et al., 2010). Under sparse regression models that are assumed to be similar yet not identical across studies, we propose in this paper a novel integrative estimation procedure for data-Shielding High-dimensional Integrative Regression (SHIR). SHIR protects individual data through a summary-statistics-based integrating procedure, accommodates between-study heterogeneity in both the covariate distribution and the model parameters, and attains consistent variable selection. Theoretically, SHIR is statistically more efficient than the existing distributed approaches that integrate debiased LASSO estimators from the local sites. Furthermore, the estimation error incurred by aggregating derived data is negligible compared to the statistical minimax rate, and SHIR is shown to be asymptotically equivalent in estimation to the ideal estimator obtained by sharing all data. The finite-sample performance of our method is studied and compared with existing approaches via extensive simulation settings. We further illustrate the utility of SHIR to derive phenotyping algorithms for coronary artery disease using electronic health records data from multiple chronic disease cohorts.
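
The idea of integrating studies through summary statistics alone can be illustrated with a minimal sketch. This is not the SHIR estimator from the paper: it assumes a plain linear model with a single coefficient vector shared by all sites, and it omits the debiasing step and the treatment of between-study heterogeneity that the abstract describes. The function names (`site_summary`, `lasso_from_summaries`) and the penalty level `lam` are hypothetical and chosen only for illustration. Each site releases its Gram matrix, its covariate-response cross-products, and its sample size; the central analyst then runs a coordinate-descent lasso on the pooled summaries without seeing any individual rows.

```python
# Illustrative sketch only; NOT the SHIR procedure from the paper.
# Assumption: one shared coefficient vector and a linear model at every site.

def site_summary(X, y):
    """Computed locally at a site: returns (X'X, X'y, n) and nothing else."""
    p = len(X[0])
    gram = [[sum(row[j] * row[k] for row in X) for k in range(p)]
            for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return gram, xty, len(X)


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_from_summaries(summaries, lam, n_iter=200):
    """Coordinate-descent lasso using only pooled summary statistics.

    Minimizes (1/2N)||y - Xb||^2 + lam * ||b||_1, which depends on the
    data only through X'X and X'y (up to a constant in b).
    """
    p = len(summaries[0][1])
    N = sum(n for _, _, n in summaries)
    # Pool the site-level summaries by simple addition.
    G = [[sum(s[0][j][k] for s in summaries) / N for k in range(p)]
         for j in range(p)]
    c = [sum(s[1][j] for s in summaries) / N for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if G[j][j] == 0:
                continue  # covariate is identically zero; leave at 0
            partial = c[j] - sum(G[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(partial, lam) / G[j][j]
    return beta


# Toy usage: two sites whose responses follow y = 2 * x1 exactly.
site_a = site_summary([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 0.0, 2.0])
site_b = site_summary([[2.0, 1.0], [1.0, -1.0]], [4.0, 2.0])
beta_hat = lasso_from_summaries([site_a, site_b], lam=0.5)
print(beta_hat)
```

In this toy setting the pooled Gram matrix and cross-products equal those computed from the stacked data, so nothing is lost by sharing summaries instead of rows. That exact equivalence is special to the squared-error loss and a common coefficient vector; the abstract's claim for SHIR is an asymptotic one under heterogeneous sparse models.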

Similar resources

Methods for regression analysis in high-dimensional data

By evolving science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been innovated, which have resulted in the appearance and development of high-dimensional data. The high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

iBAG: integrative Bayesian analysis of high-dimensional multiplatform genomics data

Motivation: Analyzing data from multi-platform genomics experiments combined with patients' clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current data integration approaches are limited in that they do not consider the fundamental biological relationships that exist amon...

Integrative Bayesian Analysis of High-Dimensional Multi-Platform Genomics Data

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches that treat the data are limited in that they do not consider the fundamental biological relationships...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

Bayesian models for sparse regression analysis of high dimensional data

This paper considers the task of building efficient regression models for sparse multivariate analysis of high-dimensional data sets. In particular, it focuses on cases where the number q of responses Y = (y_k, 1 ≤ k ≤ q) and the number p of predictors X = (x_j, 1 ≤ j ≤ p) to analyse jointly are both large with respect to the sample size n, a challenging bi-directional task. The analysis of such data set...

Journal

Journal title: Journal of the American Statistical Association

Year: 2021

ISSN: 0162-1459, 1537-274X, 2326-6228, 1522-5445

DOI: https://doi.org/10.1080/01621459.2021.1904958