Data-driven probability concentration and sampling on manifold
Authors: C. Soize (corresponding author), R. Ghanem
Preprint submitted to Journal of Computational Physics; accepted 28 May 2016 (preprint dated May 31, 2016).
Abstract
A new methodology is proposed for generating realizations of a random vector with values in a finite-dimensional Euclidean space that are statistically consistent with a dataset of observations of this vector. The probability distribution of this random vector is not known a priori, but it is presumed to be concentrated on an unknown subset of the Euclidean space. A random matrix is introduced whose columns are independent copies of the random vector and whose number of columns equals the number of data points in the dataset. The approach is based on (i) the multidimensional kernel-density estimation method for estimating the probability distribution of the random matrix, (ii) an MCMC method for generating realizations of the random matrix, (iii) the diffusion-maps approach for discovering and characterizing the geometry and structure of the dataset, and (iv) a reduced-order representation of the random matrix, constructed from the diffusion-maps vectors associated with the leading eigenvalues of the transition matrix relative to the given dataset. The convergence of the proposed methodology is analyzed, and a numerical validation is carried out through three applications of increasing complexity. The proposed method is found to be robust to the noise level and data complexity, as well as to the intrinsic dimension of the data and the size of the experimental datasets. Both the methodology and the underlying mathematical framework presented in this paper contribute new capabilities and perspectives at the interface of uncertainty quantification, statistical data analysis, stochastic modeling, and the associated statistical inverse problems.
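Steps (iii) and (iv) of the abstract can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. It uses an isotropic Gaussian kernel exp(-||x - y||^2 / (4 eps)) with a user-chosen smoothing parameter `eps`, forms the transition matrix P = D^{-1} K, and keeps the right eigenvectors for the m largest eigenvalues, scaled by lambda^kappa (the usual diffusion-maps time scaling). It then represents the data matrix H (one column per observation) by its least-squares projection Z g^T onto the span of those vectors, with Z = H g (g^T g)^{-1}. The names `diffusion_maps_basis`, `reduced_representation`, `eps`, `m` and `kappa` are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a sketch, not the paper's code.


def diffusion_maps_basis(X, eps, m, kappa=1):
    """Diffusion-maps basis for a dataset X of shape (N, nu).

    Returns the m largest eigenvalues of the Markov transition matrix
    P = D^{-1} K and an (N, m) matrix g whose columns are the matching right
    eigenvectors psi of P, each scaled by lambda**kappa.
    """
    # Pairwise squared Euclidean distances between the N data points.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (4.0 * eps))          # isotropic Gaussian kernel (assumed)
    d = K.sum(axis=1)                      # row sums = diagonal of D
    # Ps = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P.
    Ps = K / np.sqrt(np.outer(d, d))
    w, phi = np.linalg.eigh(Ps)            # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:m]          # indices of the m largest
    lam = w[idx]
    psi = phi[:, idx] / np.sqrt(d)[:, None]  # right eigenvectors of P
    return lam, psi * lam**kappa


def reduced_representation(H, g):
    """Project H (nu x N, one column per data point) onto span(g).

    Returns Z = H a with a = g (g^T g)^{-1}, and the approximation Z g^T.
    """
    a = np.linalg.solve(g.T @ g, g.T).T    # a = g (g^T g)^{-1}
    Z = H @ a
    return Z, Z @ g.T


# Usage: 200 noisy points concentrated near the unit circle in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
H = np.vstack([np.cos(theta), np.sin(theta)]) + 0.01 * rng.standard_normal((2, 200))
lam, g = diffusion_maps_basis(H.T, eps=0.05, m=10)
Z, H_approx = reduced_representation(H, g)
print(lam[:3])
print(Z.shape, H_approx.shape)   # (2, 10) (2, 200)
```

Diagonalizing the symmetric matrix D^{-1/2} K D^{-1/2} instead of P lets `numpy.linalg.eigh` be used. Its eigenvectors map to those of P through a D^{-1/2} rescaling, and the leading one is constant, with eigenvalue 1.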
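Steps (i) and (ii) of the abstract can be illustrated, in a much simplified form, by a Gaussian kernel-density estimate sampled with a Markov chain. This minimal NumPy sketch rests on several stated assumptions. It works on a single random vector rather than on the random matrix, and it uses Silverman's rule-of-thumb bandwidth on normalized (unit-variance) data. It also substitutes a plain random-walk Metropolis sampler for whatever MCMC scheme the paper actually uses. The functions `silverman_bandwidth`, `kde_logpdf` and `metropolis_kde` and the parameters `step` and `s` are hypothetical. Because the sketch omits the diffusion-maps reduction of steps (iii) and (iv), it does not by itself concentrate the samples on the support of the data.

```python
import numpy as np

# Hypothetical helper names; random-walk Metropolis is a stand-in sampler.


def silverman_bandwidth(N, nu):
    """Silverman's rule of thumb, assuming unit-variance (normalized) data."""
    return (4.0 / (N * (2.0 + nu))) ** (1.0 / (nu + 4.0))


def kde_logpdf(x, data, s):
    """Log-density at x (nu,) of an isotropic Gaussian KDE built on data (N, nu)."""
    N, nu = data.shape
    q = np.sum((data - x) ** 2, axis=1) / (2.0 * s**2)
    qmin = q.min()  # log-sum-exp shift to avoid underflow far from the data
    return (-qmin + np.log(np.sum(np.exp(-(q - qmin))))
            - np.log(N) - 0.5 * nu * np.log(2.0 * np.pi * s**2))


def metropolis_kde(data, n_samples, step, rng, s=None):
    """Random-walk Metropolis chain targeting the KDE of data."""
    N, nu = data.shape
    if s is None:
        s = silverman_bandwidth(N, nu)
    x = data[rng.integers(N)].copy()  # start at a randomly chosen data point
    logp = kde_logpdf(x, data, s)
    chain = np.empty((n_samples, nu))
    accepted = 0
    for k in range(n_samples):
        y = x + step * rng.standard_normal(nu)   # symmetric Gaussian proposal
        logq = kde_logpdf(y, data, s)
        if np.log(rng.uniform()) < logq - logp:  # Metropolis acceptance test
            x, logp = y, logq
            accepted += 1
        chain[k] = x
    return chain, accepted / n_samples


# Usage: normalize a dataset, then draw a chain of realizations.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
data = np.column_stack([np.cos(theta), np.sin(theta)])
data = (data - data.mean(axis=0)) / data.std(axis=0)
chain, rate = metropolis_kde(data, n_samples=2000, step=0.3, rng=rng)
print(chain.shape, round(rate, 2))
```

The symmetric proposal makes the acceptance ratio depend only on the ratio of KDE values. Working in log space with a log-sum-exp shift keeps that ratio finite when a proposal lands far from every data point.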
Similar resources
Sampling From A Manifold
A variety of inferential tasks require drawing samples from a probability distribution on a manifold. This occurs in sampling from the posterior distribution on constrained parameter spaces (e.g., covariance matrices), in testing goodness of fit for exponential families conditional on sufficient statistics (e.g., the sum and product of the observations in a Gamma family), and in generating data to te...
Study of factors affecting on neonatal hyperbilirubinemia according of optimization in logistic regression model.
Aim and Background: Neonatal hyperbilirubinemia is the most common reason for readmission to hospital. The aim of this study is to investigate the effect of risk factors such as hypertension, age and type of delivery in mothers on neonatal hyperbilirubinemia based on a logistic regression model. Method and material: In this descriptive study, the 300 mother's documents which refer to hospital for hospit...
Tangent Space Estimation for Smooth Embeddings of Riemannian Manifolds
Numerous dimensionality reduction problems in data analysis involve the recovery of low-dimensional models or the learning of manifolds underlying sets of data. Many manifold learning methods require the estimation of the tangent space of the manifold at a point from locally available data samples. Local sampling conditions such as (i) the size of the neighborhood (sampling width) and (ii) the n...
Respondent-driven sampling compared with other sampling methods for hidden populations
Sampling hidden populations is challenging due to the lack of convenient statistical frames. Since most populations exposed to special diseases are hidden and hard to reach, sampling methods that produce representative and efficient samples from these populations have become a subject of study for researchers all over the world. Because of the unknown probability of selecting samples in conventional...
Data augmentation for models based on rejection sampling
We present a data augmentation scheme to perform Markov chain Monte Carlo inference for models where data generation involves a rejection sampling algorithm. Our idea is a simple scheme to instantiate the rejected proposals preceding each data point. The resulting joint probability over observed and rejected variables can be much simpler than the marginal distribution over the observed variable...
Simulation of the Matrix Bingham–von Mises–Fisher Distribution, With Applications to Multivariate and Relational Data
Orthonormal matrices play an important role in reduced-rank matrix approximations and the analysis of matrix-valued data. A matrix Bingham-von Mises-Fisher distribution is a probability distribution on the set of orthonormal matrices that includes linear and quadratic terms, and arises as a posterior distribution in latent factor models for multivariate and relational data. This article describ...
Journal title:
- J. Comput. Physics
Volume: 321
Issue: -
Pages: -
Publication date: 2016