Class of Fast Methods for Processing Irregularly Sampled or Otherwise Inhomogeneous One-Dimensional Data
Authors
Abstract
With the ansatz that a data set’s correlation matrix has a certain parametrized form (one general enough, however, to allow the arbitrary specification of a slowly varying decorrelation distance and population variance), the general machinery of Wiener or optimal filtering can be reduced from O(n³) to O(n) operations, where n is the size of the data set. The implied vast increase in computational speed can allow many common sub-optimal or heuristic data analysis methods to be replaced by fast, relatively sophisticated, statistical algorithms. Three examples are given: data rectification, high- or low-pass filtering, and linear least-squares fitting to a model with unaligned data points.

PACS numbers: 06.50.-x, 02.50.Rj, 02.50.Vn

In a previous analysis of irregularly spaced observations of the gravitationally lensed quasar 0957+561 [1,2] we have used to good effect the full machinery of Wiener (or optimal) filtering in the time domain, including the use, where appropriate, of unbiased (“Gauss-Markov”) estimators [3]. These and related techniques are general enough to be applicable not only to data that is irregularly sampled (including the notoriously common case of “gappy” data), but also to data that is highly inhomogeneous in its error bars, or that has point-to-point correlations (so that errors are not independent). The principal reason that these methods are not better known and more widely used seems to be the fact that their use entails the numerical solution of sets of linear equations as large as the full data set. Only recently have fast workstations allowed application to data sets as large as a few hundred points; sets larger than a few thousand points are currently out of reach even on the largest supercomputers, since the computational burden for n data points scales as n³. As an example, the analysis in [2] (leading to a measurement of the offset in time of the two radio images of the lensed quasar) required overnight runs on a fast workstation.

In this context, we were therefore quite surprised recently to notice that the introduction of a particular simplifying assumption (essentially the ansatz of a certain parametrized form of the data’s correlation function) allows all the calculations already mentioned, and many more, to be done in linear time, that is, with only a handful of floating-point operations per data point. In fact, we have verified that we are able to obtain results substantially identical to [2] in less than 2 seconds of computer time for ∼160 data points, about 10⁴ times faster than the previous analysis.

Speed increases of 10⁴ or greater (that is, from O[n³] to O[n] for n data points) are not merely computer time savers. Such increases are enabling for the application of sophisticated statistical techniques to data sets that hitherto have been analysed only by heuristic and ad hoc methods. The Fast Fourier Transform (FFT) is a previous example of a numerical algorithm whose raw speed caused it to engender a considerable universe of sophisticated applications. By their nature, FFT methods are generally not applicable to irregularly sampled, or otherwise inhomogeneous, data sets (though see [4]). Although the methods we describe here are not related to the FFT in a mathematical sense, we think they have the potential to be comparably significant in engendering new and powerful techniques of data analysis.
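To make the O(n³) bottleneck concrete, the sketch below shows a dense form of the optimal-filtering estimate discussed above. It is an illustration only, not the paper’s distributed code: the function name dense_wiener_estimate, the exponential signal covariance with scale corr_len, and the toy data are our own assumptions.

```python
# Illustrative sketch (assumed names and covariance model, not the paper's code):
# a dense Wiener (optimal) filter for irregularly sampled, inhomogeneous data,
# whose cost is dominated by an O(n^3) linear solve.
import numpy as np

def dense_wiener_estimate(t, y, noise_var, corr_len):
    """Wiener estimate of the underlying signal at the sample times t.

    Assumes an exponential signal correlation exp(-|dt| / corr_len) and
    independent measurement errors with variances noise_var; the estimate
    s_hat = S (S + N)^{-1} y requires a dense solve, i.e. O(n^3) work.
    """
    S = np.exp(-np.abs(t[:, None] - t[None, :]) / corr_len)  # signal covariance
    N = np.diag(noise_var)                                    # noise covariance
    return S @ np.linalg.solve(S + N, y)                      # O(n^3) bottleneck

# Toy usage: irregular sampling with inhomogeneous error bars.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 160))
sigma = rng.uniform(0.1, 0.5, t.size)
y = np.sin(t) + sigma * rng.normal(size=t.size)
s_hat = dense_wiener_estimate(t, y, noise_var=sigma**2, corr_len=2.0)
```

For n ≈ 160 such a dense solve is quick, but the cubic scaling is what puts data sets of more than a few thousand points out of reach; the correlation-matrix ansatz introduced next removes that bottleneck.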
In the interest of making such new methods available to the widest possible community, we outline, in this Letter, the mathematical foundation of the class, and give three examples of early applications. We are also making available, via the Internet [5], a “developer’s kit” of Fortran-90 code, fully implementing the examples given here.

We begin with the observation that many, if not most, one-dimensional processes of interest (e.g., measurements as a function of time t) have a characteristic decorrelation time (which may itself vary with time), so that a set of measurements y_i, i = 1, …, n, at the ordered times t_1 < t_2 < ⋯ < t_n has an expected (population) correlation matrix Φ_ij that is peaked on the diagonal i = j and decays away from the diagonal in both directions. We consider the case where this decay can be modeled, even if only roughly, by the form

$$
\Phi_{ij} \equiv
\begin{cases}
\exp\!\left[-\int_{t_i}^{t_j} w(t)\,dt\right], & t_i < t_j,\\[4pt]
\exp\!\left[-\int_{t_j}^{t_i} w(t)\,dt\right], & t_i > t_j,
\end{cases}
\tag{1}
$$

where w(t), the reciprocal of the decorrelation time, can be thought of as slowly varying with time (or constant). Although this represents only one special type of correlation matrix, it can be applied to quite a large class of problems. All our results derive from the remarkable fact that the inverse of the matrix (1), $\Phi^{-1}_{ij} \equiv T_{ij}$, is tridiagonal, with

$$
T_{ij} =
\begin{cases}
1 + r_1 e_1, & i = j = 1,\\
-e_i, & 1 \le i = j - 1 \le n - 1,\\
1 + r_i e_i + r_{i-1} e_{i-1}, & 1 < i = j < n,\\
-e_j, & 1 \le j = i - 1 \le n - 1,\\
1 + r_{n-1} e_{n-1}, & i = j = n,\\
0, & \text{otherwise.}
\end{cases}
\tag{2}
$$
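The following is a minimal numerical check of Eqs. (1) and (2), again illustrative rather than the paper’s Fortran-90 kit. It builds Φ for a constant w, assembles the tridiagonal T, and verifies that T is the inverse of Φ. The definitions r_k = exp(−∫_{t_k}^{t_{k+1}} w dt) (the nearest-neighbour correlation) and e_k = r_k/(1 − r_k²) are an assumption inferred from the structure of Eq. (2); the paper’s own definitions fall just beyond this excerpt.

```python
# Numerical check of Eqs. (1)-(2) under the assumptions stated above.
import numpy as np

def phi_matrix(t, w):
    """Dense correlation matrix of Eq. (1) for constant w (built only to check T)."""
    return np.exp(-w * np.abs(t[:, None] - t[None, :]))

def tridiagonal_inverse(t, w):
    """Tridiagonal T = Phi^{-1} of Eq. (2), assembled in O(n) operations."""
    n = t.size
    r = np.exp(-w * np.diff(t))   # assumed: nearest-neighbour correlations r_k
    e = r / (1.0 - r**2)          # assumed: e_k = r_k / (1 - r_k^2)
    diag = np.ones(n)
    diag[:-1] += r * e            # adds r_k e_k         (rows 1 .. n-1)
    diag[1:] += r * e             # adds r_{k-1} e_{k-1} (rows 2 .. n)
    T = np.diag(diag)
    idx = np.arange(n - 1)
    T[idx, idx + 1] = -e          # superdiagonal
    T[idx + 1, idx] = -e          # subdiagonal
    return T

# Irregular but non-degenerate sample times, constant decorrelation rate w.
rng = np.random.default_rng(1)
t = np.sort(np.linspace(0.0, 100.0, 200) + rng.uniform(-0.2, 0.2, 200))
w = 0.05
Phi = phi_matrix(t, w)
T = tridiagonal_inverse(t, w)
assert np.allclose(T @ Phi, np.eye(t.size), atol=1e-8)

# Because T has only ~3n nonzero entries, x = Phi^{-1} b is an O(n)
# matrix-vector product instead of an O(n^3) dense solve.
b = np.ones(t.size)
x = T @ b
```

The tridiagonal structure of T is what turns every occurrence of Φ⁻¹ in the Wiener-filtering formulas into an O(n) banded operation, which is the source of the linear-time algorithms described above.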
Similar articles
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
The imbalance data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. The classification algorithms have more tendencies to the large class and might even deal with the minority class data as the outlier data. The text data is one of t...
An Adaptive Irregularly Spaced Fourier Method for Protein-Protein Docking
In this paper we introduce a grid free irregularly sampled Fourier approach for accurately predicting rigid body protein-protein docking sites. Of the many docking approaches, grid based Fast Fourier Transform (FFT) approaches have been shown to produce by far the fastest, correlation profiles of complex protein-protein interactions over the six dimensional search space. However, these uniform ...
An Efficient Algorithm for Optimal Multilevel Thresholding of Irregularly Sampled Histograms
Optimal multilevel thresholding is a quite important problem in image segmentation and pattern recognition. Although efficient algorithms have been proposed recently, they do not address the issue of irregularly sampled histograms. A polynomial-time algorithm for multilevel thresholding of irregularly sampled histograms is proposed. The algorithm is polynomial not just on the number of bins of ...
A Higher Order B-Splines 1-D Finite Element Analysis of Lossy Dispersive Inhomogeneous Planar Layers
In this paper we propose an accurate and fast numerical method to obtain scattering fields from lossy dispersive inhomogeneous planar layers for both TE and TM polarizations. A new method is introduced to analyze lossy Inhomogeneous Planar Layers. In this method by applying spline based Galerkin’s method of moment to scalar wave equation and imposing boundary conditions we obtain reflection and...
Journal: Physical Review Letters
Volume 74, Issue 7
Pages: -
Publication date: 1995