A Distribution Function Arising in Computational Biology
Authors
Abstract
Karlin and Altschul, in their statistical analysis of multiple high-scoring segments in molecular sequences, introduced a distribution function that gives the probability that there are at least r distinct and consistently ordered segment pairs, all with score at least x. For long sequences this distribution can be expressed in terms of the distribution of the length of the longest increasing subsequence in a random permutation. Within the past few years, this last quantity has been studied extensively in the mathematics literature. The purpose of this note is to summarize these new mathematical developments in a form suitable for use in computational biology.

Dedicated to Barry McCoy on the occasion of his sixtieth birthday.

1 The Distribution Function

Karlin and Altschul [8], in their statistical analysis of multiple high-scoring segments in molecular sequences, introduced the following distribution function: let F(r; y) denote the probability that there are at least r distinct and consistently ordered segment pairs, all with score at least x. They further introduced a parameter y = K N e^{−λx}, where K and λ are parameters related to the scoring system; see [8] for details. We use the parameter y without further reference to x. For long sequences (N → ∞) this distribution function is well approximated by [8] F(r; y) = e^{−y} ∞ ...
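The quantity the abstract refers to, the length of the longest increasing subsequence of a permutation, can be illustrated with a short sketch. The function names `lis_length` and `lis_distribution` are hypothetical and do not come from the paper; the code uses the standard patience-sorting technique and plain enumeration, so it is only practical for very small n.

```python
from bisect import bisect_left
from collections import Counter
from itertools import permutations


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq.

    Patience sorting: tails[i] holds the smallest possible last element
    of an increasing subsequence of length i + 1 seen so far.
    """
    tails = []
    for value in seq:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)   # value extends the longest subsequence
        else:
            tails[pos] = value    # value gives a smaller tail for length pos + 1
    return len(tails)


def lis_distribution(n):
    """Count the permutations of {1, ..., n} by their LIS length (brute force)."""
    return Counter(lis_length(p) for p in permutations(range(1, n + 1)))


if __name__ == "__main__":
    print(lis_length([3, 1, 4, 2, 5]))   # one longest increasing subsequence is 1, 4, 5
    print(dict(lis_distribution(3)))
```

Dividing each count by n! gives the exact distribution of the longest increasing subsequence length for a uniformly random permutation of that size.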
Similar resources
Using linear regularization to predict multi-peak distribution functions in heterogeneous adsorbents
In the present article, the energy distribution function of a heterogeneous solid is estimated. The energy distribution function is an important characteristic of a heterogeneous adsorbent. The overall adsorption quantity for a heterogeneous solid is usually expressed by a Fredholm equation of the first kind, which contains the unknown distribution function and the local adsorption isotherm as a kernel. The calcu...
A Bayesian approach for image denoising in MRI
Magnetic Resonance Imaging (MRI) is a notable medical imaging technique that is based on Nuclear Magnetic Resonance (NMR). MRI is a safe imaging method with high contrast between soft tissues, which has made it the most popular imaging technique in clinical applications. An MR image's visual quality plays a vital role in medical diagnostics, and it can be severely corrupted by existing noise duri...
Higher Order Moments and Recurrence Relations of Order Statistics from the Exponentiated Gamma Distribution
Order statistics arising from the exponentiated gamma (EG) distribution are considered. Closed-form expressions for the single and double moments of order statistics are derived. Measures of skewness and kurtosis of the probability density function of the rth order statistic for different choices of r, n and θ are presented. Recurrence relations between single and double moments of r...
Bounds for CDFs of Order Statistics Arising from INID Random Variables
In recent decades, the study of order statistics arising from independent and not necessarily identically distributed (INID) random variables has been a main concern for researchers. Computing the cumulative distribution function (CDF) of these order statistics (F_{i:n}) is complex, time-consuming, and software-intensive. Therefore, obtaining approximations a...
On a Distribution Function Arising in Computational Biology
where R_{k,r} is the number of permutations of the integers {1, ..., k} that contain an increasing subsequence of length at least r. Let X_y denote a positive integer valued random variable such that Prob(X_y ≥ r) = F(r; y). If R̄_{k,r} denotes the complement of R_{k,r}, i.e. the number of permutations σ ∈ S_k all of whose increasing subsequences have length strictly less than r, then clearly R̄_{k,r} = ...
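As a rough illustration of the two counts defined above, the following sketch enumerates all permutations of {1, ..., k} and counts those whose longest increasing subsequence has length at least r, and those where it is strictly less than r. The names `count_R` and `count_R_bar` are hypothetical, and the brute-force enumeration is an assumption made for clarity; it is feasible only for small k and is not the method of the paper.

```python
from bisect import bisect_left
from itertools import permutations
from math import factorial


def _lis_length(seq):
    """Longest strictly increasing subsequence length via patience sorting."""
    tails = []
    for value in seq:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)


def count_R(k, r):
    """Number of permutations of {1, ..., k} containing an increasing
    subsequence of length at least r (brute force over all k! permutations)."""
    return sum(1 for p in permutations(range(1, k + 1)) if _lis_length(p) >= r)


def count_R_bar(k, r):
    """Number of permutations of {1, ..., k} all of whose increasing
    subsequences have length strictly less than r."""
    return sum(1 for p in permutations(range(1, k + 1)) if _lis_length(p) < r)


if __name__ == "__main__":
    k, r = 4, 3
    # Every permutation falls into exactly one of the two classes,
    # so the two counts must add up to k!.
    print(count_R(k, r), count_R_bar(k, r), factorial(k))
```

Because the two classes partition S_k, the counts always sum to k!, which gives a quick sanity check on any faster method of computing either one.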