Exact Distribution of a Spaced Seed Statistic for Applications in DNA Repeat Detection
Abstract
Let a seed, S, be a string over the alphabet {1, ∗} that starts and ends with a 1, for example S = 11 ∗ 1. S occurs in a binary string B at position k if S can be positioned so that the last letter of S aligns with the kth letter of B and each 1 in S aligns with a 1 in B. A 1 in B is covered by S if there exists some occurrence of S in B such that the 1 in B aligns with a 1 in that occurrence. We show how to compute the exact probability distribution of the number of 1s covered by a seed S in an i.i.d. Bernoulli string of length n in which each letter is 1 with probability p. We refer to the new probability distribution as CnSp (C for covered, S for the seed). When S consists entirely of 1s, for example S = 111, this reduces to the familiar Rnkp, the probability distribution of the number of 1s that occur in runs of length k or longer (k is the number of 1s in S). Importantly, our method is probability-independent: the calculation yields a formula in terms of the probability parameter p and does not require fixing the value of p in advance. The CnSp distribution has applications in the detection of approximate DNA repeats using spaced seeds.
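The definitions above can be checked on small inputs with a brute-force sketch. This is not the method of the paper, which the abstract does not describe; it simply enumerates all 2^n binary strings, so it is only practical for small n. The function names (`covered_count`, `coverage_counts`, `distribution_at`) are hypothetical, and the ASCII `*` stands in for the wildcard ∗. Grouping strings by the number of covered 1s and the total number of 1s gives integer counts N(c, j) that do not depend on p, so that P(C = c) = Σ_j N(c, j) p^j (1 − p)^(n − j), which illustrates the kind of formula in p the abstract refers to.

```python
from fractions import Fraction
from itertools import product


def covered_count(seed: str, bits: tuple) -> int:
    """Count the 1s in `bits` covered by at least one occurrence of `seed`."""
    m = len(seed)
    # Offsets within the seed that must align with a 1 (anything else is a wildcard).
    ones = [i for i, ch in enumerate(seed) if ch == "1"]
    covered = set()
    for start in range(len(bits) - m + 1):
        if all(bits[start + i] == 1 for i in ones):
            covered.update(start + i for i in ones)
    return len(covered)


def coverage_counts(seed: str, n: int) -> dict:
    """Map (covered 1s, total 1s) -> number of length-n strings; independent of p."""
    counts = {}
    for bits in product((0, 1), repeat=n):
        key = (covered_count(seed, bits), sum(bits))
        counts[key] = counts.get(key, 0) + 1
    return counts


def distribution_at(counts: dict, n: int, p: Fraction) -> dict:
    """Evaluate the distribution of the covered count at a specific p."""
    dist = {}
    for (c, j), num in counts.items():
        dist[c] = dist.get(c, Fraction(0)) + num * p**j * (1 - p) ** (n - j)
    return dist


counts = coverage_counts("11*1", 4)
dist = distribution_at(counts, 4, Fraction(1, 2))
print(dist)
```

For the seed 11 ∗ 1 and n = 4, only the strings 1101 and 1111 contain an occurrence, and each covers three 1s, so at p = 1/2 the covered count is 3 with probability 1/8 and 0 otherwise. Using `Fraction` keeps the probabilities exact rather than floating-point.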
Similar resources
Exact Distribution of a Spaced Seed Statistic for DNA Homology Detection
Let a seed, S, be a string from the alphabet {1, ∗}, of arbitrary length k, which starts and ends with a 1. For example, S = 11 ∗ 1. S occurs in a binary string T at position h if the length k substring of T ending at position h contains a 1 in every position where there is a 1 in S. We say that the 1s at the corresponding positions in T are covered. We are interested in calculating the probabi...
Accurate Inference for the Mean of the Poisson-Exponential Distribution
Although the random sum distribution has been well studied in probability theory, inference for the mean of such a distribution is very limited in the literature. In this paper, two approaches are proposed to obtain inference for the mean of the Poisson-Exponential distribution. Both proposed approaches require the log-likelihood function of the Poisson-Exponential distribution, but the exact for...
Track detection on the cells exposed to high LET heavy-ions by CR-39 plastic and terminal deoxynucleotidyl transferase (TdT)
Background: The fatal effect of ionizing radiation on cells depends on the Linear Energy Transfer (LET) level. The distribution of ionizing radiation is sparse and homogeneous for low LET radiation such as X or γ rays, but it is dense and concentrated for high LET radiation such as heavy-ion radiation. Material and Methods: Chinese hamster ovary cells (CHO-K1) were exposed to 4 Gy Fe-ion 2000 keV/...
An Analysis of the Repeated Financial Earthquakes
Since the seismic behavior of the earth’s energy (which follows a power law distribution) can similarly be seen in the energy realized by the stock markets, in this paper we consider a statistical study comparing financial crises and earthquakes. To this end, the TP statistic, proposed by Pisarenko et al. (2004), is employed for estimating the critical point or the lower...
The Lomax-Exponential Distribution, Some Properties and Applications
Abstract: The exponential distribution is a popular model in applications to real data. We propose a new extension of this distribution, called the Lomax-exponential distribution, which presents greater flexibility to the model. Also there is a simple relation between the Lomax-exponential distribution and the Lomax distribution. Results for moment, limit behavior, hazard function, Shannon entr...