Eukaryotic promoter recognition.
Abstract
Only recently has it become common to determine eukaryotic genomic sequences large enough to contain several genes. With these data comes a new problem for gene-finding programs: to partition a set of exons correctly among several genes. One line of development in eukaryotic gene identification begins with coding region identification by statistical means and adds pattern recognition for sites of transcriptional, splicing, and translational control to produce algorithms capable of suggesting overall gene structure (for review, see Gelfand 1995; Fickett 1996a). To date, most development effort has focused on integration of the various kinds of pattern information in the relatively simple case where a single complete gene is present in the input sequence. In this case, current algorithms usually suggest a putative protein translation similar to that in the literature, though there is still significant room for improvement (Burset and Guigo 1996). The extension of these algorithms to deal with a sequence containing multiple or partial genes is just beginning (Burge and Karlin 1997; http://gnomic.stanford.edu/~chris/GENSCANW.html). Because the signals that control the start and stop of transcription and translation, and the location of splicing, are still not very well understood, it is not uncommon for a gene-finding algorithm to confuse internal with initial and terminal exons, thus wrongly partitioning the exons. The problem is compounded by our incomplete understanding of alternative splicing control elements. Another line of development in gene identification is based on homology (e.g., Gish and States 1993; Gelfand et al. 1996).
If there is a close homolog in the databases to one of the genes in the sequence under analysis, sequence similarity will usually group the exons for this gene correctly. Still, in many cases there is no close homolog and no guarantee, when there is some homolog, that the encoded protein lacks insertions/deletions. Clearly, some means of recognizing the beginnings of genes, probably via the promoter, or the ends, probably by means of the polyadenylation signal or translation termination signal (e.g., Kondrakhin et al. 1994; Wahle and Keller 1996; Dalphin et al. 1997; Solovyev and Salamov 1997), would enable a major advance. The promoter seems to be a much richer signal than the 3′ processing signals, though, as we shall see below, it is not easy to take advantage of the information in the promoter.
Related sources
ARGO: a web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters
Reliable recognition of the promoters in eukaryotic genomes remains an open issue. This is largely owing to the poor understanding of the features of the structural-functional organization of the eukaryotic promoters essential for their function and recognition. However, it was demonstrated that detection of ensembles of regulatory signals characteristic of specific promoter groups increases th...
Structural dynamics and DNA interaction of human TFIID
TFIID is a large protein complex required for the recognition and binding of eukaryotic gene core promoter sequences and for the recruitment of the rest of the general transcription factors involved in initiation of eukaryotic protein gene transcription. Cryo-electron microscopy studies have demonstrated the conformational complexity of human TFIID, where one-third of the mass of the complex ca...
Stochastic segment models of eukaryotic promoter regions
We present a new statistical approach for eukaryotic polymerase II promoter recognition. We apply stochastic segment models in which each state represents a functional part of the promoter. The segments are trained in an unsupervised way. We compare segment models with three and five states with our previous system which modeled the promoters as a whole, i.e. as a single state. Results on the c...
Interpolated Markov chains for eukaryotic promoter recognition
MOTIVATION: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in mic...
Identification of core promoter modules in Drosophila and their application in accurate transcription start site prediction
The reliable recognition of eukaryotic RNA polymerase II core promoters, and the associated transcription start sites (TSSs) of genes, has been an ongoing challenge for computational biology. High throughput experimental methods such as tiling arrays or 5' SAGE/EST sequencing have recently led to much larger datasets of core promoters, and to the assessment that the well-known core promoter se...
Cloning and secretory expression of VP2 gene of infectious bursal disease virus in eukaryotic cells
VP2 gene coding region of a vaccinal strain (D78) of infectious bursal disease virus (IBDV) was cloned in a eukaryotic expression vector, pSec Tag2A. The gene was placed downstream of Ig κ chain leader sequence, under the control of human cytomegalovirus (hCMV) immediate early enhancer and promoter. The construct pSec Tag2A-VP2 was transfected in COS-7 cell line and the expression and secretion of...
Journal title:
- Genome Research
Volume 7, Issue 9
Pages -
Publication date: 1997