Algorithmic Approaches to Clustering Gene Expression Data

نویسندگان

  • Ron Shamir
  • Roded Sharan
چکیده

Technologies for generating high-density arrays of cDNAs and oligonucleotides are developing rapidly, and changing the landscape of biological and biomedical research. They enable, for the rst time, a global, simultaneous view on the transcription levels of many thousands of genes, when the cell undergoes speci c conditions or processes. For several organisms that had their genomes completely sequenced, the full set of genes can already be monitored this way today. The potential of such technologies is tremendous: The information obtained by monitoring gene expression levels in di erent developmental stages, tissues types, clinical conditions and di erent organisms can help understanding genes function, gene networks, diagnostic of disease conditions and e ects of medical treatments. Undoubtedly, other applications will emerge in coming years. A key step in the analysis of gene expression data is the identi cation of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering gene expression data. A clustering problem consists of elements and a characteristic vector for each element. A measure of similarity is de ned between pairs of such vectors. (In gene expression, elements are usually genes, the vector of each gene contains its expression levels under each of the monitored conditions, and similarity can be measured, for example, by the correlation coeÆcient between vectors.) The goal is to partition the elements into subsets, which are called clusters, so that two criteria are satis ed: Homogeneity elements in the same cluster are highly similar to each other; and separation elements from di erent clusters have low similarity to each other. Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel. fshamir,[email protected].

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

به کارگیری روش‌های خوشه‌بندی در ریزآرایه DNA

Background: Microarray DNA technology has paved the way for investigators to expressed thousands of genes in a short time. Analysis of this big amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering technique in microarray DNA analysis. Materials and methods: We analyzed data of Van’t Veer et al study dealing with BRCA1...

متن کامل

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

متن کامل

O-3: Drug Repositioning by Merging Gene Expression Data Analysis and Cheminformatics Target Prediction Approaches

The transcriptional responses of drug treatments combined with a protein target prediction algorithm was utilised to associate compounds to biological genomic space. This enabled us to predict efficacy of compounds in cMap and LINCS against 181 databases of diseases extracted from GEO. 18/30 of top drugs predicted for leukemia (e.g. Leflunomide and Etoposide) and breast cancer (e.g. Tamoxifen a...

متن کامل

خوشه‌بندی داده‌های بیان‌ژنی توسط عدم تشابه جنگل تصادفی

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data are typically involve in a large number of variables (genes), in comparison with number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations that are calculated by a distance function. As increa...

متن کامل

Experimental Evaluation of Algorithmic Effort Estimation Models using Projects Clustering

One of the most important aspects of software project management is the estimation of cost and time required for running information system. Therefore, software managers try to carry estimation based on behavior, properties, and project restrictions. Software cost estimation refers to the process of development requirement prediction of software system. Various kinds of effort estimation patter...

متن کامل

Bayesian unsupervised learning with multiple data types.

We propose Bayesian generative models for unsupervised learning with two types of data and an assumed dependency of one type of data on the other. We consider two algorithmic approaches, based on a correspondence model, where latent variables are shared across datasets. These models indicate the appropriate number of clusters in addition to indicating relevant features in both types of data. We...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000