Ensemble clustering method based on the resampling similarity measure for gene expression data
Seo Young Kim
Research Institute for Basic Science, Chonnam National University
Jae Won Lee
Department of Statistics, Korea University, Seoul, Korea, jael{at}korea.ac.kr
The rapid development of microarray technologies has enabled the simultaneous monitoring of the expression levels of thousands of genes. Microarray technology can generate an enormous amount of data in a short time, and it has become a new tool for studying broad problems in biology and medical science, such as the classification of tumors. Many statistical methods are available for analysing these complex data and organizing them into meaningful information, and one of the main goals in analysing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we develop a new clustering method for class discovery in a dataset. The performance of the new method and of existing methods was compared using both simulated data and real gene expression data. The proposed method was generally found to give more accurate cluster numbers and cluster assignments for individual objects than three well-known general clustering methods: agglomerative hierarchical clustering (HC), divisive HC and self-organizing map (SOM). It also gave better results than the three consensus clustering methods based on agglomerative HC, divisive HC and SOM.
This version was published on December 1, 2007
Statistical Methods in Medical Research, Vol. 16, No. 6, 539-564 (2007)
DOI: 10.1177/0962280206071842
