Boosting scRNA-seq data clustering by cluster-aware feature weighting
نویسندگان
چکیده
منابع مشابه
Clustering scRNA-Seq Data using TF-IDF
In this abstract, we propose several computational approaches for clustering scRNA-Seq data based on the Term Frequency Inverse Document Frequency (TF-IDF) transformation that has been successfully used in the field of text analysis. Empirical evaluation on simulated cell mixtures with different levels of complexity suggests that the TF-IDF methods consistently outperform existing scRNA-Seq clu...
متن کاملdropClust: efficient clustering of ultra-large scRNA-seq data.
Droplet based single cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale for such high dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique to develop a de novo clustering algorithm for large-scale single cell data. On a numb...
متن کاملComputational approaches for interpreting scRNA‐seq data
The recent developments in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation of vast amounts of transcriptomic data at cellular resolution. With these advances come new modes of data analysis, building on high-dimensional data mining techniques. Here, we consider biological questions for which scRNA-seq data is used, both at a cell and gene level, and...
متن کاملCentral Clustering of Categorical Data with Automated Feature Weighting
The ability to cluster high-dimensional categorical data is essential for many machine learning applications such as bioinfomatics. Currently, central clustering of categorical data is a difficult problem due to the lack of a geometrically interpretable definition of a cluster center. In this paper, we propose a novel kernel-density-based definition using a Bayes-type probability estimator. The...
متن کاملBoosting by weighting critical and erroneous samples
Real Adaboost is a well-known and good performance boosting method used to build machine ensembles for classification. Considering that its emphasis function can be decomposed in two factors that pay separated attention to sample errors and to their proximity to the classification border, a generalized emphasis function that combines both components by means of a selectable parameter, l, is pre...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: BMC Bioinformatics
سال: 2021
ISSN: 1471-2105
DOI: 10.1186/s12859-021-04033-7