As data sets grow larger and more complex, machine learning methods are becoming more pervasive in the biomedical community. Many researchers are not trained in building and interpreting such models, and it can be difficult to choose the appropriate approach in a given context. In its new version, Qlucore Omics Explorer expands the collection of machine learning functionality, and now also includes k-means clustering and classification.
When choosing between machine learning methods it is important to distinguish between supervised and unsupervised methods, which are used in different contexts and for different purposes. Unsupervised methods do not use any external information (annotations, such as disease status or other traits) about the objects to be analyzed, but rather try to find dominating structure or patterns in the data, patterns that can then be interpreted by the researcher. Supervised methods, on the other hand, typically aim at building models that predict or 'explain' some pre-specified annotation, e.g. disease status or the response to a treatment. This annotation may or may not correspond to the main pattern(s) in the data. Classification, or predictive modeling, is an example of supervised learning. You can read more about the new classification functionality in Qlucore Omics Explorer 3.2.
If the goal is to get an overview of a data set, to see which patterns are strongest and whether the samples naturally partition into subgroups, an unsupervised method such as clustering or PCA should be used. Here, we describe unsupervised clustering and discuss how and when it can be used.
Qlucore Omics Explorer offers two types of clustering methods: hierarchical clustering (combined with heatmaps) and k-means clustering. Both serve the same purpose: to find subgroups among the samples, such that samples within one group are more “similar” to each other than to samples in other groups, where “similar” can be formally defined in various ways. The difference is that hierarchical clustering builds a “cluster tree” (or dendrogram), which organizes the samples hierarchically but does not directly divide them into clusters, while k-means clustering partitions the samples into a pre-defined number of groups.
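To make the idea of partitioning samples into a pre-defined number of groups concrete, the following is a minimal sketch of k-means in plain Python. It is an illustration only and not the implementation used in Qlucore Omics Explorer: the function names, the Euclidean distance measure, and the deterministic "farthest point" initialization are all assumptions chosen to keep the example short and reproducible.

```python
import math

def init_centroids(samples, k):
    """Assumed initialization: start at the first sample, then repeatedly
    add the sample that lies farthest from all centroids chosen so far."""
    centroids = [list(samples[0])]
    while len(centroids) < k:
        farthest = max(samples, key=lambda s: min(math.dist(s, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(samples, k, max_iter=100):
    """Partition samples (lists of floats) into k groups.
    Returns one cluster label per sample and the final centroids."""
    centroids = init_centroids(samples, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:  # no sample changed group: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data: six samples, two variables, two well-separated groups.
samples = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
           [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labels, centroids = kmeans(samples, k=2)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

Note that k must be chosen before the algorithm runs, which is the practical difference from a dendrogram, where the number of groups is decided afterwards by choosing where to cut the tree.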
Practical situations where you would like to use a clustering approach could be to:
Large and complex data sets often contain a lot of noise, in the sense that weaker signals interfere with the stronger ones and can therefore degrade the performance of clustering algorithms. Qlucore Omics Explorer includes two tools that can reduce the impact of noise: variance filtering and the projection score. In addition, a silhouette plot type is included to help evaluate the quality of a given sample partitioning. In Qlucore Omics Explorer, silhouette values are calculated for each generated k-means clustering.
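As a rough illustration of two of these ideas, the sketch below shows a simple variance filter and the standard silhouette value, s = (b - a) / max(a, b), where a is a sample's mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. This is not Qlucore's code: the function names are hypothetical, the filter simply keeps the `n_keep` highest-variance variables (the product may define its threshold differently), Euclidean distance is assumed, and the projection score is not covered.

```python
import math
import statistics

def variance_filter(samples, n_keep):
    """Keep the n_keep variables (columns) with the highest variance
    across samples. Returns the reduced data and the kept column indices."""
    columns = list(zip(*samples))
    variances = [statistics.pvariance(col) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [[s[j] for j in keep] for s in samples], keep

def silhouette_values(samples, labels):
    """One silhouette value per sample, in [-1, 1]. Requires at least two
    clusters; a sample that is alone in its cluster gets the value 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(samples[i], samples[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(samples[i], samples[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

# Toy data: the third variable barely varies and mostly adds noise.
data = [[1.0, 1.1, 0.50], [0.9, 1.0, 0.52], [1.2, 0.8, 0.49],
        [5.0, 5.1, 0.51], [5.2, 4.9, 0.50], [4.8, 5.0, 0.48]]
filtered, kept = variance_filter(data, n_keep=2)

good = silhouette_values(filtered, [0, 0, 0, 1, 1, 1])
poor = silhouette_values(filtered, [0, 1, 0, 1, 0, 1])
print(kept)
print(statistics.mean(good), statistics.mean(poor))
```

Values close to 1 indicate samples that sit well inside their cluster, while values near 0 or below suggest a sample lies between clusters or has been assigned to the wrong one. Comparing the mean silhouette across several choices of k is one common way to judge which partitioning fits the data best.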
A possible exploratory workflow, combining the noise reduction and clustering functionality of Qlucore Omics Explorer to find subgroups among the samples, is to:
Learn more or download a free trial and try it on your own data.