Clustering and classification using Qlucore Omics Explorer

With Qlucore Omics Explorer (QOE) 3.2 you can perform both clustering and classification with just a few mouse clicks. Clustering and classification are two fundamental techniques in statistical learning. Both are aimed at finding statistically significant structures and patterns in data sets, but they are methodologically quite different.

Clustering with Qlucore Omics Explorer

Given a data set made up of measurements of variables for a collection of samples, clustering aims at finding subgroups of samples that are similar to each other. Clustering is often very useful in the exploratory phase of an analysis, e.g. when searching for previously unknown clinical subgroups.

In QOE, with a few mouse clicks you can perform hierarchical clustering and k-means (kmeans) clustering with the new functionality in QOE 3.2. The outcome of kmeans clustering in QOE 3.2 is stored as a sample annotation and can immediately be examined in any of the plot types available in QOE, e.g. heat maps or PCA plots. The kmeans clustering facility in QOE 3.2 also comes with easy-to-interpret clustering quality scores, so-called silhouette values, which capture the clustering quality for each individual sample. The accompanying silhouette plots summarize all the individual silhouette values and provide an overview of the overall clustering quality. Clustering, together with the accompanying silhouette plots, can be very informative, e.g. when comparing the resulting cluster annotations with available clinical sample annotations.
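To make the silhouette values mentioned above more concrete, here is a minimal Python sketch of the standard silhouette definition: for each sample, a is the mean distance to the other samples in its own cluster, b is the mean distance to the samples in the nearest other cluster, and the silhouette value is (b - a) / max(a, b). This is only an illustration and not QOE's implementation. The function name silhouette_values, the use of Euclidean distance, the toy data, and the convention of scoring single-sample clusters as 0 are assumptions made for this example, and at least two clusters are assumed.

```python
import math


def silhouette_values(samples, labels):
    """Return one silhouette value per sample (assumes at least two clusters)."""
    values = []
    for i, (point, label) in enumerate(zip(samples, labels)):
        # Sum of distances and sample counts per cluster, excluding the sample itself.
        sums, counts = {}, {}
        for j, (other, other_label) in enumerate(zip(samples, labels)):
            if i == j:
                continue
            sums[other_label] = sums.get(other_label, 0.0) + math.dist(point, other)
            counts[other_label] = counts.get(other_label, 0) + 1
        if label not in counts:
            # The sample is alone in its cluster; by convention its value is 0.
            values.append(0.0)
            continue
        a = sums[label] / counts[label]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in sums if c != label)  # nearest other cluster
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Hypothetical toy data: four samples measured on two variables.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_values(samples, [0, 0, 1, 1])  # clusters match the structure
bad = silhouette_values(samples, [0, 1, 0, 1])   # clusters cut across the structure
print(good)
print(bad)
```

Values close to 1 indicate samples that sit well inside their cluster, while values near 0 or below indicate samples that lie between clusters or may have been assigned to the wrong one.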

Classification with Qlucore Omics Explorer

Classification is an example of a supervised technique. In this context, that means the user chooses an annotation that defines the different classes for a training data set. Based on this class annotation, the classifier is “trained”, using only the training data set, to find a subset of variables that, taken together, have the highest probability of correctly classifying new samples (as judged by the performance on the training data set).

Classification can, for example, be useful when looking for new therapeutic targets and biomarkers, as it often delivers short lists of variables that collectively are able to predict the class of new samples with some accuracy. Typical class annotations are response vs nonresponse or case vs control. The accuracy of the classifier (or predictor) is estimated in the training phase and is provided in the QOE classification report that is automatically generated when classification is run.

QOE 3.2 implements three different classification algorithms: k nearest neighbors (kNN), support vector machine (SVM) and random forest classification. The three methods are methodologically quite different, but often give similar results.
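As an illustration of the simplest of these methods, the following Python sketch shows kNN classification on a labelled training set, together with one common way of estimating accuracy from the training data alone. It is not QOE's implementation: the function names knn_predict and leave_one_out_accuracy, the Euclidean distance, the choice k = 3, the toy measurements, and the use of leave-one-out validation are all assumptions made for this example, and the text above does not state how QOE estimates accuracy.

```python
import math
from collections import Counter


def knn_predict(train_samples, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training samples."""
    nearest = sorted(
        zip(train_samples, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, labels, k=3):
    """Estimate accuracy by predicting each sample from all the other samples."""
    correct = 0
    for i in range(len(samples)):
        rest_samples = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_samples, rest_labels, samples[i], k) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical training set: two variables measured on six samples.
train_samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (4.8, 5.2), (5.2, 4.9)]
train_labels = ["response"] * 3 + ["nonresponse"] * 3

print(knn_predict(train_samples, train_labels, (1.0, 1.0)))
print(leave_one_out_accuracy(train_samples, train_labels))
```

Unlike the clustering sketch, every training sample here carries a class label chosen in advance, which is what makes the technique supervised.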

A final note: kNN classification and kmeans clustering have similar names, which can be confusing when they are discussed together. They are different techniques with different purposes: kNN is a supervised classification method that requires a class annotation, while kmeans is a clustering method that does not. In any case, good luck with your classification and clustering using Qlucore Omics Explorer!

Watch the 1-minute video to learn more about how to use the clustering functionality.

For more information, have a look at Qlucore films, webinars, how-to documents, white papers or application notes.