Drop CC

Cytometry Clustering

Mass cytometry

State of the art technology in multi-parametric single-cell analysis

In the era of “Big Data”, biology is contributing an exponentially growing volume of raw data that bioinformaticians struggle to process and interpret. The same trend can be seen in specialized areas such as single-cell proteomics.

Mass cytometry is the state-of-the-art technology in multi-parametric single-cell analysis. Cytometry instruments such as CyTOF combine traditional fluorescence-based flow cytometry with the selectivity and quantitative power of inductively coupled plasma mass spectrometry (Atkuri, Stevens et al. 2015). CyTOF uses antibodies tagged with stable isotopes of rare earth metals, which makes it possible to acquire up to 100 independent measurements on single cells.

This kind of analysis is performed daily in the early stages of clinical trials, in pharma R&D departments and in laboratories all around the world.

CyTOF

State of the art

Today the most widely used unsupervised algorithm for processing CyTOF data is S.P.A.D.E. (Qiu et al. 2011), which was developed in close collaboration with the CyTOF team so that the instrument could launch with at least a workable algorithm for interpreting its data.

All of the research and testing was carried out on bone marrow and blood data, where the markers and the differences between populations are easy to distinguish. Real-world applications are now different: the target populations are often hidden by noise or by other large populations with similar markers. As a result, the current solution has a few drawbacks:

‣ A tendency to overcluster larger populations;

‣ A tendency to lose rare populations, which are usually merged into larger ones;

‣ The need for manual intervention by a technician to preprocess the data;

‣ Many parameters that must be set before the computation can be launched (downsampling, number of clusters).

Drop CC

A new algorithm to outperform S.P.A.D.E. in terms of speed, accuracy and “user friendliness”

New bioinformatics approaches to analyze and classify high-dimensional mass cytometry data

How it works

The CC algorithm starts with a pre-clustering step, in which K-means is adapted to work with clusters of different sizes. A mean-shift step is then applied to a dataset reshaped to take local conditional probability into account.
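The two-step idea can be illustrated with a minimal NumPy sketch. It is not the Drop CC implementation: the K-means here is plain Lloyd iteration with farthest-first seeding (an assumption, chosen so that a small, distant group still receives a centroid), the mean shift uses a Gaussian kernel weighted by pre-cluster size, and the reshaping by local conditional probability is not reproduced. All function names and parameter values are hypothetical.

```python
import numpy as np

def farthest_first(points, k):
    """Pick k seed points, each as far as possible from those already chosen."""
    chosen = [0]
    d2 = ((points - points[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())
        chosen.append(nxt)
        d2 = np.minimum(d2, ((points - points[nxt]) ** 2).sum(axis=1))
    return points[chosen]

def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def precluster(points, k, n_iter=25):
    """Lloyd's K-means; returns centroids, point labels and cluster sizes."""
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        labels = nearest(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an emptied cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    labels = nearest(points, centroids)
    return centroids, labels, np.bincount(labels, minlength=k).astype(float)

def mean_shift(seeds, weights, bandwidth, n_iter=200, tol=1e-7):
    """Move every seed uphill on a size-weighted Gaussian kernel density."""
    modes = seeds.copy()
    for _ in range(n_iter):
        d2 = ((modes[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
        kern = weights[None, :] * np.exp(-d2 / (2.0 * bandwidth ** 2))
        shifted = kern @ seeds / kern.sum(axis=1, keepdims=True)
        moved = np.abs(shifted - modes).max()
        modes = shifted
        if moved < tol:
            break
    return modes

def merge_modes(modes, radius):
    """Give the same population id to modes closer than `radius`."""
    reps, ids = [], np.empty(len(modes), dtype=int)
    for i, mode in enumerate(modes):
        for c, rep in enumerate(reps):
            if np.linalg.norm(mode - rep) < radius:
                ids[i] = c
                break
        else:
            reps.append(mode)
            ids[i] = len(reps) - 1
    return ids

def cluster(points, k=16, bandwidth=2.0):
    """Pre-cluster with K-means, then mean-shift the weighted centroids."""
    centroids, labels, sizes = precluster(points, k)
    modes = mean_shift(centroids, sizes, bandwidth)
    return merge_modes(modes, bandwidth / 2.0)[labels]

# Synthetic data: a large population and one a hundred times smaller.
rng = np.random.default_rng(0)
big = rng.normal(loc=0.0, scale=0.5, size=(2000, 2))
rare = rng.normal(loc=10.0, scale=0.5, size=(20, 2))
populations = cluster(np.vstack([big, rare]))
print(np.bincount(populations))  # number of events per population
```

Running mean shift on a few weighted centroids rather than on every event is what keeps the second step cheap; the bandwidth in this sketch is fixed by hand, whereas the text describes Drop CC as requiring no parameters.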

The resulting populations are then plotted using the multidimensional scaling of the cluster centers.

What Drop CC can do

Easily distinguish populations that are a thousand-fold smaller than the largest ones, in noisy conditions where algorithms such as S.P.A.D.E. fail;

Plot an accurate count of the number of cells in each population;

Ease the interpretation of the data;

Handle ever larger datasets, in terms of both number of events and dimensionality;

Run efficiently without human intervention in the preprocessing phase and without setting any parameters.

Summary

Drop CC lets technicians categorize and visualize cytometry data with higher accuracy, with no parameter settings required and no error-prone preprocessing, and with an overall faster workflow thanks to the user-friendliness of the solution.

Drop CC vs SPADE

We have developed a new algorithm, based on mean shift, that rapidly and efficiently identifies cell populations that share similar molecular profiles.

The CC pipeline

  • Preclustering Stage: we use a parallelized form of K-means to reduce the number of points that need to be clustered.
  • Correlation Scanning: by calculating a “local correlation” we create a new dimension in the n-dimensional space, which helps mean shift separate special clusters where the local maximum of the kernel is hard to find.
  • Clustering: a standard mean-shift run.
  • Visualization: we compute the multidimensional scaling using “Scaling by Majorizing a Complicated Function” (SMACOF) to find a 2D representation of the n-dimensional cluster centers.
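The visualization stage can be sketched as follows. This is a minimal, unweighted SMACOF written in NumPy for illustration, not the code used by Drop CC: each iteration applies the Guttman transform, which never increases the raw stress between the target distances and the distances in the 2D layout. The function names, the random initialization and the example cluster centers are assumptions.

```python
import numpy as np

def pairwise(x):
    """Euclidean distance between every pair of rows."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def stress(target, x):
    """Raw stress: squared mismatch between target and embedded distances."""
    return float(((target - pairwise(x)) ** 2).sum() / 2.0)

def smacof(target, n_components=2, n_iter=300, tol=1e-9, seed=0):
    """Embed points so that their distances approximate `target`."""
    target = np.asarray(target, dtype=float)
    n = len(target)
    x = np.random.default_rng(seed).normal(size=(n, n_components))
    history = [stress(target, x)]
    for _ in range(n_iter):
        dist = pairwise(x)
        # ratio[i, j] = target / current distance (0 where points coincide)
        ratio = np.divide(target, dist, out=np.zeros_like(target), where=dist > 0)
        b = -ratio
        b[np.diag_indices(n)] = ratio.sum(axis=1)
        x = b @ x / n  # Guttman transform with unit weights
        history.append(stress(target, x))
        if history[-2] - history[-1] < tol:
            break
    return x, history

# Hypothetical cluster centers in a 5-marker space, laid out in 2D.
centers = np.random.default_rng(1).normal(size=(8, 5))
layout, history = smacof(pairwise(centers))
print(layout.shape, history[0], history[-1])
```

Each row of `layout` gives the 2D position of one cluster center, which is the kind of coordinate a bubble chart needs; clusters whose centers are close in marker space end up close in the plot.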

Limits of the SPADE clustering algorithm

Limits of the KMeans clustering algorithm

Limits of the “Mean Shift” clustering algorithm

Unlike mean shift, the algorithm that we have developed detects small populations even when they are located near more abundant populations.

Drop CC Application

The application is designed to be accessible from desktop computers as well as from tablets. In both cases, access to the services is restricted to registered users.

Once the user has logged in, they are taken to the main application dashboard, from which most operations can be performed. Two main actions are available: “Generate CSV from file” or “Open CSV”.

Selecting “Generate CSV from File” opens a dialog window in which the user can choose an FCS file from their local hard drive to be sent to the server.

Once the file has been uploaded and parsed, the server responds with a list of the markers available in the FCS file, from which the user can select those on which the CC algorithm is to be run.

By opening the “Enable cleaners markers” dropdown area, the user can also perform an automatic cleaning of debris and dead cells. Once the CSV has been generated, it can be selected from the user’s list to display the corresponding dashboard.

Clicking on one or more markers displays the corresponding marker charts on the right side, while the cell populations are shown in the bubble chart on the left, where the user can see the relative cluster sizes and the similarities between populations.

The marker charts also let the user apply gates on specific markers, which update the left-hand bubble chart dynamically and in real time.
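As an illustration of what a gate does to the bubble chart, the sketch below keeps only the events whose marker intensity falls inside each gate interval and recounts the events per population. It is an assumption about the general technique, not the application's code; the marker names, intensities and function names are hypothetical.

```python
import numpy as np

def apply_gates(values, markers, gates):
    """Boolean mask of events that pass every (low, high) gate."""
    keep = np.ones(len(values), dtype=bool)
    for name, (low, high) in gates.items():
        column = values[:, markers.index(name)]
        keep &= (column >= low) & (column <= high)
    return keep

def bubble_sizes(labels, keep, n_populations):
    """Number of gated events in each population (the bubble sizes)."""
    return np.bincount(labels[keep], minlength=n_populations)

# Five hypothetical events measured on two hypothetical markers.
markers = ["CD3", "CD19"]
values = np.array([
    [5.0, 0.2],
    [4.5, 0.1],
    [0.3, 6.0],
    [0.2, 5.5],
    [4.8, 5.9],
])
labels = np.array([0, 0, 1, 1, 2])  # population id of each event

one_gate = bubble_sizes(labels, apply_gates(values, markers, {"CD3": (4.0, 10.0)}), 3)
two_gates = bubble_sizes(
    labels,
    apply_gates(values, markers, {"CD3": (4.0, 10.0), "CD19": (0.0, 1.0)}),
    3,
)
print(one_gate, two_gates)
```

Because the gate is just a mask over events, the counts can be recomputed on every change without re-running the clustering.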

The user can select one or more cell populations in the bubble chart to filter the results in the marker charts on the right. As groups are selected, the marker charts change dynamically, in real time, to display the statistics of the selected groups.