
Core Bioinformatics group


The semi-supervised cell classification project takes sequencing data as input. The task is to classify cells according to external labels; here the labels come from two fluorescent reporters, eGFP and tdTomato, which indicate where each cell is located within the tissue. The expression levels of these two marker genes are highly variable, and their quantification is not mutually exclusive (a cell can show signal for both), so labels cannot be assigned directly. We therefore first select the most representative cells of each class and train a Random Forest classifier on them to identify the most discriminative genes. We then use this selected gene set to assign a class to the remaining unlabeled cells.
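The three steps above (seed selection, gene ranking, label propagation) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The seed-selection rule (a log-ratio of eGFP to tdTomato signal), the helper names `select_seed_cells` and `classify_cells`, and the parameters `n_per_class`, `n_genes` and `n_estimators` are all hypothetical choices. The sketch also assumes the expression matrix `X` (cells by genes) excludes the two reporter transcripts themselves, so the classifier cannot simply rediscover them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_seed_cells(egfp, tdtomato, n_per_class):
    """Hypothetical rule: pick the cells whose reporter signal is most one-sided."""
    # Positive score = eGFP-dominant, negative score = tdTomato-dominant.
    score = np.log1p(egfp) - np.log1p(tdtomato)
    order = np.argsort(score)
    tomato_idx = order[:n_per_class]   # most tdTomato-dominant cells
    gfp_idx = order[-n_per_class:]     # most eGFP-dominant cells
    return gfp_idx, tomato_idx


def classify_cells(X, egfp, tdtomato, n_per_class=50, n_genes=20,
                   n_estimators=200, random_state=0):
    """Return per-cell labels (1 = eGFP, 0 = tdTomato) and selected gene indices."""
    n_cells = X.shape[0]
    if 2 * n_per_class > n_cells:
        raise ValueError("seed classes would overlap; lower n_per_class")

    # Step 1: confidently labelled seed cells from the two reporters.
    gfp_idx, tomato_idx = select_seed_cells(egfp, tdtomato, n_per_class)
    seed_idx = np.concatenate([gfp_idx, tomato_idx])
    y_seed = np.array([1] * len(gfp_idx) + [0] * len(tomato_idx))

    # Step 2: rank genes by Random Forest impurity-based importance.
    rf = RandomForestClassifier(n_estimators=n_estimators,
                                random_state=random_state)
    rf.fit(X[seed_idx], y_seed)
    top_genes = np.argsort(rf.feature_importances_)[::-1][:n_genes]

    # Step 3: refit on the selected genes only and label the remaining cells.
    rf_small = RandomForestClassifier(n_estimators=n_estimators,
                                      random_state=random_state)
    rf_small.fit(X[seed_idx][:, top_genes], y_seed)
    unlabeled = np.setdiff1d(np.arange(n_cells), seed_idx)

    labels = np.full(n_cells, -1)
    labels[seed_idx] = y_seed
    labels[unlabeled] = rf_small.predict(X[unlabeled][:, top_genes])
    return labels, top_genes
```

Refitting on the reduced gene set (rather than reusing the first forest) mirrors the description of classifying unlabeled cells "using the selected set of genes"; whether the project refits, and how many genes it keeps, is not stated in the source.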

"The truth is rarely pure and never simple." (Oscar Wilde, The Importance of Being Earnest)