Core Bioinformatics group

 

All teaching materials from the Core Bioinformatics group can be found on our GitHub page.

CSCI Teaching

Introduction to RNAseq

This repository contains an overview of bulk RNA-Seq data analysis, delivered to members of the CSCI in November 2021. It covers:

  • Overview of pipeline components
  • QC, alignment to the reference genome and feature quantification
  • Noise quantification and removal
  • Post-alignment QC, differential expression and enrichment analysis
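As a rough illustration of the quantification and filtering steps, the sketch below converts raw gene counts to counts per million (CPM) and drops genes that are barely expressed. It is written in Python for illustration only. The function names and thresholds (`min_cpm=1.0`, `min_samples=2`) are hypothetical choices, and low-count filtering is a much cruder stand-in for dedicated noise-quantification methods. In R, edgeR's `cpm()` and `filterByExpr()` cover similar ground.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale raw counts (gene -> per-sample counts) to counts per million (CPM)."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts in each sample, summed over all genes
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [c * 1_000_000 / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_expression(
    cpm: Dict[str, List[float]], min_cpm: float = 1.0, min_samples: int = 2
) -> Dict[str, List[float]]:
    """Keep genes that reach at least min_cpm in at least min_samples samples."""
    return {
        gene: row
        for gene, row in cpm.items()
        if sum(value >= min_cpm for value in row) >= min_samples
    }
```

Normalising by library size first means the threshold is comparable across samples sequenced to different depths; requiring expression in several samples avoids keeping genes driven by a single outlier.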

BBS

This repository contains slides from a course on various bioinformatics topics, delivered from February to March 2021.

Linear Regression

The linear regression lecture slides can be found here.

The practical materials can be found here.

The supervision materials can be found here.

RNAseq

The RNAseq lecture slides can be found here and here.

The practical materials can be found here for mRNAseq and here for sRNAseq.

The supervision materials can be found here.

Machine Learning

Supervised Learning

The supervised Machine Learning lecture slides can be found here.

The practical materials for supervised Machine Learning can be found here.

Unsupervised Learning

The unsupervised Machine Learning lecture slides can be found here.

The practical materials for unsupervised Machine Learning can be found here.

Introduction to Machine Learning

This repository contains an introductory practical course on machine learning, delivered to AstraZeneca in November 2021. It covers:

  • Introduction, overview of techniques and cross-validation
  • caret package and k-nearest neighbours
  • Decision trees, random forests, and support vector machines
  • Practical on supervised approaches
  • Dimensionality reduction and clustering
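Cross-validation and k-nearest neighbours fit together naturally: the number of neighbours k is a hyperparameter, and cross-validation is one way to choose it. Below is a minimal pure-Python sketch of that idea, not the course code. The practicals use the R caret package, where `train()` with `trainControl(method = "cv")` plays this role. All function names here are hypothetical, folds are plain random splits rather than stratified ones, and distances are Euclidean.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest (Euclidean) to query."""
    nearest = sorted(zip((math.dist(x, query) for x in train_x), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and deal the indices round-robin into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(x, y, k, n_folds=5, seed=0):
    """Fraction of samples classified correctly when each fold is held out in turn."""
    correct = 0
    for fold in k_fold_indices(len(x), n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in fold)
    return correct / len(x)


def choose_k(x, y, candidates, n_folds=5):
    """Return the candidate k with the highest CV accuracy (first one on ties)."""
    return max(candidates, key=lambda k: cv_accuracy(x, y, k, n_folds))
```

On two well-separated clusters of four 2-D points each, `cv_accuracy(x, y, 1)` returns 1.0 and `choose_k(x, y, [1, 3])` returns 1, since both candidates score perfectly and ties go to the first. Every sample is predicted exactly once, by a model that never saw it, which is what makes the accuracy an honest estimate.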

Cuomo et al. 2020

This example is based on single-cell RNA sequencing of differentiating iPS cells, using data from the Cuomo et al. 2020 paper. To reduce run time, we focus on day 3 and only three donors. Using 500 randomly selected genes, the aim is to accurately classify cells into three pre-defined cell types. The data for these genes, this timepoint and these donors can be found here.

First, visualise the data to understand its structure, then preprocess it for classification. Next, apply some classical classifiers (decision tree, random forest, SVM), tune their respective hyperparameters and evaluate them. You could also identify the genes that best discriminate between cell types and retrain the classifiers on this smaller gene set. Finally, compare the classifiers and choose the best one. A skeleton Rmd broken down into these steps can be found here.
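One possible way to find the genes that best discriminate between cell types is to score each gene with a one-way ANOVA F-statistic across the cell-type labels and keep the highest-scoring genes. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the F-statistic is a swapped-in choice (random-forest variable importance is a common alternative) and not necessarily what the example analysis uses, and each gene needs at least two cell types and more cells than cell types.

```python
import math
from collections import defaultdict
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one gene's values across cell-type labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, g = len(values), len(groups)
    grand_mean = fmean(values)
    means = {label: fmean(vs) for label, vs in groups.items()}
    # Between-group mean square: spread of the group means around the grand mean
    ms_between = sum(len(vs) * (means[lab] - grand_mean) ** 2 for lab, vs in groups.items()) / (g - 1)
    # Within-group mean square: spread of cells around their own group mean
    ms_within = sum((v - means[lab]) ** 2 for lab, vs in groups.items() for v in vs) / (n - g)
    if ms_within == 0:
        # Perfect separation scores infinitely high; a constant gene scores 0
        return math.inf if ms_between > 0 else 0.0
    return ms_between / ms_within


def top_genes(expr, labels, n_top):
    """Rank genes (gene -> per-cell values) by F-statistic; return the top n_top names."""
    scores = {gene: anova_f(values, labels) for gene, values in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]
```

To avoid optimistic accuracy estimates, the ranking should be computed on the training data only (ideally inside each cross-validation fold), not on the full dataset before splitting.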

An example analysis can be found in this folder.

The truth is rarely pure and never simple. Oscar Wilde, The Importance of Being Earnest