Practical Somatic Variant Analysis in Cancer Genomics

Course Team

Instructor

Dr. Flavio Lombardo (University Hospital Basel / University of Basel)

Contributors to original materials

Geert van Geest

Attribution

This course builds upon materials originally developed for SIB Swiss Institute of Bioinformatics by Flavio Lombardo and Geert van Geest, with additional inspiration from the Precision Medicine Bioinformatics course by the Griffith Lab.

License & copyright

License: CC BY 4.0

Copyright: Flavio Lombardo. Originally developed for SIB Swiss Institute of Bioinformatics.

Overview

Cancer is a disease of the genome. Mutations in genes that regulate cell proliferation and cell death can cause uncontrolled growth, which eventually produces symptoms. As cancer progresses, further mutations accumulate. These do not only affect cell growth: they can also suppress the immune system, increase the chance of metastasis, and promote genome instability, which leads to additional malignant mutations.

Over the last decades, characterizing the mutations in malignant tissue has been instrumental in developing cancer diagnosis, prognosis, and treatment. Cancer is a highly heterogeneous disease. Knowing which mutations a tumor carries gives us a better understanding of its nature and allows us to apply precision medicine approaches such as targeted drug therapy and immunotherapy.

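Cancer variants are somatic: they are acquired during a person's lifetime rather than inherited, and are therefore present in only a subset of the cells in a tissue. Even in a solid tumor sample, only a fraction of the cells carries the driver mutations, because the sample also contains normal cells (e.g., stromal and immune cells) and the tumor itself can consist of several subclones. This makes the analysis of cancer variants more challenging than that of inherited (germline) variants, where we assume that (almost) all cells share the same genome.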
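To make the effect of a mixed cell population concrete, the sketch below computes the expected variant allele frequency (VAF), i.e. the fraction of reads supporting the alternate allele, for a somatic point mutation. It is a simplified model under stated assumptions: every cell has two copies of the locus (no copy-number changes), mutant cells carry the mutation on exactly one copy, and normal cells contribute only reference alleles. The function name `expected_vaf` and the parameters `purity` (fraction of tumor cells in the sample) and `ccf` (cancer cell fraction, the share of tumor cells carrying the mutation) are illustrative and not part of any tool used in this course.

```python
def expected_vaf(purity: float, ccf: float = 1.0) -> float:
    """Expected VAF of a heterozygous somatic SNV in a diploid region.

    purity: fraction of cells in the sample that are tumor cells (0-1).
    ccf:    cancer cell fraction, the fraction of tumor cells that carry
            the mutation (1.0 = clonal, < 1.0 = subclonal).

    Simplifying assumptions: every cell (tumor or normal) has two copies
    of the locus, each mutant cell carries one mutant copy, and normal
    cells carry only the reference allele.
    """
    if not (0.0 <= purity <= 1.0 and 0.0 <= ccf <= 1.0):
        raise ValueError("purity and ccf must be between 0 and 1")
    mutant_alleles = purity * ccf * 1  # one mutant copy per mutant cell
    total_alleles = 2                  # two copies per cell, tumor or normal
    return mutant_alleles / total_alleles


# Compare a pure clonal tumor with a 60%-pure sample, clonal and subclonal.
for purity, ccf in [(1.0, 1.0), (0.6, 1.0), (0.6, 0.5)]:
    vaf = expected_vaf(purity, ccf)
    print(f"purity={purity:.1f} ccf={ccf:.1f} -> expected VAF={vaf:.2f}")
```

Under these assumptions, a clonal mutation in a sample with 60% tumor cells is expected at a VAF of about 0.30, and a subclonal one present in half of those tumor cells at about 0.15, whereas a heterozygous germline variant would sit near 0.50 in every sample. Real tumors with copy-number changes deviate from this simple model.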

In this course, you will learn the concepts behind calling somatic variants from next-generation sequencing (NGS) data and the basics of cancer variant annotation. The practical work is mainly based on the GATK4 Mutect2 pipeline and Ensembl’s Variant Effect Predictor (VEP).

Audience

This course is designed for students and researchers interested in cancer genomics. Participants should have basic familiarity with the command line and some programming experience (preferably in R). A basic understanding of genomics and DNA sequencing is recommended but not required.

Learning outcomes

By the end of the course, participants should be able to:

Perform quality control on sequencing data and interpret QC metrics
Align reads to a reference genome and assess alignment quality
Distinguish between germline and somatic variants from both a biological and a computational perspective
Understand the critical role of matched tumor-normal pairs in cancer genomics
Implement a complete somatic variant calling pipeline using GATK4 Mutect2
Work with VCF files in R for variant analysis
Annotate variants using VEP and interpret the results in a cancer biology context
Analyze Variant Allele Frequency (VAF) distributions
Filter and prioritize potentially pathogenic mutations based on biological impact
Create visualizations of variant characteristics and findings
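Several of these outcomes (matched tumor-normal pairs, germline versus somatic variants, VAF) rest on one basic idea, illustrated here in a deliberately simplified form. Given reference and alternate read counts at a site in the tumor and in the matched normal (the kind of per-sample allele depths stored in the `AD` FORMAT field of a VCF), the sketch computes the VAF in each sample and flags a site as a somatic candidate only if the alternate allele is well supported in the tumor and essentially absent from an adequately covered normal. The names `SiteCounts`, `vaf` and `classify_site`, the thresholds, and the example counts are all hypothetical. Mutect2 does not work this way: it uses a statistical model followed by many additional filters, so treat this only as a toy illustration of the tumor-normal logic.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical allele depths at one site (ref/alt reads per sample)."""
    tumor_ref: int
    tumor_alt: int
    normal_ref: int
    normal_alt: int


def vaf(ref_reads: int, alt_reads: int) -> float:
    """Variant allele frequency: fraction of reads supporting the alt allele."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth > 0 else 0.0


def classify_site(
    site: SiteCounts,
    min_alt_reads: int = 3,
    min_tumor_vaf: float = 0.05,
    max_normal_vaf: float = 0.02,
    min_normal_depth: int = 10,
) -> str:
    """Toy tumor-normal comparison; thresholds are illustrative, not Mutect2 settings."""
    if site.normal_ref + site.normal_alt < min_normal_depth:
        # Too few normal reads to rule out that the variant is inherited.
        return "no call"
    if vaf(site.normal_ref, site.normal_alt) > max_normal_vaf:
        # Alternate allele is also present in the normal: most likely germline.
        return "germline"
    tumor_vaf = vaf(site.tumor_ref, site.tumor_alt)
    if site.tumor_alt >= min_alt_reads and tumor_vaf >= min_tumor_vaf:
        # Supported in the tumor and (nearly) absent from the normal.
        return "somatic candidate"
    return "no call"


# Hypothetical example sites, not real data.
sites = {
    "site_A": SiteCounts(tumor_ref=40, tumor_alt=38, normal_ref=35, normal_alt=33),
    "site_B": SiteCounts(tumor_ref=70, tumor_alt=20, normal_ref=60, normal_alt=0),
    "site_C": SiteCounts(tumor_ref=98, tumor_alt=1, normal_ref=50, normal_alt=0),
}
for name, counts in sites.items():
    tumor_vaf = vaf(counts.tumor_ref, counts.tumor_alt)
    print(name, classify_site(counts), f"tumor VAF={tumor_vaf:.2f}")
```

In this example, site_A is present at roughly 50% VAF in both samples and is classified as germline, site_B is supported only in the tumor and becomes a somatic candidate, and site_C has a single alternate read, which is too little evidence to distinguish a real variant from a sequencing error. Without the matched normal, site_A and site_B would be indistinguishable by this logic.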