Practical Somatic Variant Analysis in Cancer Genomics
Course Team
Instructor
- Dr. Flavio Lombardo, University Hospital Basel / University of Basel
Contributors to original materials
- Geert van Geest
Attribution
This course builds upon materials originally developed for SIB Swiss Institute of Bioinformatics by Flavio Lombardo and Geert van Geest, with additional inspiration from the Precision Medicine Bioinformatics course by the Griffith Lab.
License & copyright
License: CC BY 4.0
Copyright: Flavio Lombardo. Originally developed for SIB Swiss Institute of Bioinformatics.
Overview
Cancer is a disease of the genome. Mutations in genes that regulate cell proliferation and cell death can lead to uncontrolled growth, which eventually causes symptoms. As cancer progresses, further mutations accumulate that not only affect cell growth but can also suppress the immune system, increase the chance of metastasis, and promote genome instability, which in turn leads to additional malignant mutations.
Characterizing the mutations in malignant tissue has been instrumental in improving the diagnosis, prognosis and treatment of cancer over recent decades. Cancer is a highly heterogeneous disease. Knowing which mutations a tumor carries gives us a better understanding of its nature and lets us apply precision medicine approaches, such as targeted drug therapy and immunotherapy.
Cancer variants are somatic, which means that they are present in only a fraction of the cells in a tissue. Even in a sample from a solid tumor, only some of the cells carry the driver mutations. This makes the analysis of cancer variants more challenging than the analysis of inherited (germline) variants, where we assume that (almost) all cells have the same genome.
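To make the consequence of "only a fraction of the cells" concrete, here is a minimal Python sketch of how tumor purity and subclonality lower the expected variant allele frequency (VAF). It is not part of the course materials. The function names and parameters are hypothetical, and the defaults assume a heterozygous variant in a copy-neutral diploid region, which real tumors often violate.

```python
def expected_vaf(purity, cancer_cell_fraction=1.0, mutated_copies=1,
                 tumor_copy_number=2, normal_copy_number=2):
    """Expected VAF of a somatic variant in a bulk tumor sample.

    purity               -- fraction of cells in the sample that are tumor cells
    cancer_cell_fraction -- fraction of tumor cells carrying the variant
    mutated_copies       -- mutated copies of the locus per variant-carrying cell
    (illustrative model; the defaults assume a copy-neutral diploid locus)
    """
    # Alleles carrying the variant, averaged over all cells in the sample
    mutant_alleles = purity * cancer_cell_fraction * mutated_copies
    # All alleles at the locus, from tumor cells and admixed normal cells
    total_alleles = (purity * tumor_copy_number
                     + (1.0 - purity) * normal_copy_number)
    return mutant_alleles / total_alleles


def observed_vaf(ref_reads, alt_reads):
    """VAF estimated from read counts supporting reference and alternate alleles."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


if __name__ == "__main__":
    # Germline-like case: every cell carries one mutated copy
    print(f"pure sample, clonal:      {expected_vaf(1.0):.2f}")
    # 60% tumor purity, variant present in all tumor cells
    print(f"60% purity, clonal:       {expected_vaf(0.6):.2f}")
    # 60% tumor purity, variant present in half of the tumor cells
    print(f"60% purity, subclonal:    {expected_vaf(0.6, 0.5):.2f}")
    # Estimate from hypothetical read counts: 85 reference, 15 alternate
    print(f"observed from 85/15 reads: {observed_vaf(85, 15):.2f}")
```

Under these assumptions, a heterozygous germline variant sits near a VAF of 0.5. A clonal somatic variant in a sample with 60% purity is expected near 0.30, and one present in only half of the tumor cells near 0.15. Low fractions like these are harder to distinguish from sequencing errors, which is one reason somatic calling needs dedicated tools.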
In this course, you will learn the concepts behind calling somatic variants from next-generation sequencing data and the basics of cancer variant annotation. The practical work is based mainly on the GATK4 (Mutect2) pipeline and Ensembl’s Variant Effect Predictor (VEP).
Audience
This course is designed for students and researchers interested in cancer genomics. Participants should have basic familiarity with the command line and some programming experience (preferably in R). A basic understanding of genomics and DNA sequencing is recommended but not required.
Learning outcomes
By the end of the course, participants should be able to:
- Perform quality control on sequencing data and interpret QC metrics
- Align reads to a reference genome and assess alignment quality
- Distinguish between germline and somatic variants from a biological and computational perspective
- Understand the critical role of matched tumor-normal pairs in cancer genomics
- Implement a complete somatic variant calling pipeline using GATK4 Mutect2
- Work with VCF files in R for variant analysis
- Annotate variants using VEP and interpret the results in a cancer biology context
- Analyze Variant Allele Frequency (VAF) distributions
- Filter and prioritize potentially pathogenic mutations based on biological impact
- Create visualizations of variant characteristics and findings