2017 TOPMed DCC Analysis Workshop

August 7 – 9, 2017, Seattle, WA

Session 0. Introduction and Logistics

Session 1. Introduction to TOPMed data

This session reviews what data is available, how to access it, and some of the QC steps that the DCC implements for TOPMed data:

  • Introduction to TOPMed and Data Sharing [.pptx]
  • QC steps for Freeze 4 data, including variant, sample, and pedigree-based checks [.pdf]
  • Introduction to the Genomic Data Storage (GDS) format and tools for creating and using files in this format [.pdf]
  • Worked example
  • Exercises

Session 2a. Population structure and relatedness

This session describes how TOPMed data can be used to illustrate and estimate measures of relatedness, either at the population level or between specific individuals.

  • Review of what population structure is, and how related quantities can be estimated using TOPMed data [.pdf]
  • Estimating relatedness for participants in TOPMed [.pdf]
  • R packages used in both tasks [.pdf]
  • Worked examples
  • Exercises

Session 2b. Phenotypes

This session describes how to access, examine, and harmonize phenotypes from multiple TOPMed studies.

  • Guidelines for harmonizing phenotypes within TOPMed [.pdf]
  • Accessing and using unharmonized TOPMed phenotypes via dbGaP [.pdf]
  • Worked examples

Session 3. Association tests

This session reviews why association tests are a useful way to analyze TOPMed data, describes how single-variant association tests work – and why this can be challenging within TOPMed – before briefly summarizing some multiple-variant tests.

  • Slides describing the methods and their motivation, strengths, and weaknesses [.pdf]
  • R functions from the DCC pipeline that implement widely used association tests [.pdf]
  • Worked examples
  • Exercises for single-variant tests
  • Exercise for multiple-variant tests

Session 4. Variant annotation

This session introduces variant annotation for TOPMed, including how to define and filter aggregation units using variant annotations.

Session 5a. DCC pipeline

This session introduces the DCC and Analysis Commons pipeline, emphasizing how cluster and cloud computing can be used to implement high-throughput TOPMed analyses efficiently.

  • Introduction [.pdf]
  • Multi-threaded versions of some analyses seen in earlier sessions [.pdf]
  • Running the same analyses on the cloud via Amazon Web Services [.pdf]
  • Wrap-up, discussing approximate costs and where to learn more [.pdf]
  • Worked examples

Session 5b. Analysis Commons

This session describes the Analysis Commons, a DNAnexus-based system for cloud computing with TOPMed and similar data.


