About this course


In this course, we begin with approaches to visualization of genome-scale data, and provide tools to build interactive graphical interfaces to speed discovery and interpretation. 

We study out-of-memory approaches to the analysis of very large data resources, using relational databases or HDF5 as "back ends" with familiar R interfaces. Multiomic data integration is illustrated using a curated version of The Cancer Genome Atlas. 

Finally, we explore cloud-resident resources developed for the Encyclopedia of DNA Elements (the ENCODE project). These address transcription factor binding, ATAC-seq, and RNA-seq with CRISPR interference.

These courses make up two Professional Certificates and are self-paced:




Learning Formats: Videos
Institutions: Harvard University

About this course


We will explain how to perform the standard processing and normalization steps, starting with raw data, to get to the point where one can investigate relevant biological questions. 

Throughout the case studies, we will make use of exploratory plots to get a general overview of the shape of the data and the result of the experiment. 

We will learn the basic steps in analyzing DNA methylation data, including reading the raw data, normalization, and finding regions of differential methylation across multiple samples. The course will end with a brief description of the basic steps for analyzing ChIP-seq datasets, from read alignment to peak calling to assessing differential binding patterns across multiple samples.


Learning Formats: Videos
Institutions: Harvard University

About this course


We begin with an introduction to the relevant biology, explaining what we measure and why. Then we focus on the two main measurement technologies: next-generation sequencing and microarrays.

We then describe how raw data and experimental information are imported into R and how we use Bioconductor classes to organize these data, whether generated locally or harvested from public repositories or institutional archives.

Genomic features are generally identified using intervals in genomic coordinates, and highly efficient algorithms for computing with genomic intervals will be examined in detail. Statistical methods for testing gene-centric or pathway-centric hypotheses with genome-scale data are found in packages such as limma; some of these techniques will be illustrated in lectures and labs.
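
As a rough illustration of what computing with genomic intervals involves, the sketch below finds overlaps between two sets of intervals by sorting one set and using binary search to limit the candidates. It is not taken from the course materials, which work in R with Bioconductor. It is written in Python, and the function name `find_overlaps` here, the sample coordinates, and the assumption of closed intervals on a single chromosome are all illustrative choices.

```python
from bisect import bisect_right


def find_overlaps(queries, subjects):
    """Return sorted (query_index, subject_index) pairs of overlapping intervals.

    Intervals are (start, end) tuples, assumed closed and on one chromosome.
    """
    # Order subject indices by start coordinate so we can binary-search them.
    order = sorted(range(len(subjects)), key=lambda i: subjects[i][0])
    starts = [subjects[i][0] for i in order]

    hits = []
    for qi, (q_start, q_end) in enumerate(queries):
        # Subjects that start after the query ends cannot overlap it.
        n_candidates = bisect_right(starts, q_end)
        for si in order[:n_candidates]:
            # A remaining candidate overlaps if it ends at or after the query start.
            if subjects[si][1] >= q_start:
                hits.append((qi, si))
    return sorted(hits)


# Hypothetical coordinates, for illustration only.
genes = [(100, 200), (150, 300), (500, 600)]
peaks = [(180, 190), (250, 520), (700, 800)]

hits = find_overlaps(peaks, genes)
print(hits)
```

The binary search only discards subjects that start too late; the remaining candidates are still scanned linearly, so purpose-built interval data structures can do considerably better on large inputs.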


Learning Formats: Videos
Institutions: Harvard University