Welcome class of 2015


Basic information

  • Lecturer: Dr. Yaniv Erlich [yaniv at cs döt columbia döt edu]
  • TA: Akshaan Kakar [ak3808 at columbia döt edu]
  • Hackathon master: Dr. Sophie Zaaijer [szaaijer82 at gmail döt com]
  • Time: Friday 12:10-2:00
  • Location: 608 Schermerhorn
  • Office hours: Akshaan - Tuesday & Thursday (3-4). Yaniv - Friday 2:15-4:00.
  • Credit: 3 points

Course Overview

DNA is one of the most ubiquitous forms of information in nature. In the last 20 years, DNA sequencing technologies have evolved at a breathtaking pace, much faster than Moore’s law, revolutionizing multiple domains ranging from personalized medicine to forensics. This course will cover the interface between computer science and the DNA sequencing revolution. Specifically, it will focus on the newest phase of that revolution: the advent of low-cost mobile devices that can form the basis of ‘the Internet of living things’. Students will gain hands-on experience with these devices in hackathons, including data gathering and analysis. The first hackathon is “Who is this person?”, in which students will try to figure out the identity of a person from his/her DNA sequencing data. The second hackathon is “What did I eat?”, in which students will have to identify an organism by sequencing its DNA on a mobile sequencer.


The mobile DNA sequencers for this class are generously provided by Oxford Nanopore.

Download slides

2015 Syllabus

Week | Date | Topic | Reading assignment | Supplemental material
1 | 11-Sep | Intro to DNA and DNA sequencing technologies | "Big Data: Astronomical or Genomical?" [Stephens, PLoS Biology, 2015] | "Sequencing technologies - the next generation" [Metzker, Nature Reviews Genetics, 2010]
2 | 18-Sep | Applications of DNA sequencing: Human genetics | "Exome sequencing identifies the cause of a mendelian disorder" [Ng et al., Nature Genetics, 2010] | "Diagnostic Applications of High-Throughput DNA Sequencing" [Boyd, Annual Review of Pathology: Mechanisms of Disease, 2013]
3 | 25-Sep | Applications of DNA sequencing technologies: Metagenomics | (1) "Geospatial Resolution of Human and Bacterial Diversity with City-Scale Metagenomics" [Afshinnekoo, Cell Systems, 2015] (2) "Host lifestyle affects human microbiota on daily timescales" [David et al., Genome Biology, 2014] | "A Primer on Metagenomics" [Wooley et al., PLoS Comp Bio, 2008]
4 | 2-Oct | Applications of DNA sequencing: Forensics | (1) "Genomic research and human subject privacy" [Lin, Science, 2004] (2) "Identifying Personal Genomes by Surname Inference" [Gymrek et al., Science, 2013] | "Improving human forensics through advances in genetics, genomics and molecular biology" [Kayser et al., Nature Reviews Genetics, 2014]
5 | 9-Oct | Mobile health | "Enabling Large-scale Human Activity Inference on Smartphones using Community Similarity Networks" [Lane et al., Proceedings of Ubicomp, 2011] | -
6 | 16-Oct | Mobile sequencing | (1) "A vision for Ubiquitous Sequencing" [Erlich, bioRxiv, 2015] (2) "Improved data analysis for the MinION nanopore sequencer" [Jain et al., Nature Methods, 2015] | "The potential and challenges of nanopore sequencing" [Branton et al., Nature Biotechnology, 2008]
7 | 23-Oct | Hackathon I: Who is this person? | (1) "poRe: an R package for the visualization and analysis of nanopore sequencing data" [Watson et al., Bioinformatics, 2014] (2) "Poretools: a toolkit for analyzing nanopore sequence data" [Loman et al., Bioinformatics, 2014] | -
8 | 30-Oct | Analysis pipelines (1) | "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome" [Langmead et al., Genome Biology, 2009] | "A survey of sequence alignment algorithms for next-generation sequencing" [Li et al., Briefings in Bioinformatics, 2010]
9 | 6-Nov | Presentations for hackathon I | - | -
10 | 13-Nov | Hackathon II: What did I eat? | - | -
11 | 20-Nov | Analysis pipelines (2) | "Near-optimal RNA-Seq quantification" [Bray et al., arXiv, 2015] | -
12 | 27-Nov | [No class] | - | -
13 | 4-Dec | Presentations for hackathon II | - | -
14 | 11-Dec | DNA as storage devices or computers | (1) "Toward practical high-capacity low-maintenance storage of digital information in synthesised DNA" [Goldman et al., Nature, 2013] (2) "Molecular Computation of Solutions to Combinatorial Problems" [Adleman, Science, 1994] | -
15 | 18-Dec | - | - | -

Assignments and grading

Reading assignments

You are expected to read the assigned papers and understand their main concepts and terms before each class.

Supplemental material

These are mainly review articles that discuss the material covered in the lectures (and much more). They are intended for reference and for curious students.


A few class sessions include team presentations. Each presentation is 10 minutes long and will be delivered by one member of the team. To encourage fairness and participation, the presenter will be randomly selected at the beginning of the presentation.

Coding/Written assignments

Teams are expected to code their own assignments. It is OK to brainstorm high-level ideas with other teams. It is OK to consult online forums. However, the submitted code should be written entirely by members of the team. No exceptions. To maximize impact, all code should be submitted under the GNU GPL v2 license.


  • Participation in class discussions: 25%
  • Hackathon I: 25% (10% presentation + 15% code submission)
  • Hackathon II: 25% (10% presentation + 15% code submission)
  • Final project: 25%

Textbooks

DNA sequencing and genomics are fast-moving fields. The course therefore does not have a textbook and will rely on research manuscripts and reviews. However, I list below a few technical and non-technical books for interested students who would like to get a broader perspective on these fields:

  • A Short Guide to the Human Genome | Stewart Scherer (an excellent technical overview of different elements of the human genome, such as the length of each chromosome and which gene is the longest)
  • Introduction to Quantitative Genetics | Falconer (a technical overview that presents the statistical foundations of analyzing the genetic basis of traits)
  • The $1,000 Genome | Kevin Davies (an excellent non-technical book on the history of DNA sequencing)
  • The Creative Destruction of Medicine | Eric Topol (Chapter 5 provides a non-technical overview of sequencing and genomics and their impact on precision medicine)