Hi! I am a software engineer and a bioinformatician. I work at the Chan Zuckerberg Initiative, where I currently spend most of my time building infrastructure for the IDseq project. My job is to help the IDseq team ensure that our product analyzes pathogen DNA reliably, correctly, and quickly at all times, and to collaborate with the scientific and engineering community at large, for example by sharing our production analysis methods as code in the idseq-workflows repo. Recently, IDseq has been used by Biohub scientists and others to analyze SARS-CoV-2 (COVID-19) viral genomes and associated metagenomes.
Prior to CZI, I was involved with a number of biotech startups, which gave me the opportunity to become an expert in the design and implementation of genome analysis systems and cloud platforms. I am particularly known for my work on state-of-the-art systems for DNA sequence analysis, machine learning applications in genomics, scalable distributed systems for genomics, API design, Linux systems, information security, and development operations. Specific applications that I was involved with include single-molecule DNA sequencing R&D, microbial genome and metagenome annotation and interpretation, single-cell sequencing, variant calling/interpretation, and phylogenomics. Since 2011, I have been using Amazon Web Services to build these solutions, and I am now an expert in many AWS technologies. I also admire the AWS leadership principle of customer obsession and use it to guide decisions in the teams and products that I am involved with.
Before my industry career, I graduated from UC Berkeley with a triple major in computer science, mathematics, and statistics. I then went to grad school at Georgia Tech and graduated with a PhD in bioinformatics. My undergraduate research work was at Lawrence Berkeley Lab with Inna Dubchak and Mike Brudno. Together with Mike’s advisor Serafim Batzoglou, we created the first global multiple alignment of several large eukaryotic genomes. My graduate advisor was Joshua Weitz, and my thesis title was "Algorithm development for next generation sequencing-based metagenome analysis". You can read more exciting details in my CV. But a lot of what I learned during my time in grad school, I actually learned at Black Knight Martial Arts.
A few notable state-of-the-art infrastructure projects that I implemented include an LXC-based cloud PaaS virtualization service that continues to power millions of jobs and Docker apps on the DNAnexus Titan platform; an independent implementation of SAML, OAuth2/OIDC, and their underlying technologies for single sign-on applications; an AWS IAM-based symbolic RBAC PDP service; and multiple API designs, CLI tools, and developer productivity tools for a variety of products. On the science side, I have mostly been involved in the integration and tuning of existing genomics and machine learning software written by people smarter than me, but you can see some of my concrete contributions in my Google Scholar profile.
The technologies I use in my daily work include Terraform, AWS Fargate, S3, EFS, Lambda, Step Functions, IAM, RDS/Aurora, Python, Rust, OIDC, LXC, Docker, cloud-init, as well as many genomics tools including Kraken2, miniwdl, bedtools2, and minimap2.
I live in San Francisco. In my spare time I do a lot of the usual things, such as biking, running, photography, and selecting the proper power tools for home improvement.