Hi! I am a software engineer and a bioinformatician. I work at Color, where I lead the genomics software engineering team and help build the future of genetic testing, population genomics, and healthcare delivery. Color has rapidly scaled to address the healthcare needs related to the COVID-19 pandemic, and I am lucky to be part of this effort.
Prior to Color, I worked at the Chan Zuckerberg Initiative, where I spent most of my time building infrastructure for the CZID project (also known as IDseq); a lot of this work is available as open source in the SWIPE and czid-workflows repositories.
Prior to CZI, I was involved with several biotechnology startups, which gave me the opportunity to become an expert in the design and implementation of genome analysis systems and cloud platforms. I am particularly known for my work on state-of-the-art DNA sequence analysis systems, machine learning applications in genomics, scalable distributed systems for genomics, API design, Linux systems, information security, and development operations (DevOps). Specific applications I have worked on include single-molecule DNA sequencing R&D, microbial genome and metagenome annotation and interpretation, single-cell sequencing, variant calling and interpretation, and phylogenomics. Since 2011, I have been using Amazon Web Services to build these solutions, and I am now an expert in many AWS technologies as well as Terraform, a widely used infrastructure-as-code (IaC) tool that I use to configure and manage assets on AWS. Some of the tools my teams use every day to manage AWS assets are available as open source in the Aegea project, as well as in other projects listed on my GitHub profile.
Before my industry career, I graduated from UC Berkeley with a triple major in computer science, mathematics, and statistics. I then went to grad school at Georgia Tech, where I earned a PhD in bioinformatics. I did my undergraduate research at Lawrence Berkeley Lab with Inna Dubchak and Mike Brudno. Together with Mike’s advisor, Serafim Batzoglou, we created the first global multiple alignment of several large eukaryotic genomes. My graduate advisor was Joshua Weitz, and my thesis was titled "Algorithm development for next generation sequencing-based metagenome analysis." You can read more exciting details in my CV. That said, a lot of what I learned during grad school I actually learned at Black Knight Martial Arts.
A few notable state-of-the-art infrastructure projects that I implemented include an LXC-based cloud PaaS virtualization service that continues to power millions of jobs and Docker apps on the DNAnexus Titan platform; an independent implementation of SAML, OAuth2/OIDC, and their underlying technologies for single sign-on applications; an AWS IAM-based symbolic role-based access control (RBAC) policy decision point (PDP) service; and multiple API designs, CLI tools, and developer productivity tools for a variety of products. On the science side, I have mostly been involved in integrating and tuning existing genomics and machine learning software written by people smarter than me, but you can see some of my concrete contributions on my GitHub and Google Scholar profiles.
The technologies I use in my daily work include Terraform, AWS Fargate, S3, EFS, Lambda, Step Functions, IAM, RDS/Aurora, Python, Rust, OIDC, LXC, Docker, cloud-init, as well as many genomics tools including Kraken2, miniwdl, bedtools2, and minimap2.
I live in San Francisco. In my spare time I enjoy the usual suspects, such as biking, running, photography, and selecting the proper power tools for home improvement.