Resources
Statistical/ML methods for genetics/genomics
Gene regulatory network inference
- LLCB : Our algorithm for inferring gene-regulatory networks from arrayed CRISPR perturbations with a bulk RNA-seq read-out.
GWAS + PRS
- gwasplot : This is our in-house R package for working with large-scale GWAS derived from WGS with many rare variants. It is fast because it uses DuckDB underneath.
- UKB_regenie_workflow : This is our workflow for running REGENIE on the UK Biobank DNAnexus RAP. It is better than the ‘applet’ built into the platform.
- PRSFNN : Our neural empirical Bayesian PRS method.
- PRSFNN SNP Annotations : Our pipeline for generating SNP annotations for PRSFNN.
Somatic variation analysis
- GEM : The genomic and epigenomic mutation rate estimator.
- pileup_region : This was used to call U2AF1 mutations in TOPMed.
- PACER : This is the ‘official’ implementation of the variant calling procedure to quantify passenger mutations, which are the key ingredient of the Passenger-Approximated Clonal Expansion Rate (PACER).
- somatic.emory.edu : Our interactive portal for viewing associations between CH point mutations and disease in UKB at a single-variant level.
- CH calling pipeline : Our bcftools approach to calling CH point mutations.
Our methods development philosophy
We develop primarily in R, Python, Julia, and Rust, with newer work trending toward Julia and Rust. We aspire to write code that facilitates reproducible science. In practice, this means unit tests, documentation, Docker/Singularity images, and continuous integration. We are increasingly adopting these practices in all of our code bases that are under active development.