Research Scientist at UIUC 2015--2017

  • Primarily responsible for the planning, design, organization, execution, and analysis of multiple complex epidemiological studies involving epigenomics, transcriptomics, and genomics of diseases of pregnancy and post-traumatic stress disorder.
  • Published results in scientific publications and presented results orally at major scientific conferences.
  • Wrote and completed grants, including budgeting, scientific direction, project management, and reporting.
  • Mentored graduate students and collaborated with internal and external scientists.
  • Reviewed the literature, completed training, and applied new techniques to stay abreast of current scientific literature, principles of scientific research, and modern statistical methodology.
  • Wrote software and designed relational databases using R, perl, C, SQL, make, and very large computational systems.

Postdoctoral Researcher at USC 2013--2015

  • Primarily responsible for the design, execution, and analysis of an epidemiological study to identify genomic variants associated with systemic lupus erythematosus using targeted deep sequencing.
  • Designed, budgeted, configured, maintained, and supported a secure Linux analysis cluster (MPI/Torque) with a shared filesystem (NFS over Gluster) for statistical analyses.
  • Wrote multiple pieces of software to reproducibly analyze and archive large datasets resulting from genomic sequencing.
  • Coordinated with clinicians, molecular biologists, and biologists to produce analyses and major reports.

Postdoctoral Researcher at UCR 2010--2012

  • Primarily responsible for the execution and analysis of an epidemiological study to identify genomic variants associated with systemic lupus erythematosus using prior information and array-based approaches in a trio and cross-sectional study of individuals from Los Angeles and the greater United States.
  • Wrote and maintained multiple software components to reproducibly perform the analyses.

Debian Developer 2004--Present

  • Maintained, managed configurations for, and resolved issues in multiple packages written in R, perl, python, scheme, C++, and C.
  • Resolved technical conflicts, developed technical standards, and provided leadership as the elected chair of the Technical Committee.
  • Developer of Debbugs, a perl and SQL-based issue-tracker with ≥ 100 million entries with web, REST, and SOAP interfaces.

Independent Systems Administrator 2004--Present

  • Researched, recommended, budgeted, designed, deployed, configured, operated, and monitored highly-available high-performance enterprise hardware and software for web applications, authentication, backup, email, and databases.
  • Provided vendor-level support for complex systems integration issues on Debian GNU/Linux systems.
  • Full life-cycle support of medium and small business networking infrastructure, including VPN, network security, wireless networks, routing, DNS, DHCP, and authentication.


Education

  • Doctor of Philosophy (PhD) in Cell, Molecular and Developmental Biology at UC Riverside
  • Bachelor of Science (BS) in Biology at UC Riverside


Data Science

  • Reproducible, scalable analyses using R, perl, and python with workflows on cloud- and cluster-based systems on terabyte-scale datasets
  • Experimental design and correction to overcome multiple testing, confounders, and batch effects using Bayesian and frequentist methods
  • Design, development, and deployment of algorithms and data-driven products, including APIs, reports, and interactive web applications
  • Statistical modeling (regression, inference, prediction/forecasting, time series, and machine learning) in very large (> 1 TB) datasets
  • Data mining, cleaning, processing, and quality assurance of data sources and products using tidy data formalisms
  • Visualization using R, ggplot, Shiny, and custom-written routines

Software Development

  • Languages: perl, R, C, C++, python, groovy, sh, make
  • Collaborative Development: git, travis, continuous integration, automated testing
  • Web, Mobile: Shiny, jQuery, JavaScript
  • Databases: PostgreSQL (PL/pgSQL), SQLite, MySQL, NoSQL
  • Office Software: Gnumeric, LibreOffice, LaTeX, Word, Excel, PowerPoint

Genomics and Epigenomics

  • NGS and array-based Genomics and Epigenomics of complex human diseases using RNA-seq, targeted DNA sequencing, RRBS, Illumina bead arrays, and Affymetrix microarrays from sample collection to publication.
  • Reproducible, scalable bioinformatics analysis using make, Nextflow, and CWL-based workflows on cloud- and cluster-based systems on terabyte-scale datasets
  • Alignment, annotation, and variant calling using existing and custom software, including GATK, bwa, STAR, and kallisto.
  • Experimental design and correction to overcome multiple testing, confounders, and batch effects using Bayesian and frequentist methods
  • Using evolutionary genomics to identify causal human variants


  • Statistical modeling (regression, inference, prediction, and learning) in very large (> 1 TB) datasets
  • Addressing confounders and batch effects
  • Reproducible research

Big Data

  • Parallel and Cloud Computing (slurm, torque, AWS, OpenStack, Azure)
  • Inter-process communication: MPI, OpenMP
  • File storage: Gluster, CEFS, GPFS, Lustre
  • Linux system administration

Genomics and Epigenomics

  • Linkage and association-based mapping of complex phenotypes using next-generation sequencing and arrays
  • Alignment, annotation, and variant calling using existing and custom software

Mentoring and Leadership

  • Mentored graduate students and Outreachy and Google Summer of Code interns
  • Former chair of Debian's Technical Committee


  • Strong written communication skills as evidenced by publication record
  • Strong verbal and presentation skills as evidenced by presentation and teaching record

Consortia Involvement

  • H3ABioNet: Generating workflows and cloud resources for H3Africa
  • Psychiatric Genomics Consortium: Identification of epigenetic variants that are correlated with PTSD.
  • SLEGEN: Systemic lupus erythematosus genetics consortium.

Authored Software

  • Debbugs: Bug tracking software for the Debian GNU/Linux distribution. [https://bugs.debian.org]
  • CairoHacks: Bookmarks and Raster images for large PDF plots in R.
  • Function2Gene: Gene selection tool based on literature mining which enables Bayesian approaches to significance testing.
  • Helical Wheel Projections: Web-based tool to draw helical wheel protein projections. [http://rzlab.ucr.edu/scripts/wheel]

Publications and Presentations

  • 24 peer-reviewed publications cited over 1800 times: https://dla2.us/pubs
  • H index of 11
  • Numerous invited talks on EWAS of PTSD, genetics of SLE, and Open Source: https://dla2.us/pres

Funding and Awards


  • 2017 R Consortium: Adding Linux Binary Builders to R-Hub Role: Co-PI
  • 2015 Blue Waters Allocation Grant: Making ancestral trees using Bayesian inference to identify disease-causing genetic variants Role: Principal Investigator
  • Tracking placenta and uterine function using urinary extracellular vesicles (R21 RFA-HD-16-037) Role: Key Personnel
  • NIAMS R01-AR045650-04 Genetics of Childhood Onset SLE to Chaim O. Jacob. Role: Key Personnel

Scholarships and Fellowships

  • 2001--2003: University of California, Riverside Doctoral Fellowship
  • 1997--2001: Regents of the University of California Scholarship.

Academic Information

You can also read my Curriculum Vitæ (pdf), Research Statement (pdf), and Teaching Statement (pdf).

For my contact information or additional references, please e-mail don@donarmstrong.com