My PhD thesis was defended on April 27, 2017 at the Icahn School of Medicine at Mount Sinai, and deposited on July 5, 2017. The title of the thesis is:

“Multiscale analysis of infectious diseases: integrating omics and clinical informatics data into patient care.”

My thesis advisor was Andrew Kasarskis, PhD.

My thesis committee was as follows:

Full text

Download the complete PDF.

Abstract


New sources of data, such as next-generation sequencing (NGS) of pathogen genomes, electronic medical records (EMR), and omics assays like RNA-seq and mass cytometry, are poised to transform clinical infectious diseases. One of the greatest challenges in applying these “big data” to clinical practice is the development of bioinformatics techniques and software that make downstream analyses routine, rigorous, and actionable. Herein, I develop software and integrative statistical models for several combinations of these data to address urgent global threats in infectious diseases, including the spread of healthcare-associated infections (HAIs), skyrocketing rates of antimicrobial resistance, and gaps in our understanding of human immune responses to poorly characterized mosquito-borne viruses.

I first demonstrate that de novo assembly of long reads can finish the genomes of hospital bacterial isolates with resolution sufficient for discovering a resistance-conferring single-nucleotide variant that emerges during failed antimicrobial therapy (in our case, a quinolone resistance variant in a strain of Stenotrophomonas maltophilia). I then describe two software packages, a new genome browser and a suite of modular bioinformatics pipelines, that facilitate the routine use of NGS by the Pathogen Surveillance Program at The Mount Sinai Hospital for reconstructing HAI transmission networks between patients. These tools additionally provide actionable visualizations of genomic data tailored to infection control physicians. The use of long-read data for hospital surveillance is novel and offers unique opportunities to capture recombination and horizontal transfer events occurring throughout the rapid evolution of virulence and resistance in HAI strains.

To convert these data into management strategies, I use machine learning algorithms on seven years of EMR data at Mount Sinai to precisely estimate the marginal cost per patient of Clostridium difficile infection, helping to anchor a cost-benefit analysis for selecting interventions and surveillance investments. Finally, in a more futuristic study, I integrate RNA-seq, mass cytometry, and multiplexed immunoassay data to comprehensively profile the human immune response to chikungunya virus, a re-emerging arthritogenic arbovirus, which reveals a significant role for monocytes and novel biomarkers for clinical outcomes like symptom severity and immunogenicity. Because of the many applications for multiscale analysis I establish in this dissertation, I conclude that it is indeed transforming infectious diseases.


Four chapters of this thesis were published, after reformatting and revisions, as individual peer-reviewed articles. Citations for those publications are as follows:

Chapter 1: Pak TR, Kasarskis A. “How next-generation sequencing and multiscale data analysis will transform infectious disease management.” Clin Infect Dis. 2015 Aug 6. pii: civ670. doi:10.1093/cid/civ670. [PDF]

Chapter 2: Pak TR, Altman DR, Attie O, Sebra R, Hamula CL, Lewis M, Deikus G, Newman LC, Fang G, Hand J, Patel G, Wallach F, Schadt EE, Huprikar S, van Bakel H, Kasarskis A, Bashir A. “Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia.” Antimicrob Agents Chemother. 2015 Aug 31. pii: AAC.01723-15. doi:10.1128/AAC.01723-15. [PDF]

Chapter 5: Pak TR, Chacko KI, O’Donnell T, Huprikar S, van Bakel H, Kasarskis A, Scott ER. “Estimating local costs associated with Clostridium difficile infection using machine learning and electronic medical records.” Infect Control Hosp Epidemiol. 2017 Dec;38(12):1478-1486. doi:10.1017/ice.2017.214. [PDF]

Chapter 6: Michlmayr D,* Pak TR,* Rahman A, Amir ED, Kim E-Y, Kim-Schulze S, Suprun M, Stewart MG, Thomas G, Balmaseda A, Wang L, Zhu J, Suarez-Farinas M, Wolinsky SM, Kasarskis A, Harris E. “Comprehensive innate immune profiling of chikungunya virus infection in pediatric cases.” Mol Syst Biol. 2018 Aug;14(8):e7862. doi:10.15252/msb.20177862. [PDF]

* denotes equal authorship.

Source code

The full source code for typesetting this thesis is available on GitHub.

You might find it useful if you are trying to format your own thesis in LaTeX, although my approach is somewhat specific to the ISMMS formatting guidelines. I hereby disclaim any responsibility for the amount of procrastination said code enables in the course of writing your own dissertation.