PhD Research

Concentration in population medicine and epidemiology; minors in applied statistics and genomics

HMM for infectious disease progression

Infectious disease management relies on accurate characterization of disease progression so that transmission can be prevented. Slowly progressing infectious diseases can be difficult to characterize because of the latency period between the time an individual is infected and the time they show clinical signs of disease. The introduction of Mycobacterium avium ssp. paratuberculosis (MAP), the cause of Johne’s disease, onto a dairy farm can go undetected by farmers for years before any animal shows clinical signs. During this period, infected animals may shed thousands of colony-forming units of MAP. Parameterizing trajectories through disease states, from infection to clinical disease, can help farmers develop control programs that target animals according to their individual disease state, potentially reducing both transmission and production losses due to disease. We suspect that there are two distinct progression pathways: one in which animals progress to a high-shedding disease state, and another in which animals maintain a low level of shedding without developing clinical disease. We fit continuous-time hidden Markov models to multi-year longitudinal fecal sampling data from three US dairy farms and estimated model parameters using a modified Baum-Welch expectation-maximization algorithm. Using posterior decoding, we observed two distinct shedding patterns: cows with observations associated with a high-shedding disease state, and cows without. This model framework can be employed prospectively to identify cows that are likely to progress to clinical disease, and it may be applied to characterize the progression of other slowly progressing infectious diseases.
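A minimal sketch of the posterior-decoding step, assuming a hypothetical three-state model (non-shedding, low shedding, high shedding) observed through three categorical fecal-test results at irregular sampling times. Transition probabilities over a gap of length Δt come from the matrix exponential of a generator Q, P(Δt) = exp(QΔt), which is what lets a continuous-time model handle unevenly spaced samples. All rates, emission probabilities, and sample data below are invented for illustration; parameters are held fixed rather than estimated, and the modified Baum-Welch M-step used in the study is not shown. The matrix exponential is hand-rolled so the example needs only NumPy; scipy.linalg.expm would be the usual choice.

```python
import numpy as np


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = A.shape[0]
    norm = np.linalg.norm(A, 1)  # maximum absolute column sum
    s = int(max(0, np.ceil(np.log2(norm)) + 1)) if norm > 0 else 0
    A_scaled = A / (2 ** s)  # scaled so the Taylor series converges quickly
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms + 1):
        term = term @ A_scaled / k
        result = result + term
    for _ in range(s):
        result = result @ result  # undo scaling: exp(A) = exp(A / 2^s)^(2^s)
    return result


def forward_backward(times, obs, Q, pi, B):
    """Scaled forward-backward for a continuous-time HMM sampled at irregular times.

    Returns posterior state probabilities (one row per observation)
    and the log-likelihood of the observation sequence.
    """
    n_obs, n_states = len(obs), len(pi)
    # P[k] is the transition matrix from times[k] to times[k + 1]: expm(Q * dt)
    P = [expm(Q * (times[k] - times[k - 1])) for k in range(1, n_obs)]

    alpha = np.zeros((n_obs, n_states))
    c = np.zeros(n_obs)  # scaling factors; sum of their logs is the log-likelihood
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for k in range(1, n_obs):
        alpha[k] = (alpha[k - 1] @ P[k - 1]) * B[:, obs[k]]
        c[k] = alpha[k].sum()
        alpha[k] /= c[k]

    beta = np.ones((n_obs, n_states))
    for k in range(n_obs - 2, -1, -1):
        beta[k] = P[k] @ (B[:, obs[k + 1]] * beta[k + 1]) / c[k + 1]

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, np.log(c).sum()


# Hypothetical model: 0 = non-shedding, 1 = low shedding, 2 = high shedding (absorbing).
# Rates are per year and purely illustrative, not estimates from the study.
Q = np.array([[-0.8, 0.8, 0.0],
              [0.0, -0.6, 0.6],
              [0.0, 0.0, 0.0]])
pi = np.array([0.9, 0.1, 0.0])
# Hypothetical emission probabilities P(test result | state); results: 0 = negative, 1 = low, 2 = high
B = np.array([[0.95, 0.04, 0.01],
              [0.30, 0.60, 0.10],
              [0.05, 0.15, 0.80]])

times = np.array([0.0, 0.5, 1.1, 1.5, 2.3, 3.0])  # years since first sample (invented)
obs = np.array([0, 0, 1, 1, 2, 2])                  # invented test results for one cow

gamma, loglik = forward_backward(times, obs, Q, pi, B)
decoded = gamma.argmax(axis=1)  # posterior decoding: most probable state at each sample
ever_high = bool((decoded == 2).any())  # illustrative rule for the high-shedding pattern
print(decoded, round(loglik, 3), ever_high)
```

The "ever decoded as high shedding" rule in the last lines is an assumption of this sketch for separating the two shedding patterns, not necessarily the criterion used in the study.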

Landscape pangenomics of Mycobacterium bovis

The purpose of this project is to determine how the M. bovis pangenome has evolved over space and time. This involves characterizing patterns of evolution in the core genome (genes shared by all members of the species) and the accessory genome (genes not present in all members), and determining whether these partitions of the pangenome evolve clonally. M. bovis is thought to be a strictly clonal pathogen, with no known mechanism for ongoing horizontal gene transfer. My overarching hypothesis is therefore that the accessory genome evolves through serial gene deletions, and that these deletions, together with purifying selection and the serial bottlenecks created by transmission, cause strong population structure. This strong population structure would, in turn, lead to genetic signatures that are specific to, and diagnostic of, geographic location.
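The core/accessory split defined above can be sketched in a few lines of Python, assuming gene presence/absence has already been called for each isolate (for example, from the presence/absence table a pangenome pipeline produces). The function name, isolate IDs, and gene names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def partition_pangenome(presence):
    """Split a pangenome into core and accessory gene sets.

    presence maps each isolate ID to the set of genes detected in that isolate.
    Core genes occur in every isolate; accessory genes occur in at least one but not all.
    """
    n_isolates = len(presence)
    # Each isolate contributes a set, so a gene's count is the number of isolates carrying it
    counts = Counter(gene for genes in presence.values() for gene in genes)
    core = {gene for gene, n in counts.items() if n == n_isolates}
    accessory = set(counts) - core
    return core, accessory


# Hypothetical isolates and gene names, for illustration only
presence = {
    "isolate_1": {"geneA", "geneB", "geneC", "geneD"},
    "isolate_2": {"geneA", "geneB", "geneC"},
    "isolate_3": {"geneA", "geneB", "geneD"},
}
core, accessory = partition_pangenome(presence)
print(sorted(core))       # ['geneA', 'geneB']
print(sorted(accessory))  # ['geneC', 'geneD']
```

Under the serial-deletion hypothesis, each accessory gene in such a table would be one that some lineages have lost, rather than one acquired through horizontal transfer.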

Deep learning for outbreak sequence analysis

Whole-genome sequencing analysis of M. bovis outbreak samples is used extensively by the USDA to trace outbreaks to their source; however, there is no existing method to reliably estimate how long a herd has been infected at the time of detection. We believe the answer to this question lies in M. bovis outbreak sequence data, but the right method for detecting this signal doesn't yet exist. We're using deep learning to develop predictive models from outbreak sequence data. More to come soon!