Research

We develop computational approaches to identify and understand the signals embedded in the genome. In doing so, we aim to understand how genomes evolve, where novel functional elements come from, and how these signals contribute to cell regulation.

Transposable element (TE) activity in somatic cells

Transposable elements are sequences in the genome that can copy themselves or move to a new position. The traditional view is that these sequences are selfish elements that provide no benefit to the host and instead inflict harm when activated. However, new data on regulatory roles for TEs are emerging.
In our lab, we found a significant amount of TE transcription across human somatic cells, with highly tissue-specific expression patterns and co-expression with specific sets of genes, e.g. chromatin modifiers. We are interested in understanding how and why TE expression is regulated in this way.

Evolutionary constraint on genomic distance between functional elements

We are interested in how genome structure evolves, especially the arrangement of functional elements. Many functional elements, such as adjacent domains in multi-domain proteins or adjacent binding sites in cis-regulatory modules, require interaction among themselves or with partners. Yet they are encoded in one dimension in the genome. We hypothesize that specific and correct interactions require optimal distances between these functional units, and that these distances are maintained across evolutionary time. Since insertions and deletions of DNA (indels) change the distance between functional units, the genome should be under evolutionary constraint against indel mutations that alter these distances. We are developing statistical models and software implementations to formally test this hypothesis and to identify distances that are under evolutionary constraint.
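
To make the idea of a constrained distance concrete, here is a minimal sketch of one way such a comparison could be set up. It is not the lab's statistical model: it assumes a simple permutation test, the function names (distance_spread, permutation_p_value) are hypothetical, and all distances are invented toy values. Each inner list holds the distance in base pairs between one pair of elements as measured in several species; the test asks whether candidate pairs vary less across species than control pairs.

```python
import random
from statistics import mean, median


def distance_spread(distances):
    """Mean absolute deviation from the median distance across species."""
    mid = median(distances)
    return mean(abs(d - mid) for d in distances)


def permutation_p_value(candidate, controls, n_perm=2000, seed=0):
    """One-sided permutation p-value: do candidate pairs vary less than controls?"""
    rng = random.Random(seed)
    cand = [distance_spread(d) for d in candidate]
    ctrl = [distance_spread(d) for d in controls]
    observed = mean(cand) - mean(ctrl)
    pooled = cand + ctrl
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(cand)]) - mean(pooled[len(cand):])
        if diff <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (invented): one inner list per element pair, one value per species.
candidate_pairs = [[120, 121, 120, 119], [85, 85, 86, 85], [200, 199, 200, 201]]
control_pairs = [[120, 150, 90, 133], [85, 60, 110, 95],
                 [200, 240, 170, 215], [50, 80, 30, 65]]

p_value = permutation_p_value(candidate_pairs, control_pairs)
print(f"one-sided permutation p-value: {p_value:.3f}")
```

A permutation test is used here only because it needs no distributional assumptions; a formal model of indel rates along a phylogeny would be a different and more detailed approach.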

Multi-omics prediction of tissue origin

Cancers of Unknown Primary (CUPs) are cancers that are found at a metastatic stage but without an identifiable primary site. Knowing the primary site for CUPs is important because it can direct treatment when primary site-specific therapies are available. This project aims to build an integrated classifier that combines heterogeneous data, e.g. coding and non-coding RNAs and DNA methylation markers, to predict the tissue of origin.
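
As an illustration of what combining heterogeneous data can look like, the sketch below standardizes each data type separately, concatenates the results into one feature vector per sample, and assigns a query sample to the nearest tissue centroid. This is an assumption made for illustration, not the project's classifier: the modality names, the helper functions (fit_scaler, integrate, train, predict), and every number are hypothetical toy values.

```python
from math import dist
from statistics import mean, pstdev


def fit_scaler(rows):
    """Per-feature mean and standard deviation for one modality."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def integrate(sample, modalities, scalers):
    """Z-score each modality with its own scaler, then concatenate."""
    vec = []
    for name in modalities:
        mus, sds = scalers[name]
        vec.extend((v - m) / s for v, m, s in zip(sample[name], mus, sds))
    return vec


def train(samples, labels):
    """Fit one scaler per modality and one centroid per tissue label."""
    modalities = sorted(samples[0])
    scalers = {m: fit_scaler([s[m] for s in samples]) for m in modalities}
    vectors = [integrate(s, modalities, scalers) for s in samples]
    centroids = {}
    for tissue in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == tissue]
        centroids[tissue] = [mean(col) for col in zip(*members)]
    return modalities, scalers, centroids


def predict(sample, model):
    """Return the tissue whose centroid is closest to the integrated sample."""
    modalities, scalers, centroids = model
    vec = integrate(sample, modalities, scalers)
    return min(centroids, key=lambda tissue: dist(vec, centroids[tissue]))


# Toy training data (invented values, two features or fewer per modality).
train_samples = [
    {"mrna": [9.1, 2.0], "ncrna": [5.0], "methylation": [0.80, 0.10]},
    {"mrna": [8.7, 2.4], "ncrna": [5.4], "methylation": [0.75, 0.15]},
    {"mrna": [2.2, 8.9], "ncrna": [1.1], "methylation": [0.20, 0.85]},
    {"mrna": [2.6, 9.3], "ncrna": [0.9], "methylation": [0.25, 0.90]},
]
train_labels = ["lung", "lung", "colon", "colon"]

model = train(train_samples, train_labels)
query = {"mrna": [8.9, 2.1], "ncrna": [4.8], "methylation": [0.78, 0.12]}
print(predict(query, model))
```

Scaling each modality separately keeps a data type with a large numeric range, such as expression values, from dominating one with a small range, such as methylation fractions. Real data would also call for feature selection and cross-validation, which are omitted here.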