Our research aims to understand the evolution and adaptation of human regulatory networks, with a focus on how these processes affect human health and disease. In particular, we investigate the evolution of mobile elements (also called transposable elements), their roles in basic biology and cancer, and their genetic and epigenetic regulation.

We use integrative, systems-level approaches. We develop statistical and computational algorithms to explore the human genome and to integrate cross-species comparative data with high-throughput genomics data. We test our hypotheses and validate our predictions in the wet lab.

Our interests span genomics, epigenomics, evolution, computation, systems biology and related areas. We also have a general interest in large-scale data integration and visualization, including developing genome and genomics browsers and tools for analyzing high-throughput genomics data such as next-generation sequencing data.

Main questions and hypotheses

Although mobile elements play a significant role in genome evolution, human biology and disease, they remain poorly understood. Our research focuses on the effect of mobile elements on genome evolution and on their roles in basic biology and cancer. How extensive are the interactions between repetitive elements and transcription factors? How are these elements regulated genetically, epigenetically, temporally and spatially? How is repression of regulatory elements derived from repetitive DNA established and maintained? How do functional enhancers exapted from ancient transposons escape this inactivation, and how large a role do they play in gene regulation? How does a breakdown of regulation of and by repetitive elements contribute to human cancers? Our mission is to address these questions using integrative systems methods that combine computational and experimental approaches.

Our working hypotheses are:

  1. Certain transcription factors, especially “master regulators” such as p53, were helped by transposon mobility to evolve a larger network of binding sites (or modules) and thereby recruit target genes;
  2. Sequences of mobile elements in the genome are functionally potent and tightly regulated by epigenetic mechanisms, and they provide an evolutionary reservoir for modulating gene regulation;
  3. Mis-regulation of these elements (such as changes in epigenetic modification, or rearrangements) causes mis-regulation of nearby genes, contributing to the development of many types of cancer.


Evolutionary characterization of regulatory networks

It is important to establish an evolutionary model in which we can examine the roles transposable elements have played in shaping regulatory networks. We are building a computational framework to detect contributions of transposable elements to human transcriptional regulatory networks and to assess their potential impact.

  1. We are computationally reconstructing the history of transposon fixation over the past 100 million years along the lineage leading to humans. This will provide a dynamic view of the evolutionary impact of transposable elements on the human genome.
  2. We will identify interactions between transcription factors and repetitive elements. We are developing computational methods to predict regulatory signals harbored in transposable elements and then validate the predicted interactions with transcription factors using experimental approaches.
  3. We will study the evolutionary signatures of regulatory modules harbored within transposable elements and identify the target genes and pathways they most likely regulate. We are combining comparative genomics and population genomics, and will eventually generate data on the epigenetic status of repeats in many individuals, extending the study to epigenetic variation in the population.
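The framework for reconstructing transposon fixation history is not described here. One widely used way to place a transposable element copy on an evolutionary time scale is to measure its divergence from the family consensus, correct for multiple substitutions with the Jukes-Cantor model, and divide by a neutral substitution rate. The sketch below assumes a strict molecular clock and an assumed rate of 2.2e-9 substitutions per site per year; the function names and example copies are hypothetical, and real analyses usually also adjust for CpG hypermutability.

```python
import math

# ASSUMPTION: neutral substitution rate (per site per year). A commonly quoted
# mammalian estimate, used here only as a placeholder.
NEUTRAL_RATE = 2.2e-9


def jukes_cantor_distance(p: float) -> float:
    """Convert an observed mismatch fraction p into estimated substitutions
    per site, correcting for multiple hits (Jukes-Cantor model)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def insertion_age_myr(mismatches: int, aligned_length: int,
                      rate: float = NEUTRAL_RATE) -> float:
    """Estimate the age (in millions of years) of a TE copy from its divergence
    to the family consensus. Only the copy's own branch has accumulated
    substitutions since insertion, so age = K / rate (not K / (2 * rate))."""
    p = mismatches / aligned_length
    k = jukes_cantor_distance(p)
    return k / rate / 1e6


# Hypothetical copies: (name, mismatches to consensus, aligned bp)
copies = [("copy_A", 30, 300), ("copy_B", 6, 300), ("copy_C", 45, 300)]
for name, mm, length in sorted(copies, key=lambda c: insertion_age_myr(c[1], c[2])):
    print(f"{name}: ~{insertion_age_myr(mm, length):.1f} Myr")
```

Sorting copies by estimated age gives a crude ordering of insertion events; placing those ages against the species tree then indicates which insertions predate or postdate particular lineage splits.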

Biological function and regulation of mobile elements

An exciting hypothesis stemming from our evolutionary theory is that transposable element sequences in the human genome are functionally potent and tightly regulated, and that mis-regulation of these sequences may lead to human disease. We are investigating the biological functions of mobile elements in the human genome and how they are regulated, and we will test the hypothesis that mis-regulated transposon sequences may contribute to human cancer.

  1. We will systematically evaluate the regulatory potential and epigenetic profiles of transposable elements in multiple cell types, using standard molecular biology techniques including reporter assays, ChIP, DNA methylation assays and siRNA knockdown, in combination with next-generation sequencing. These experiments will provide the first systematic view of the regulatory potential of selected transposable elements.
  2. We will investigate a potential connection between mis-regulation of transposable elements and human cancer. We will systematically profile the histone modification and DNA methylation status of selected transposable elements in cancer cell lines and primary tumor samples. Data generated in this study may provide the first evidence of a role for aberrant epigenetic control of transposable elements in human cancer. Epigenetic profiles of transposable elements also have the potential to serve as a new type of surrogate biomarker that predicts individual response to therapy. Collectively, these results will lay the groundwork for systematically examining the impact of transposable elements on human gene regulation and disease pathogenesis.
  3. A recent analysis found a surprising connection between the DNA methylation status of transposable elements and copy number abnormalities in specific cancers. We are actively pursuing this finding.

Computational epigenomics

We are part of the Epigenome Roadmap Project: our lab is a member of one of four NIH-funded Reference Epigenome Mapping Centers (REMCs). We work cooperatively with the other Mapping Centers and with the Epigenomics Data Analysis and Coordination Center (EDACC), also funded through the Roadmap, to comprehensively map the epigenomes of selected human cell types with significant relevance to complex human disease. Our center, which includes scientists at UCSF, UC Davis, UCSC and the British Columbia Genome Sciences Centre, has the broad expertise this project requires. We focus on cells relevant to human health and complex disease, including blood, brain and breast cells and human embryonic stem cells. We will include high-quality, homogeneous cells from males and females of two predominant racial groups, with biological replicates of each cell type.

The comprehensive maps will include six histone modifications, selected for their opposing roles in regulating active and inactive chromatin, as well as DNA methylation, miRNA expression and gene expression. These epigenetic data, together with genetic and expression data, will be integrated using advanced informatics to address the fundamental roles of epigenetics in differentiation, maintenance of cell-type identity and gene expression.

Our reference epigenomes will enable new disciplines including human population epigenetics, comparative epigenomics, neuroepigenetics, and therapeutic epigenetics for tissue regeneration and reversal of disease.

Our lab plays two specific roles in our REMC:

  1. Developing informatics pipelines for data processing, visualization and integration.
  2. Generating and validating epigenomic maps for repetitive regions of the human genome.
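The pipeline itself is not described here. A typical step in turning aligned reads into browser-ready epigenomic maps is to bin read positions into fixed-width windows and write a bedGraph track (chrom, 0-based start, end, value). Repetitive regions complicate this because reads align to several locations; the sketch below assumes one simple strategy, giving each of a read's n alignments a weight of 1/n rather than discarding it. The function name, input layout, bin size and reads-per-million scaling are illustrative assumptions, not the lab's method.

```python
from collections import Counter, defaultdict


def coverage_bedgraph(alignments, bin_size=200):
    """Build bedGraph lines from read alignments.

    alignments: list of (read_id, chrom, pos) with 0-based positions
    (hypothetical layout). A read aligning to n locations contributes 1/n to
    each location. Values are scaled to reads per million (RPM) using the
    number of distinct reads. Bins are not clipped to chromosome ends.
    """
    hits_per_read = Counter(read_id for read_id, _, _ in alignments)
    n_reads = len(hits_per_read)
    bins = defaultdict(float)
    for read_id, chrom, pos in alignments:
        bins[(chrom, pos // bin_size)] += 1.0 / hits_per_read[read_id]

    lines = []
    for chrom, b in sorted(bins):
        start = b * bin_size
        rpm = bins[(chrom, b)] * 1e6 / n_reads
        lines.append(f"{chrom}\t{start}\t{start + bin_size}\t{rpm:.2f}")
    return lines


# Hypothetical alignments: r2 is a multi-mapping read (e.g. inside a repeat).
alns = [("r1", "chr1", 50), ("r2", "chr1", 120), ("r2", "chr1", 450),
        ("r3", "chr2", 10)]
print("\n".join(coverage_bedgraph(alns)))
```

Fractional weighting keeps repeat-derived signal visible but spreads it evenly across copies; alternatives include discarding multi-mapping reads or reassigning them probabilistically using uniquely mapped neighbors.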

Development of bioinformatics tools

We are committed to turning theories and algorithms into software applications that not only facilitate our own research but also benefit the broader scientific community. Specifically, we are interested in developing and implementing algorithms in the following areas:

  1. Next-generation sequencing short-read mapping and assembly. The algorithm will take advantage of paired-end reads to map reads in repetitive regions, which is essential for studying mobile elements and not possible with most existing computational tools.
  2. Motif/module discovery. The algorithm will predict regulatory motifs and modules by incorporating comparative genomics data and natural variation data.
  3. Genomic data integration and visualization. We will continue developing and improving the UCSC Cancer Genomics Browser and associated analysis tools, with a special focus on integrating multiple sources of genomics-based information in cancer research.
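The planned mapping algorithm in item 1 is not specified. A standard idea behind using paired-end reads in repeats is mate anchoring ("mate rescue"): when one read of a pair hits many copies of a repeat but its mate maps uniquely, only placements consistent with the library's orientation and insert size are kept. The sketch below assumes a forward-reverse library, equal read lengths and a normally distributed insert size, and ignores alignment scores; `Hit` and `rescue_with_mate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    chrom: str
    pos: int      # 0-based leftmost coordinate of the alignment
    strand: str   # "+" or "-"


def rescue_with_mate(repeat_hits: List[Hit], mate_hit: Hit, read_len: int,
                     insert_mean: float, insert_sd: float,
                     max_sd: float = 3.0) -> Optional[Hit]:
    """Choose the placement of a multi-mapping read using its unique mate.

    ASSUMES a forward-reverse (FR) library: both reads on the same chromosome,
    on opposite strands, with the "+" read to the left of the "-" read, and
    an implied fragment length within max_sd SDs of insert_mean.
    Returns the single compatible hit, or None if zero or several remain.
    """
    compatible = []
    for hit in repeat_hits:
        if hit.chrom != mate_hit.chrom or hit.strand == mate_hit.strand:
            continue
        plus, minus = (hit, mate_hit) if hit.strand == "+" else (mate_hit, hit)
        if plus.pos > minus.pos:
            continue  # wrong relative orientation for an FR pair
        fragment = minus.pos + read_len - plus.pos
        if abs(fragment - insert_mean) <= max_sd * insert_sd:
            compatible.append(hit)
    return compatible[0] if len(compatible) == 1 else None


# Hypothetical example: one read falls in a repeat with four candidate hits.
mate = Hit("chr7", 1_000_000, "+")
candidates = [Hit("chr7", 1_000_250, "-"), Hit("chr7", 5_400_000, "-"),
              Hit("chr3", 1_000_250, "-"), Hit("chr7", 1_000_300, "+")]
print(rescue_with_mate(candidates, mate, read_len=100,
                       insert_mean=350, insert_sd=50))
```

Returning None when several placements remain compatible keeps the read flagged as ambiguous instead of guessing; a fuller method could then score the remaining candidates by alignment quality.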