Yip Lab

Our lab uses computational methods to tackle fundamental problems in biomedicine. We study gene regulatory mechanisms through computational modeling. To support this data-centric approach, we develop novel methods for analyzing large amounts of biological data, including data produced by cutting-edge high-throughput experiments. Our computational models provide a systematic way to investigate the functional effects of different types of perturbations to regulatory mechanisms. This generates testable hypotheses for studying human diseases and facilitates translational research.

Construction of quantitative models of gene regulatory mechanisms

Gene expression is regulated by a variety of mechanisms, but most existing knowledge about gene regulation is qualitative. For example, promoter DNA methylation is well known to be associated with transcriptional repression. It would be much more informative, however, to know how much expression changes for a given change in promoter methylation level. This type of quantitative understanding is key to linking molecular-level events to high-level phenotypes.

To develop quantitative models of gene regulatory mechanisms, we must first determine the key players and their relationships. In our research, we have invented methods for the genome-wide identification of functional sequence elements in humans and model organisms. We have developed machine learning methods to reconstruct networks that connect these functional elements, such as by finding the target genes of transcriptional enhancers. We have also modeled the detailed quantitative relationships between gene expression and chromatin accessibility, histone modifications, transcription factor binding and DNA methylation. Building on these foundations, we have recently started modeling the joint effects of many gene regulatory mechanisms, involving heterogeneous data types, on gene expression levels.

Our current goal is to infer the functional consequences of every genetic variation in the human genome in a cell-type-specific manner. This requires improving and integrating many components that we have developed over the years. These include identifying active enhancers in each cell type, accurately reconstructing various types of biological networks, and predicting both the direct effects of genetic variants on the chromatin and epigenetic features of individual elements and the downstream indirect effects propagated over the networks.

Representative publications:

Cao*, Zhang* et al., A Unified Framework for Integrative Study of Heterogeneous Gene Regulatory Mechanisms. Nature Machine Intelligence 2(8):447-456, (2020).
Cao et al., Reconstruction of Enhancer-Target Networks in 935 Samples of Human Primary Cells, Tissues and Cell Lines. Nature Genetics 49(10):1428-1436, (2017).
Yip et al., Classification of Human Genomic Regions based on Experimentally-determined Binding Sites of More Than 100 Transcription-related Factors. Genome Biology 13(9):R48, (2012).
The ENCODE Project Consortium, An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature 489(7414):57-74, (2012).
Gerstein*, ..., Yip* et al., Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project. Science 330(6012):1775-1787, (2010).

Development of analysis methods for new experimental technologies

Our lab has a strong interest in developing computational methods for analyzing data from emerging experimental technologies, such as single-cell sequencing and long-read sequencing. One technology that we have championed in the past few years is nanochannel-based optical DNA mapping. In this technology, specific sites on DNA molecules are fluorescently labeled, the molecules are linearized in nanometer-scale channels, and they are then imaged using high-resolution microscopy. The resulting data record the locations of the fluorescent labels along DNA molecules up to one megabase long. Because of this long read length, optical mapping data provide useful information for applications such as sequence assembly, structural variation (SV) calling and haplotype phasing.
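To make "locations of fluorescent labels" concrete, the sketch below derives an in silico optical map from a reference sequence. It assumes that labels are placed at every occurrence of a short sequence motif on either strand. The motif, the function names `label_positions` and `label_intervals`, and the toy sequence are hypothetical illustrations and are not taken from our published tools.

```python
import re


def label_positions(sequence: str, motif: str) -> list[int]:
    """Return sorted 0-based positions where a (hypothetical) labeling motif
    occurs on either strand of the sequence."""
    seq = sequence.upper()
    motif = motif.upper()
    # Reverse complement of the motif, so hits on the opposite strand are found too.
    complement = str.maketrans("ACGT", "TGCA")
    rev_comp = motif.translate(complement)[::-1]
    positions = set()  # a set avoids double-counting palindromic motifs
    for pattern in (motif, rev_comp):
        # Zero-width lookahead so overlapping occurrences are all reported.
        for hit in re.finditer(f"(?={re.escape(pattern)})", seq):
            positions.add(hit.start())
    return sorted(positions)


def label_intervals(positions: list[int]) -> list[int]:
    """Distances between consecutive labels: the pattern an optical map encodes."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy = "AAGCTCTTCAAAAGAAGAGCAAA"  # toy sequence, illustrative only
    sites = label_positions(toy, "GCTCTTC")
    print(sites, label_intervals(sites))
```

Aligners for optical mapping data generally compare patterns of inter-label distances like these. They must tolerate sizing errors and missing or extra labels, which a sketch this simple does not model.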

We are a leading group in developing analysis methods for optical mapping data. We have developed pairwise and multiple alignment methods, SV callers, and tools for processing and visualizing optical mapping data. Using our methods, we first demonstrated these applications on data from a family trio, and then extended the analysis to genomes from 26 populations. In this latter study, in addition to a comprehensive analysis of SVs, we also studied overall genome structures, genomic content not present in the human reference sequence, and regions that are difficult to investigate by short-read sequencing, such as subtelomeric regions.

Our current goal is to develop analysis methods for single-cell sequencing and spatial transcriptomics data.

Representative publications:

Yu*, Shi* et al., Quantifying Full-Length Circular RNAs in Cancer. Genome Research 31(12):2340-2353, (2021).
Li*, Gao* et al., New Guidelines for DNA Methylome Studies Regarding 5-hydroxymethylcytosine for Understanding Transcriptional Regulation. Genome Research 29(4):543-553, (2019).
Levy-Sakin*, Pastor*, Mostovoy*, Li*, Leung*, McCaffrey* et al., Genome Maps Across 26 Human Populations Reveal Population-specific Patterns of Structural Variation. Nature Communications 10:1025, (2019).
Li*, Leung*, Kwok* et al., OMSV Enables Accurate and Comprehensive Identification of Large Structural Variations from Nanochannel-based Single-Molecule Optical Maps. Genome Biology 18:230, (2017).
Leung et al., OMBlast: Alignment Tool for Optical Mapping using a Seed-and-Extend Approach. Bioinformatics 33(3):311-319, (2017).

Studying disease mechanisms, drug efficacy, and new therapies

Our lab has extensive collaborations with local and international research groups studying various human diseases. In addition to applying standard analysis pipelines to many types of data, we contribute new analysis approaches based on the methods we have developed.

We have studied various types of human cancer. For example, we have studied the epigenetic landscape of hepatocellular carcinoma (HCC) to identify disrupted gene regulatory elements. Follow-up validation and functional experiments delineated the molecular mechanisms and demonstrated the medical relevance of our findings. Nasopharyngeal carcinoma (NPC) is another focus of our cancer research. We have contributed to genomic and transcriptomic studies of this cancer, and we have also studied the genome and transcriptome of the Epstein-Barr virus associated with NPC. Beyond cancer, we study other human diseases, including diabetes, cardiovascular diseases, intervertebral disc degeneration and Hirschsprung disease. For example, we recently developed a method that identifies distinct non-coding genetic variants in different patients that converge functionally on the same genes and pathways. We applied it to identify novel genes associated with Hirschsprung disease.

We are currently focusing on developing methods that can predict the efficacy of immune checkpoint inhibition (ICI) treatment in cancer patients.

Representative publications:

Yang*, Feng*, Zhou* et al., A Selective HDAC8 Inhibitor Potentiates Antitumor Immunity and Efficacy of Immune Checkpoint Blockade in Hepatocellular Carcinoma. Science Translational Medicine 13(588):eaaz6804, (2021).
Bruce*, To*, Lui*, Chung* et al., Whole-Genome Profiling of Nasopharyngeal Carcinoma Reveals Viral-Host Co-operation in Inflammatory NF-kB Activation and Immune Escape. Nature Communications 12:4193, (2021).
Fu*, Lui* et al., Whole-Genome Analysis of Noncoding Genetic Variations Identifies Multi-Scale Regulatory Element Perturbations Associated with Hirschsprung Disease. Genome Research 30(11):1618-1632, (2020).
Zhang*, Lee*, Dhiman*, Jiang* et al., An Integrative ENCODE Resource for Cancer Genomics. Nature Communications 11:3696, (2020).
Xiong et al., Aberrant Enhancer Hypomethylation Contributes to Hepatic Carcinogenesis through Global Transcriptional Reprogramming. Nature Communications 10:335, (2019).