Balanced Non-negative Matrix Factorization on Hi-C Contact Maps

Xihao Hu, Christina Huan Shi, and Kevin Yip*

The Chinese University of Hong Kong

Introduction

Hi-C is a powerful experimental method for probing long-range DNA-DNA interactions across the whole genome [Lieberman-Aiden2009].

We developed a novel computational method, balanced non-negative matrix factorization (BNMF), that can flexibly identify small clusters of spatially proximal genomic regions from Hi-C contact maps.
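To make the idea of factorizing a contact map concrete, here is a minimal sketch of plain (unbalanced) NMF with the standard Lee-Seung multiplicative updates, applied to a made-up symmetric toy matrix. It is not the BNMF algorithm implemented in contact_map.py; the function name nmf, its parameters, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=2000, eps=1e-9, seed=0):
    """Plain NMF (X ~ W @ H) via Lee-Seung multiplicative updates.

    Illustrative only: this is generic NMF, not the balanced variant (BNMF).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative initialization
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative and does not
        # increase the Frobenius reconstruction error
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Made-up toy "contact map": 8 bins in two groups of 4,
# with strong contacts inside a group and weak contacts between groups
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.where(labels[:, None] == labels[None, :], 10.0, 1.0)

W, H = nmf(X, rank=2)

# Relative reconstruction error of the rank-2 factorization
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# One simple way to read clusters off the factors: scale each column of W
# to unit sum, then assign every bin to its largest component
W_scaled = W / W.sum(axis=0, keepdims=True)
clusters = W_scaled.argmax(axis=1)
print(rel_error, clusters)
```

Each column of W can be read as a soft membership profile over bins, which is what makes a factorization of this kind usable for clustering. The examples in this tutorial use the BNMF implementation from the package rather than this sketch.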

Here, we give examples of how to use our tool.

Materials and Methods

The environment requirements are:

Alternatively, you can use the free installer provided by Anaconda to avoid configuration issues.

Download the bnmf package, which contains the source code and data.

Uncompress it and you will see the following files:

  • contact_map.py -- source code for BNMF
  • yeast_chr_len.txt -- chromosome lengths for the yeast genome
  • hg18_chr_len.txt -- chromosome lengths for the human genome using hg18 reference
  • HindIII_intersect_EcoRI_fdr0.01_inter.txt -- yeast inter-chromosome interactions [Duan2010]
  • HindIII_intersect_EcoRI_fdr0.01_intra.txt -- yeast intra-chromosome interactions [Duan2010]
  • origins_nonCDR_early.txt -- yeast early origin sites [Duan2010]
  • IMR90.uij.chr22 -- human intra-chromosome interaction matrix at 40 kb resolution [Dixon2012]
  • IMR90.domain.txt -- topological domains defined for IMR90 cell line [Dixon2012]
  • *.ipynb -- the IPython notebook files used to generate this tutorial
  • *.html -- the IPython notebook files converted to HTML format
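As a rough illustration of how chromosome lengths relate to a fixed-resolution contact map, the sketch below counts how many 40 kb bins each chromosome would occupy. The two-column "name, length" layout, the chromosome names, and the lengths are made up for this example; the actual layout of yeast_chr_len.txt and hg18_chr_len.txt may differ, and the helper count_bins is hypothetical rather than part of contact_map.py.

```python
import io
import math

# Made-up example content in an ASSUMED "name<TAB>length" layout;
# the real chromosome length files in the package may be formatted differently
example = "chrA\t100000\nchrB\t250000\n"

def count_bins(handle, resolution):
    """Return {chromosome: number of bins} for a fixed bin size in bp."""
    bins = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        # The last bin of a chromosome may be only partially filled
        bins[name] = math.ceil(length / resolution)
    return bins

bins = count_bins(io.StringIO(example), resolution=40_000)
print(bins)  # {'chrA': 3, 'chrB': 7}
```

Under these assumptions, an intra-chromosome contact matrix for chrB at 40 kb resolution would have 7 rows and 7 columns.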

Then, let's go through the following examples:

  1. A toy example to show the idea of BNMF
  2. Analyzing a yeast Hi-C contact map
  3. Analyzing a human Hi-C contact map

If you have any problems with this tutorial or questions about our paper, please contact the authors.

References

  1. Lieberman-Aiden et al. 2009 Science. Comprehensive mapping of long-range interactions reveals folding principles of the human genome.
  2. Duan et al. 2010 Nature. A three-dimensional model of the yeast genome.
  3. Dixon et al. 2012 Nature. Topological domains in mammalian genomes identified by analysis of chromatin interactions.