The principle of our decomposition framework can be explained by an example. If a matrix is balanced, meaning it has equal row sums (and equal column sums), it can be decomposed as $A=HSH^{T}$, for example \begin{aligned} A=\left(\begin{array}{ccc} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0 & 0 & 1 \end{array}\right),\qquad H=\left(\begin{array}{cc} 0.5 & 0 \\ 0.5 & 0 \\ 0 & 1 \end{array}\right),\qquad S=\left(\begin{array}{cc} 2 & 0 \\ 0 & 1 \end{array}\right),\qquad W=SH^{T}=\left(\begin{array}{ccc} 1 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right) \end{aligned} where all the column sums of $H$ are 1 and all the column sums of $W$ are 1.
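As a quick sanity check, multiplying the factors back together recovers $A$: \begin{aligned} HS=\left(\begin{array}{cc} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{array}\right),\qquad (HS)H^{T}=\left(\begin{array}{ccc} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0 & 0 & 1 \end{array}\right)=A \end{aligned}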
This property helps us interpret the decomposition result: each column of $H$ stores the probabilities of the data points belonging to one cluster, while each column of $W$ stores the probabilities of one data point belonging to the different clusters. For the two clusters, the corresponding sizes are 2 and 1, which are stored on the diagonal of $S$.
However, this property requires that every row/column of $A$ sums to 1, which is rare in real applications. Instead, we relax the constraint on the input by allowing a fixed bias at each position. Assume we have a new input $X$ that can be decomposed as $X=BAB$ with a diagonal bias matrix $B$: \begin{aligned} X=\left(\begin{array}{ccc} 8 & 4 & 0 \\ 4 & 2 & 0 \\ 0 & 0 & 9 \end{array}\right),\qquad B=\left(\begin{array}{ccc} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{array}\right) \end{aligned} Combining the two formulas gives the problem setting for BNMF: \begin{aligned} X=BHSH^{T}B \end{aligned}
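Because $B$ is diagonal, the bias acts entrywise, $X_{ij}=b_{i}A_{ij}b_{j}$; for instance, $X_{12}=4\times0.5\times2=4$ and $X_{33}=3\times1\times3=9$.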
We can test this example in Python using the contact_map package.
from contact_map import NMF_main, NNDSVD
import numpy as np

# ground-truth factors from the example above
H = np.array([[0.5, 0], [0.5, 0], [0, 1]])
S = np.array([[2.0, 0], [0, 1]])
B = np.diagflat([4.0, 2.0, 3.0])
W = S @ H.T      # cluster-to-point probabilities
A = H @ W        # balanced matrix, A = H S H^T
X = B @ A @ B    # biased input for BNMF
print(X)
print(B)
print(H)
print(S)
print(W)
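As an extra sanity check (an assertion of ours, not part of the library), the constructed matrices satisfy the BNMF model and the probabilistic properties exactly:

# the input satisfies the BNMF model X = B H S H^T B
assert np.allclose(X, B @ H @ S @ H.T @ B)
# every column of H and of W sums to 1
assert np.allclose(H.sum(axis=0), 1)
assert np.allclose(W.sum(axis=0), 1)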
We initialize BNMF using NNDSVD [Boutsidis & Gallopoulos, 2007]. Because the matrix is so small, we apply a large shift to the matrix values by setting add=1, in the hope of avoiding trivial solutions.
initH, initS = NNDSVD(X, r=2, add=1)
print(np.round(initH, 2))
print(np.round(initS, 2))
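The exact NNDSVD variant used by contact_map (including how add is applied) lives in the library; for intuition only, here is a minimal sketch of the NNDSVD idea adapted to a symmetric input, where the function name nndsvd_sym, the column normalization, and the add handling are our own simplifying assumptions:

def nndsvd_sym(X, r, add=0.0):
    """Sketch: SVD-based nonnegative initialization for symmetric X ~ H S H^T."""
    U, sig, _ = np.linalg.svd(X)
    H0 = np.zeros((X.shape[0], r))
    S0 = np.zeros((r, r))
    for j in range(r):
        u = U[:, j]
        # keep whichever signed part of the singular vector is dominant
        up, un = np.maximum(u, 0), np.maximum(-u, 0)
        u = up if np.linalg.norm(up) >= np.linalg.norm(un) else un
        H0[:, j] = u / (u.sum() + 1e-12)  # normalize the column sum to 1
        S0[j, j] = sig[j]
    return H0 + add, S0                   # 'add' shifts values away from zero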
Run BNMF with the following parameters:

- H=initH sets the initial $H$.
- S=initS sets the initial $S$.
- L=0 sets the manifold parameter $\lambda$ to zero, because we do not need it here.
- r=2 sets the number of clusters to 2.

outG, outS, nmf_obj = NMF_main(X, H=initH, S=initS, r=2, J='NMF-PoissonManifoldEqual', L=0)
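The exact multiplicative updates for the $X\approx BHSH^{T}B$ model are derived in our paper. For intuition only, here is the classic multiplicative update for plain NMF under the Poisson (KL) objective [Lee & Seung], which this family of objectives builds on; kl_nmf_step is a hypothetical illustration, not a contact_map function:

def kl_nmf_step(X, W, H, eps=1e-12):
    """One Lee-Seung multiplicative update for X ~ W @ H under KL divergence."""
    WH = W @ H + eps
    H = H * (W.T @ (X / WH)) / (W.T @ np.ones_like(X) + eps)  # update H
    WH = W @ H + eps
    W = W * ((X / WH) @ H.T) / (np.ones_like(X) @ H.T + eps)  # update W
    return W, H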
Extract $B$ and $H$ from $G=BH$ using $B=\mathrm{Diag}\!\left(\frac{n\sum_{i=1}^{r}(SG^{T})_{ij}}{\sum_{i=1}^{r}\sum_{j=1}^{n}(SG^{T})_{ij}}\right)_{j=1,\dots,n}$ and $H=B^{-1}G$, as shown in our paper.
outB = np.asarray(np.dot(outG, outS)).mean(axis=1)  # row means of G S, i.e. column means of S G^T
outB /= outB.mean()                                 # normalize the bias to mean 1
outH = outG / outB[:, np.newaxis]                   # H = B^{-1} G
# if BNMF converged to the ground truth, outB ~ diag(B)/mean(diag(B)) = (1.33, 0.67, 1.00)
print(np.diagflat(np.round(outB, 2)))
print(np.round(outH, 2))
print(np.round(outS, 2))
These recovered factors are equal to our original ones after proper scaling and reordering of the clusters.
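The rescaling below exploits the scale invariance of the model: for any $c,d>0$, \begin{aligned} X=BHSH^{T}B=\left(c\sqrt{d}\,B\right)\left(\frac{H}{c}\right)\left(\frac{S}{d}\right)\left(\frac{H}{c}\right)^{T}\left(c\sqrt{d}\,B\right) \end{aligned} so we fix the scales by forcing the smallest diagonal entry of $S$ to 1 and the average column sum of $H$ to 1, and then sort the clusters by decreasing size.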
## scale S so that the smallest cluster has size 1
baseS = np.diag(outS).min()
newS = outS / baseS
## scale B and H so that the columns of H sum to 1 on average
baseH = outH.sum(0).mean()
newH = outH / baseH
newB = outB * baseH * np.sqrt(baseS)   # compensate both rescalings in B
## reorder the clusters by decreasing size
ords = np.argsort(np.diag(newS))[::-1]
newH = newH[:, ords]
newS = newS[ords, :][:, ords]
print(np.round(np.diagflat(newB) - B, 1))   # should be ~0
print(np.round(newH - H, 1))                # should be ~0
print(np.round(newS - S, 1))                # should be ~0
All three printed differences should be zero matrices (to the rounding shown), so our BNMF recovers the ground-truth factors in this simple example.