A mixed iteration for nonnegative matrix factorizations

Abstract

We show that, under appropriate conditions, one can create a hybrid between two given iterations which can perform better than either of the original ones. This fact provides a freedom of choice. We also give numerical examples in which we compare our hybrid with the dedicated Lee–Seung iteration.

Authors

S.M. Soltuz
(Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy)

B.E. Rhoades

Keywords

Non-negative matrix factorization; Lee–Seung iteration.


Paper coordinates

Ş.M. Şoltuz, B.E. Rhoades, A mixed iteration for nonnegative matrix factorizations, Appl. Math. Comput., 219 (2013) no. 18, 9847-9855.
doi: 10.1016/j.amc.2013.03.124


About this paper

Print ISSN

0096-3003


Paper (preprint) in HTML form

A mixed iteration for nonnegative matrix factorizations

Ştefan M. Şoltuz a,b,∗, B.E. Rhoades c
a Dawson College, Mathematics Department, 3040 Sherbrooke Street West, Westmount (Montreal), Quebec H3Z 1A4, Canada
b Tiberiu Popoviciu Institute of Numerical Analysis, Cluj-Napoca, Romania
c Indiana University, Mathematics Department, Bloomington, IN, USA
Abstract

We show that, under appropriate conditions, one can create a hybrid between two given iterations which can perform better than either of the original ones. This fact provides a freedom of choice. We also give numerical examples in which we compare our hybrid with the dedicated Lee-Seung iteration.

ARTICLE INFO

Keywords:

Non-negative matrix factorization
Lee-Seung iteration

1. Introduction

1.1. Source separation methods. The non-negative matrix factorization (NMF)

Single-channel source separation problems arise when a number of sources emit signals that are mixed and recorded by a single sensor, and one is interested in estimating the original sources based on the recorded mixture. This problem is ill-posed. Several different single-channel source separation methods have been used. One method is the autoregressive (AR) model, which captures temporal correlations of the sources, as shown in [2,3]. In these papers it was proved that, for a single-channel mixture of stationary (AR) sources, the (AR) coefficients can be uniquely identified and the sources separated. For non-stationary (AR) sources, an adaptive sliding window was introduced to update the process. A complete study of (AR) models can be found in [4,5].

Let $M(m,n)$ denote the collection of $m\times n$ matrices with nonnegative entries. For a given matrix $V\in M(m,n)$, the nonnegative matrix factorization (NMF) procedure is used to find matrices $W\in M(m,r)$ and $H\in M(r,n)$ such that $V=WH$. Matrix factorization has many applications; for example, it is used in source separation, dimensionality reduction, and clustering. In using (NMF) it is necessary to compute

\arg\min_{W,H}\frac{1}{2}\|V-WH\|^{2}. \qquad (1)

An excellent survey on (NMF) and matrix factorization can be found in [15].
Other factorization methods in use are Vector Quantization (VQ), Principal Component Analysis (PCA) and Independent Component Analysis (ICA). These can all be written in the form $V\approx WH$. The differences between these methods and (NMF) are due to the different constraints placed on the factoring matrices. In the (VQ) method the columns of $H$ are constrained to be unary vectors (i.e., all components are zero except for one element equal to 1). In the (PCA) procedure the columns of $W$ and the rows of $H$ must be orthogonal. In the (ICA) procedure the rows of $H$ are maximally statistically independent.

A major problem with the (PCA) procedure is that it allows the basis vectors to have both positive and negative components, and the data are represented by linear combinations of these vectors. In some applications, the presence of negative components contradicts the physical reality. For example, the pixels in a gray-scale image must have nonnegative entries, so any image with negative intensities would have no reasonable interpretation. Likewise, the STFT magnitude is given by nonnegative quantities. The (NMF) procedure was developed in an attempt to address this problem (see [1,6,7,11,18,19]).

∗ Corresponding author at: Dawson College, Mathematics Department, 3040 Sherbrooke Street West, Westmount (Montreal), Quebec H3Z 1A4, Canada. E-mail addresses: smsoltuz@gmail.com (Ş.M. Şoltuz), rhoades@indiana.edu (B.E. Rhoades).

Two iterative (NMF) methods are in common use: Alternating Least Squares (ALS) and that of Lee-Seung (LS); the reader may consult [8-13].

1.2. Multiplicative rule. The pointwise product

Consider two functions $F:\mathbb{R}^{M\times N}\rightarrow\mathbb{R}$ and $G:\mathbb{R}^{M\times N}\times\mathbb{R}^{M\times N}\rightarrow\mathbb{R}$.
Definition 1 [9]. One says that $G(H,H^{\prime})$ is an auxiliary function for $F(H)$ if the conditions

G\left(H,H^{\prime}\right)\geqslant F(H),\quad G(H,H)=F(H), \qquad (2)

are satisfied.
In [9] the following quantity is considered as the cost function

F(H)=\frac{1}{2}\sum_{i}\left(V_{i}-\sum_{a}W_{ia}H_{a}\right)^{2}, \qquad (3)

where $1\leqslant i\leqslant m$, $1\leqslant a\leqslant r$, $V_{i}$ is the $i$th row of $V$ (of dimension $(1,n)$), $H_{a}$ is the $a$th row of $H$ (of dimension $(1,n)$), and $H=\left(H_{a}\right)_{1\leqslant a\leqslant r}$.
Lemma 2 [9]. If $G$ is an auxiliary function, then $F$ is nonincreasing under the update

H_{n+1}=\arg\min_{H}G\left(H,H_{n}\right).

The Taylor expansion for $F$ in (3) leads to

F(H)=F\left(H_{n}\right)+\left(H-H_{n}\right)\nabla F\left(H_{n}\right)+\frac{1}{2}\left(H-H_{n}\right)^{T}\left(W_{n}^{T}W_{n}\right)\left(H-H_{n}\right).

Leaving both $H$ and $H_{n}$ as variables, one obtains as an auxiliary function for $F$

G_{\text{Taylor}}\left(H,H_{n}\right)=F\left(H_{n}\right)+\left(H-H_{n}\right)\nabla F\left(H_{n}\right)+\frac{1}{2}\left(H-H_{n}\right)^{T}\left(W_{n}^{T}W_{n}\right)\left(H-H_{n}\right). \qquad (4)

The matrix $W_{n}^{T}W_{n}$ is positive semidefinite. Moreover, as noted in [9], the difference between

K_{n}=\operatorname{diag}\left(\operatorname{diag}\left(W_{n}^{T}W_{n}H_{n}^{T}./H_{n}^{T}\right)\right) \qquad (5)

and $W_{n}^{T}W_{n}$, that is,

K_{n}-\left(W_{n}^{T}W_{n}\right),

remains positive semidefinite. (In Matlab, "./" denotes pointwise division between matrices and "diag" returns the diagonal of a matrix as a vector; we apply it twice in order to keep the dimensionality right.) Hence, a quantity similar to the above Taylor expansion (4),

G\left(H,H_{n}\right)=F\left(H_{n}\right)+\left(H-H_{n}\right)\nabla F\left(H_{n}\right)+\frac{1}{2}\left(H-H_{n}\right)^{T}K_{n}\left(H-H_{n}\right), \qquad (6)

satisfies the conditions of an auxiliary function for the above $F$; see [9]. In order to update $H_{n+1}$ at each step, the arg min is involved. It is known that, under appropriate conditions on $\Phi,\Psi$, the quadratic form $F$ (with $\Phi$ a positive definite matrix),

F(x)=x^{T}\Phi x+\Psi x+\Theta \qquad (7)

attains its minimum at

\bar{x}=\Phi^{-1}\Psi.

Thus, by setting (6) into (7),

\Phi:=K_{n}\left(=\operatorname{diag}\left(\operatorname{diag}\left(W_{n}^{T}W_{n}H_{n}^{T}./H_{n}^{T}\right)\right)\right),
\Psi:=\nabla F\left(H_{n}\right),
\Theta:=F\left(H_{n}\right),

we obtain $\bar{x}=H_{n+1}-H_{n}$, and therefore the new $H_{n+1}$ is given by

H_{n+1}-H_{n}=\arg\min_{H}G\left(H,H_{n}\right), \qquad (8)
H_{n+1}-H_{n}=K_{n}^{-1}\nabla F\left(H_{n}\right),
H_{n+1}=H_{n}+K_{n}^{-1}\nabla F\left(H_{n}\right).

This leads to the Lee-Seung iterative (multiplicative) method.
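To make the construction concrete, here is a minimal Matlab sketch of the additive update (8) for a single column $h$ of $H$, the per-column form in which $K_{n}$ of (5) is an $r\times r$ diagonal matrix; $W$, the corresponding column $v$ of $V$, and a positive $h$ are assumed to be given.

```matlab
% One additive update (8) for a single column h of H (r-by-1, positive),
% with W (m-by-r) and the corresponding column v of V (m-by-1) given.
K = diag((W'*W*h) ./ h);          % K_n of (5), per-column diagonal form
h = h + K \ ((W'*v) - (W'*W*h));  % update (8); algebraically equal to
                                  % h .* (W'*v) ./ (W'*W*h), the (LS) rule
```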

1.3. The Lee-Seung multiplicative rule (LS)

More specifically, for (8), the matrix $H_{n+1}$ is given by

H_{n+1}=H_{n}-\frac{H_{n}}{\left(W_{n}^{T}W_{n}H_{n}\right)}\left(\left(W_{n}^{T}W_{n}H_{n}\right)-\left(W_{n}^{T}V\right)\right).

Basically, the rule is to reduce the "distance" (i.e., the cost function) by choosing, at each step, appropriate $\eta_{n}$ and $\gamma_{n}$:

H_{n+1}=H_{n}+\eta_{n}\left(\left(W_{n}^{T}V\right)-\left(W_{n}^{T}W_{n}H_{n}\right)\right), \qquad (9)
W_{n+1}=W_{n}+\left(\left(VH_{n+1}^{T}\right)-\left(W_{n}H_{n+1}H_{n+1}^{T}\right)\right)\gamma_{n}.

The heart of each iterative method consists in the choice of such $\eta_{n}$ and $\gamma_{n}$. Specifically, at step $n$, in [9], each matrix is given pointwise by

\eta_{ab}=\frac{H_{nab}}{\left(W_{n}^{T}W_{n}H_{n}\right)_{ab}},\quad\gamma_{ab}=\frac{W_{nab}}{\left(W_{n}H_{n+1}H_{n+1}^{T}\right)_{ab}}, \qquad (10)

where $(\cdot)_{ab}$ gives the location (row and column) within the matrix. Substituting (10) into (9) leads to Hadamard, or pointwise, multiplication, denoted by "$\cdot$". Eventually, the following iteration method is obtained:

H_{n+1}=H_{n}\cdot\frac{\left(W_{n}^{T}V\right)}{\left(W_{n}^{T}W_{n}H_{n}\right)}, \qquad (11)
W_{n+1}=W_{n}\cdot\frac{\left(VH_{n+1}^{T}\right)}{\left(W_{n}H_{n+1}H_{n+1}^{T}\right)}.

It was reported in [11,13] that the convergence result from [9] does not actually provide sufficient conditions for the convergence of (11). A new, more stable iteration based on (11) was presented and its convergence was studied.

Remark 3. "Lin's modification" for (11), from [11], consists in adding a "small" positive quantity (e.g., $\delta=10^{-9}$), such that the Lee-Seung iteration becomes:

H_{n+1}=H_{n}\cdot\frac{\left(W_{n}^{T}V\right)}{\left(W_{n}^{T}W_{n}H_{n}\right)+\delta}, \qquad (12)
W_{n+1}=W_{n}\cdot\frac{\left(VH_{n+1}^{T}\right)}{\left(W_{n}H_{n+1}H_{n+1}^{T}\right)+\delta}.

For this new method, $\eta_{n}$ and $\gamma_{n}$ were set to be:

\eta_{ab}=\frac{H_{nab}}{\left(W_{n}^{T}W_{n}H_{n}\right)_{ab}+\delta},\quad\gamma_{ab}=\frac{W_{nab}}{\left(W_{n}H_{n+1}H_{n+1}^{T}\right)_{ab}+\delta}.
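In Matlab, one sweep of (12) is a pair of elementwise updates; the following minimal sketch assumes nonnegative $V$, $W$, $H$ of compatible sizes are already in the workspace.

```matlab
% One sweep of the Lee-Seung updates with Lin's modification (12);
% V is m-by-n, W is m-by-r, H is r-by-n, all nonnegative.
delta = 1e-9;                             % the "small" quantity from [11]
H = H .* (W'*V) ./ (W'*(W*H) + delta);    % update H
W = W .* (V*H') ./ (W*(H*H') + delta);    % then update W with the new H
```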

1.4. The alternating least squares method (ALS)

In (9) set

\eta_{n}=\left(W_{n}^{T}W_{n}\right)^{-1}

and, alternatively,

\gamma_{n}=\left(H_{n+1}H_{n+1}^{T}\right)^{-1},

to obtain,

H_{n+1}=\left(W_{n}^{T}W_{n}\right)^{-1}\left(W_{n}^{T}V\right), \qquad (13)
W_{n+1}=VH_{n+1}^{T}\left(H_{n+1}H_{n+1}^{T}\right)^{-1}.

The problem with such an iteration is that one or both of $\left(W_{n}^{T}W_{n}\right)^{-1}$ and $\left(H_{n+1}H_{n+1}^{T}\right)^{-1}$ can have negative entries. In order to solve this problem one can consider the projection onto the nonnegative orthant, denoted by $P_{+}[\cdot]$. The above Alternating Least Squares iteration becomes

H_{n+1}=P_{+}\left(\left(W_{n}^{T}W_{n}\right)^{-1}\left(W_{n}^{T}V\right)\right), \qquad (14)
W_{n+1}=P_{+}\left(VH_{n+1}^{T}\left(H_{n+1}H_{n+1}^{T}\right)^{-1}\right).

The convergence of such an iteration is described in [7]. We remark that a quantity such as $\left(A^{T}A\right)^{-1}A^{T}$ is similar to the projection operator obtained for the least squares method (see, for example, [14]). This method was reported to be very fast but unstable (see also our experiments and [17]). In [17], as well as here, the aim is to obtain a new iteration which has the convergence speed of (ALS) and the stability of (LS).
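A minimal Matlab sketch of one sweep of the projected iteration (14) follows; the backslash and slash operators solve the corresponding normal equations rather than forming explicit inverses, and $P_{+}$ is realized with max(·,0).

```matlab
% One sweep of the projected (ALS) iteration (14);
% V is m-by-n, W is m-by-r, H is r-by-n.
H = max((W'*W) \ (W'*V), 0);   % H = P_+((W'W)^{-1} W'V)
W = max((V*H') / (H*H'), 0);   % W = P_+(V H' (HH')^{-1})
```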

Typical convergence and divergence behaviors are indicated in Fig. 1, where the matrix $V$ is generated by real audio data: $V$ is the spectrogram of the audio signal A3_whistle.wav¹, with the frame length of the FFT set to 512. Hence, the dimension of $\mathbf{V}$ was $m=512$, $n=174$. Both $\mathbf{W}$ and $\mathbf{H}$ were randomly initialized. The rank $r$ was set to 7, and a 50% overlap between the windows was used for generating the spectrogram. All three algorithms were applied to decompose the music notes from the audio signal.

¹ Available at www.ee.surrey.ac.uk/Personal/W.Wang/demondata.html.

1.5. The hybrid method

In (9) insert $\eta_{n}=\left(W_{n}^{T}W_{n}\right)^{-1}$ to obtain the iteration

H_{n+1}=H_{n}+\left(W_{n}^{T}W_{n}\right)^{-1}\left(\left(W_{n}^{T}V\right)-\left(W_{n}^{T}W_{n}H_{n}\right)\right). \qquad (15)

A "general" iteration method for such " Hn+1H_{n+1} " as in (9) would have the following structure

X_{n+1}=X_{n}+K\left(Y_{n}\right)^{-1}\left(\left(Y_{n}^{T}V\right)-\left(Y_{n}^{T}Y_{n}X_{n}\right)\right), \qquad (16)
Y_{n+1}=Y_{n}+\left(VX_{n+1}^{T}-Y_{n}X_{n+1}X_{n+1}^{T}\right)N\left(X_{n+1}\right)^{-1},

or,

H_{n+1}=H_{n}+A\left(W_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right), \qquad (17)
W_{n+1}=W_{n}+\left(VH_{n+1}^{T}-W_{n}H_{n+1}H_{n+1}^{T}\right)B\left(H_{n+1}\right)^{-1},

where $K\left(Y_{n}\right)$ is a surrogate for $A\left(W_{n}\right)$ which may be different at each step. We introduce $K$ to be a nonnegative matrix "close enough" to $A\left(W_{n}\right)$ so as to avoid the errors introduced by using a projection onto the nonnegative orthant. Note that $Y_{n}$ is different from $W_{n}$ at each step; but, if $Y_{n}$ is close enough to $W_{n}$, then $X_{n}$ behaves analogously to $H_{n}$. We have introduced within (16) and (17) two "general" iterations. The study will cover, in this manner, all possible choices for the $K,N,A$ and $B$ within (16) and (17). Appropriate settings for $K$ and $N$ in (16) will lead to the (LS) iteration; eventually we will compare it with (ALS), obtained when $A$ and $B$ are suitably chosen within (17).

Remark 4. Set $A\left(W_{n}\right)^{-1}=\left(\operatorname{diag}\left(\operatorname{diag}\left(W_{n}^{T}W_{n}H_{n}^{T}./H_{n}^{T}\right)\right)\right)^{-1}$ and $B\left(H_{n+1}\right)^{-1}=\left(\operatorname{diag}\left(\operatorname{diag}\left(W_{n}H_{n+1}H_{n+1}^{T}./W_{n}\right)\right)\right)^{-1}$ to obtain the Lee-Seung iteration (11). As we can see, $A\left(W_{n}\right)$ changes at each step, even within (15). Set $A\left(W_{n}\right)^{-1}=\left(W_{n}^{T}W_{n}\right)^{-1}$ and $B\left(H_{n+1}\right)^{-1}=\left(H_{n+1}H_{n+1}^{T}\right)^{-1}$ to obtain the (ALS) iteration.

Our main purposes are the following: first, to be able to mix the (LS) and (ALS) iterations in order to obtain a better one; second, to show that "structurally" the two algorithms are not very different, and to provide the mathematical background for such a hybrid to converge.
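As an illustration of what such a mixture can look like in practice, here is a minimal Matlab sketch of one hybrid sweep for $H$: it attempts the fast (ALS) step and falls back to the stable (LS) step when the fast step is unreliable. The switching rule (a cost test plus a conditioning check via rcond) is our illustrative choice, not a prescription of the analysis below.

```matlab
% One illustrative hybrid sweep for H: try the fast (ALS) step;
% if it is unstable, fall back to the (LS) multiplicative step (12).
% V, W, H are assumed given, nonnegative, of compatible sizes.
cost = @(W,H) 0.5 * max(max(abs(V - W*H)))^2;  % sup-norm cost, cf. Section 2
c0   = cost(W, H);
Hals = max((W'*W) \ (W'*V), 0);                % tentative (ALS) step, cf. (14)
if rcond(W'*W) > 1e-12 && cost(W, Hals) <= c0
    H = Hals;                                  % accept the fast step
else
    H = H .* (W'*V) ./ (W'*(W*H) + 1e-9);      % stable (LS) fallback, cf. (12)
end
```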

2. Main results

2.1. Convergence of the hybrid method

Recall the following Lemma:
Lemma 5 [16]. Let $\{a_{n}\}$ be a nonnegative sequence that satisfies

a_{n+1}\leqslant(1-w)a_{n}+\sigma_{n}\mathbf{M},

where $w\in(0,1)$ and $\mathbf{M}>0$ are fixed numbers and $\{\sigma_{n}\}$ is a nonnegative sequence which converges to zero. Then $\lim_{n\rightarrow\infty}a_{n}=0$.
The result remains true provided that the coefficient of $a_{n}$ stays within an interval contained in $(0,1)$.
Proposition 6. Let $\{a_{n}\}$ be a nonnegative sequence that satisfies

a_{n+1}\leqslant\lambda_{n}a_{n}+\sigma_{n}\mathbf{M},\quad\forall n\geqslant n_{0},

where $\mathbf{M}>0$ is a fixed number and $\{\lambda_{n}\}$, $\{\sigma_{n}\}$ are nonnegative sequences such that $\{\lambda_{n}\}\subset(0,\Lambda)$ for some $\Lambda<1$ and $\lim_{n\rightarrow\infty}\sigma_{n}=0$. Then $\lim_{n\rightarrow\infty}a_{n}=0$.

Proof. Note that, for each $n\in\mathbb{N}$, $\lambda_{n}\leqslant\Lambda$. By setting $1-w=\Lambda$ (so that $w=1-\Lambda\in(0,1)$), we get $a_{n+1}\leqslant\Lambda a_{n}+\sigma_{n}\mathbf{M}$, and the result follows from Lemma 5.

Remark 7. If $\lambda_{n}>1$ for each $n$, then Proposition 6 fails. As an example, choose $a_{n}=n$, $\lambda_{n}=2$ and $\sigma_{n}=0$ for each $n$. Then $n+1=a_{n+1}\leqslant\lambda_{n}a_{n}+\sigma_{n}\mathbf{M}=2n$. Proposition 6 remains true if $\lambda_{n}>1$ for only a finite subset of $\mathbb{N}$.

For the sake of simplicity, throughout this paper we shall consider the sup-norm for all matrices involved. Within Matlab one uses the "max(max(…))" command.
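In code, this choice of norm is just the entrywise maximum in absolute value; a one-line helper (our naming) used in the checks below could be:

```matlab
% Sup-norm of a matrix: largest entry in absolute value.
supnorm = @(X) max(max(abs(X)));
```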

Theorem 8. If iteration (17) converges, i.e. $\lim_{n\rightarrow\infty}H_{n}=H^{*}$, $\lim_{n\rightarrow\infty}W_{n}=W^{*}$ and $\lim_{n\rightarrow\infty}Y_{n}=W^{*}$, and there exist $\lambda\in(0,1)$ and $\mathbf{M}>0$ such that for each step $n$ the following relations are satisfied:

\left\|I_{r,r}-K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right\|\leqslant\lambda<1, \qquad (18)
\max\left\{\sup_{n}\left\{\left\|K\left(Y_{n}\right)^{-1}\right\|,\left\|A\left(W_{n}\right)^{-1}\right\|\right\}\right\}\leqslant\mathbf{M},

then iteration (16) is also convergent; i.e. $\lim_{n\rightarrow\infty}X_{n}=H^{*}$.

Proof. Define $M_{n}=Y_{n}-W_{n}$. Note that

X_{n+1}=X_{n}+K\left(Y_{n}\right)^{-1}\left(Y_{n}^{T}V-Y_{n}^{T}Y_{n}X_{n}\right) \qquad (19)
=X_{n}+K\left(Y_{n}\right)^{-1}\left(\left(W_{n}^{T}+M_{n}^{T}\right)V-\left(W_{n}^{T}+M_{n}^{T}\right)\left(W_{n}+M_{n}\right)X_{n}\right)
=X_{n}+K\left(Y_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}X_{n}\right)
\quad+K\left(Y_{n}\right)^{-1}\left(M_{n}^{T}V-\left(M_{n}^{T}W_{n}+W_{n}^{T}M_{n}+M_{n}^{T}M_{n}\right)X_{n}\right),

where $W_{n}$ is obtained from (17). Using (17) and (19) one obtains

H_{n+1}-X_{n+1}
=\left(H_{n}-X_{n}\right)+A\left(W_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right)-K\left(Y_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}X_{n}\right)
\quad-K\left(Y_{n}\right)^{-1}\left(M_{n}^{T}V-\left(M_{n}^{T}W_{n}+W_{n}^{T}M_{n}+M_{n}^{T}M_{n}\right)X_{n}\right)
=\left(H_{n}-X_{n}\right)+A\left(W_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right)-K\left(Y_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right)
\quad+K\left(Y_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right)-K\left(Y_{n}\right)^{-1}\left(W_{n}^{T}V-W_{n}^{T}W_{n}X_{n}\right)
\quad-K\left(Y_{n}\right)^{-1}\left(M_{n}^{T}V-\left(M_{n}^{T}W_{n}+W_{n}^{T}M_{n}+M_{n}^{T}M_{n}\right)X_{n}\right)
=\left(H_{n}-X_{n}\right)+K\left(Y_{n}\right)^{-1}\left(W_{n}^{T}W_{n}X_{n}-W_{n}^{T}W_{n}H_{n}\right)+\left(A\left(W_{n}\right)^{-1}-K\left(Y_{n}\right)^{-1}\right)\left(W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right)
\quad-K\left(Y_{n}\right)^{-1}\left(M_{n}^{T}V-\left(M_{n}^{T}W_{n}+W_{n}^{T}M_{n}+M_{n}^{T}M_{n}\right)X_{n}\right)
=\left(I_{r,r}-K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right)\left(H_{n}-X_{n}\right)+\left(A\left(W_{n}\right)^{-1}-K\left(Y_{n}\right)^{-1}\right)\left(W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right)
\quad-K\left(Y_{n}\right)^{-1}\left(M_{n}^{T}V-\left(M_{n}^{T}W_{n}+W_{n}^{T}M_{n}+M_{n}^{T}M_{n}\right)X_{n}\right).

Thus,

\left\|H_{n+1}-X_{n+1}\right\|
\leqslant\left\|I_{r,r}-K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right\|\left\|H_{n}-X_{n}\right\|
\quad+\left\|A\left(W_{n}\right)^{-1}-K\left(Y_{n}\right)^{-1}\right\|\left\|W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right\|
\quad+\left\|K\left(Y_{n}\right)^{-1}\right\|\left\|V-W_{n}X_{n}\right\|\left\|M_{n}^{T}\right\|
\quad+\left\|K\left(Y_{n}\right)^{-1}\right\|\left\|W_{n}^{T}+M_{n}^{T}\right\|\left\|M_{n}\right\|\left\|X_{n}\right\|.

We consider here the sup-norm, so that $\left\|I_{r,r}\right\|=1$. Denote

a_{n}=\left\|H_{n}-X_{n}\right\|,
\lambda_{n}=\left\|I_{r,r}-K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right\|,
\sigma_{n}=\max\left\{\left\|W_{n}^{T}V-W_{n}^{T}W_{n}H_{n}\right\|,\left\|M_{n}^{T}\right\|,\left\|M_{n}\right\|\right\},
\mathbf{M}=\sup_{n}\left\{\left\|A\left(W_{n}\right)^{-1}-K\left(Y_{n}\right)^{-1}\right\|,\left\|K\left(Y_{n}\right)^{-1}\right\|,\left\|X_{n}\right\|\right\}.

Note that $\lambda_{n}\in(0,\Lambda)$ and $\sigma_{n}\rightarrow 0$; therefore, from Proposition 6, we obtain $\lim_{n\rightarrow\infty}a_{n}=\lim_{n\rightarrow\infty}\left\|H_{n}-X_{n}\right\|=0$. Using the inequality

\left\|X_{n}-H^{*}\right\|\leqslant\left\|H_{n}-H^{*}\right\|+\left\|H_{n}-X_{n}\right\|,

it follows that $\lim_{n\rightarrow\infty}X_{n}=H^{*}$.

Remark 9. Using Matlab it is easy to verify, at each step, the first condition of (18) by using max(max(eye(r,r) - inv(diag(diag((A'*A*B)./B)))*(A'*A))), where size(A) = (m,r) and size(B) = (r,n). The second condition of (18) simply demands an upper bound for the matrices involved in the process.
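Spelled out as a runnable fragment (with abs added, since the sup-norm is taken over the magnitudes of the entries; here A and B stand for $W_{n}$ and $H_{n}$):

```matlab
% Check the contraction condition of (18) at the current step;
% A is m-by-r (plays W_n), B is r-by-n (plays H_n).
r   = size(A, 2);
K   = diag(diag((A'*A*B) ./ B));              % LS-type surrogate, cf. (5)
lam = max(max(abs(eye(r,r) - K \ (A'*A))));   % ||I - K^{-1} W'W|| in sup-norm
ok  = lam < 1;                                % first condition of (18)
```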

2.2. Further results

In (17), set $A\left(W_{n}\right)=W_{n}^{T}W_{n}$ (respectively, $B\left(H_{n+1}\right)=H_{n+1}H_{n+1}^{T}$) to obtain the (ALS) method. The above theorem leads to the following result.

Corollary 10. If the iteration (15) converges (i.e. $\lim_{n\rightarrow\infty}H_{n}=H^{*}$, $\lim_{n\rightarrow\infty}W_{n}=W^{*}$ and $\lim_{n\rightarrow\infty}Y_{n}=W^{*}$), and there exist a $\lambda\in(0,1)$ and an $\mathbf{M}>0$ such that for each step $n$ the following relations are satisfied:

\left\|I_{r,r}-K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right\|\leqslant\lambda<1,
\max\left\{\sup_{n}\left\{\left\|K\left(Y_{n}\right)^{-1}\right\|,\left\|\left(W_{n}^{T}W_{n}\right)^{-1}\right\|\right\}\right\}\leqslant\mathbf{M},

then the iteration (16) is also convergent; i.e. $\lim_{n\rightarrow\infty}X_{n}=H^{*}$.

Remark 11.

(a) In other words, this corollary claims that a hybrid is allowed, provided that appropriate assumptions are satisfied.
(b) Note that by considering the diagonal matrix

A\left(W_{n}\right)=\operatorname{diag}\left(\operatorname{diag}\left(W_{n}^{T}W_{n}H_{n}^{T}./H_{n}^{T}\right)\right),

(17) becomes the Lee-Seung iteration (with the Hadamard product).

Proposition 12. In Theorem 8 one can replace

\left\|I_{r,r}-K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right\|\leqslant\lambda

with

\frac{1-\lambda}{\left\|W_{n}^{T}W_{n}\right\|}\leqslant\left\|K\left(Y_{n}\right)^{-1}\right\|.

Proof. Note that

1-\left\|K\left(Y_{n}\right)^{-1}\right\|\left\|W_{n}^{T}W_{n}\right\|\leqslant 1-\left\|K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right\|\leqslant\left\|I_{r,r}-K\left(Y_{n}\right)^{-1}W_{n}^{T}W_{n}\right\|\leqslant\lambda

to obtain the conclusion.
By duality, one can consider the case in which $\lim_{n\rightarrow\infty}X_{n}=H^{*}$ to obtain the following.
Corollary 13. If the iteration (17) converges (i.e. $\lim_{n\rightarrow\infty}H_{n}=H^{*}$, $\lim_{n\rightarrow\infty}W_{n}=W^{*}$ and $\lim_{n\rightarrow\infty}X_{n}=H^{*}$), and there exist a $\lambda\in(0,1)$ and an $\mathbf{M}>0$ such that for each step $n$ the following relations are satisfied:

\left\|I_{r,r}-H_{n+1}H_{n+1}^{T}N\left(X_{n}\right)^{-1}\right\|\leqslant\lambda<1,
\max\left\{\sup_{n}\left\{\left\|N\left(X_{n}\right)^{-1}\right\|,\left\|B\left(H_{n+1}\right)^{-1}\right\|\right\}\right\}\leqslant\mathbf{M},

then the iteration (16) is also convergent; i.e. $\lim_{n\rightarrow\infty}Y_{n}=W^{*}$.
As in [9], the next step is to consider the second part of (9) with $B\left(H_{n+1}\right)=H_{n+1}H_{n+1}^{T}$; i.e.

W_{n+1}=W_{n}+\left(V-W_{n}H_{n+1}\right)H_{n+1}^{T}\left(H_{n+1}H_{n+1}^{T}\right)^{-1}

and the following quantity from (16),

Y_{n+1}=Y_{n}+\left(V-Y_{n}X_{n+1}\right)X_{n+1}^{T}N\left(X_{n}\right)^{-1}

to obtain a result similar to Corollary 10.
Corollary 14. If the iteration (15) converges (i.e. $\lim_{n\rightarrow\infty}H_{n}=H^{*}$, $\lim_{n\rightarrow\infty}W_{n}=W^{*}$ and $\lim_{n\rightarrow\infty}X_{n}=H^{*}$), and there exist a $\lambda\in(0,1)$ and an $\mathbf{M}>0$ such that for each step $n$ the following relations are satisfied:

\left\|I_{r,r}-H_{n+1}H_{n+1}^{T}N\left(X_{n}\right)^{-1}\right\|\leqslant\lambda<1,
\max\left\{\sup_{n}\left\{\left\|N\left(X_{n}\right)^{-1}\right\|,\left\|\left(H_{n+1}H_{n+1}^{T}\right)^{-1}\right\|\right\}\right\}\leqslant\mathbf{M},

then the iteration (16) is also convergent; i.e. $\lim_{n\rightarrow\infty}Y_{n}=W^{*}$.
A result similar to Proposition 12 also holds. For practitioners the changed condition may be more useful.


Remark 15. Analogously, one can replace at each step

\left\|I_{r,r}-H_{n+1}H_{n+1}^{T}N\left(X_{n}\right)^{-1}\right\|\leqslant\lambda

by

\frac{1-\lambda}{\left\|H_{n+1}H_{n+1}^{T}\right\|}\leqslant\left\|N\left(X_{n}\right)^{-1}\right\|.

3. Numerical examples. Convergence and divergence behaviors

In the first experiment we let $V$ denote the spectrogram of the audio signal C6_frenchhorn.wav², where the frame length of the FFT was set to 512. Thus the dimension of $V$ was $m=512$ and $n=174$. Both $\mathbf{W}$ and $\mathbf{H}$ were randomly initialized. The rank $r$ was set to 7, and a 50% overlap between the windows was used for generating the spectrogram. All three algorithms were applied to decompose the music notes from the audio signal. The convergence curves were averaged over 20 independent tests. The results are shown in Fig. 2.

In our second experiment, we generated $\mathbf{V}$ synthetically as the absolute value of a zero-mean Gaussian distributed random matrix, and initialized $\mathbf{W}$ and $\mathbf{H}$ in the same way. The dimensions of these matrices were set as $m=500$, $n=300$ and $r=7$. The Matlab program performed 20 independent random tests in which both $\mathbf{W}$ and $\mathbf{H}$ were kept the same for all three algorithms, and the evolution of the cost function was averaged over the 20 tests. Fig. 3 shows the behaviors of the proposed algorithm, as well as of the (LS) and (ALS) algorithms. When $r$ is increased to 13 or higher, the (ALS) algorithm becomes unstable, while the proposed algorithm still converges, even though its rate of convergence becomes slower than that of the (LS) algorithm.
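For orientation, a minimal Matlab sketch of this synthetic setup follows; the iteration count and the plain (LS) sweep inside the loop are our illustrative choices (the hybrid sweep sketched in Section 1.5 would be dropped in at the marked lines).

```matlab
% Synthetic setup of the second experiment: V = |N(0,1)| data.
m = 500; n = 300; r = 7; nIter = 200;        % nIter is our choice
V = abs(randn(m,n));
W = abs(randn(m,r));  H = abs(randn(r,n));
cost = zeros(nIter,1);
for k = 1:nIter
    % (LS) sweep shown here; replace with the hybrid sweep to compare.
    H = H .* (W'*V) ./ (W'*(W*H) + 1e-9);
    W = W .* (V*H') ./ (W*(H*H') + 1e-9);
    cost(k) = 0.5 * norm(V - W*H, 'fro')^2;  % cost (1)
end
semilogy(cost); xlabel('iteration'); ylabel('cost');
```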

Acknowledgement

The first author is indebted to Ioana C. Soltuz and Maria Soltuz for their constant support throughout this journey we made together. The authors are also indebted to a referee for carefully reading the paper and for making useful suggestions.

References

[1] R. Albright, J. Cox, D. Duling, A. Langville, C. Meyer, Algorithms, initializations and convergence for the nonnegative matrix factorization, NCSU Technical Report Math 81706, 2006, submitted for publication, URL http://meyer.math.ncsu.edu/Meyer/Abstracts/Publications.html.
[2] R. Balan, J. Rosca, A spectral power factorization, Siemens Corporate Research. Princeton, NJ, Tech. Rep. SCR-01-TR-703, Sep 2001.
[3] R. Balan, A. Jourjine, J. Rosca, AR processes and sources can be reconstructed from degenerate mixtures, in: Proc. International Conference on Independent Component Analysis and Blind Signal Separation (ICA), Jan. 1999, pp. 467-472.
[4] P.J. Brockwell, R. Davis, Time Series: Theory and Methods, Springer, 1991.
[5] P.J. Brockwell, R. Davis, Introduction to Time Series and Forecasting, Springer, 1996.
[6] A. Ben Hamza, D.J. Brady, Reconstruction of reflectance spectra using robust NMF, IEEE Trans. Signal Process. 54 (2006) 9.
[7] M. Berry, M. Browne, A. Langville, P. Pauca, R.J. Plemmons, Algorithms and applications for approximate nonnegative matrix factorization, Computational Statistics and Data Analysis, 2006.
[8] D. Donoho, V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts?, in: Advances in Neural Information Processing Systems (NIPS), vol. 17.
[9] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, Adv. Neural Inf. Process. Syst. 13 (2) (2005) 556-562.
[10] Daniel D. Lee, H. Sebastian Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788-791.
[11] C.J. Lin, On the convergence of multiplicative update algorithms for nonnegative matrix factorization, IEEE Trans. Neural Networks 18 (6) (2007) 1589-1596.
[12] C.J. Lin, Projected gradient methods for non-negative matrix factorization, Neural Comput. (2007). to be published.
[13] E.F. Gonzalez, Y. Zhang, Accelerating the Lee-Seung algorithm for non-negative matrix factorization, Dept. Comput. Appl. Math., Rice Univ., Houston, TX, Tech. Rep., 2005.
[14] G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, 1986.
[15] M.N. Schmidt, Single-channel source separation using non-negative matrix factorization, Ph. D. Thesis, Technical University of Denmark, 2008.
[16] Ştefan M. Şoltuz, Sequences supplied by inequalities and applications, Revue d'Analyse Numérique et de Théorie de l'Approximation 29 (2) (2001) 207-212.
[17] Ştefan M. Şoltuz, W. Wang, P. Jackson, A hybrid iterative algorithm for nonnegative matrix factorization, in: Proc. 15th IEEE/SP Workshop on Statistical Signal Processing (SSP '09), Cardiff, 2009.
[18] W. Wang, X. Zou, Nonnegative matrix factorization based on projected nonlinear conjugate gradient algorithm, ICARN 2008.
[19] R. Zdunek, A. Cichocki, Nonnegative matrix factorization with quadratic programming, Neurocomputing 71 (2007) 2309-2320.
