Abstract
We show that, under appropriate conditions, one can create a hybrid of two given iterations which can perform better than either of the original ones. This fact provides freedom of choice. We also give numerical examples in which we compare our hybrid with the dedicated Lee-Seung iteration.
Authors
S.M. Soltuz
(Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy)
B.E. Rhoades
Keywords
Non-negative matrix factorization; Lee–Seung iteration.
Paper coordinates
Ş.M. Şoltuz, B.E. Rhoades, A mixed iteration for nonnegative matrix factorizations, Appl. Math. Comput., 219 (2013) no. 18, 9847-9855.
doi: 10.1016/j.amc.2013.03.124
About this paper
Print ISSN
0096-3003
Paper (preprint) in HTML form
A mixed iteration for nonnegative matrix factorizations
Abstract
We show that, under appropriate conditions, one can create a hybrid of two given iterations which can perform better than either of the original ones. This fact provides freedom of choice. We also give numerical examples in which we compare our hybrid with the dedicated Lee-Seung iteration.
ARTICLE INFO
Keywords:
Non-negative matrix factorization
Lee-Seung iteration
1. Introduction
1.1. Source separation methods. The non-negative matrix factorization (NMF)
Single-channel source separation problems arise when a number of sources emit signals that are mixed and recorded by a single sensor, and one is interested in estimating the original source signals from the recorded mixture. This problem is ill-posed. Several different single-channel source separation methods have been used. One method is the autoregressive (AR) model, which captures the temporal correlations of the sources, as was shown in [2,3]. In these papers it was proved that, for a single-channel mixture of stationary (AR) sources, the (AR) coefficients can be uniquely identified and the sources separated. For non-stationary (AR) sources, an adaptive sliding window was introduced to update the process. A complete study of (AR) models can be found in [4,5].
Let $\mathbb{R}_+^{n\times m}$ denote the collection of $n\times m$ matrices with nonnegative entries. For a given matrix $V\in\mathbb{R}_+^{n\times m}$, the Nonnegative Matrix Factorization procedure (NMF) is used to find matrices $W\in\mathbb{R}_+^{n\times r}$ and $H\in\mathbb{R}_+^{r\times m}$ such that $V\approx WH$. Matrix factorization has many applications; for example, it is used in source separation and in dimensionality reduction or clustering. In using (NMF) it is necessary to compute
$$\min_{W\ge 0,\ H\ge 0}\ \|V-WH\|^{2}. \qquad (1)$$
An excellent survey on (NMF) and matrix factorization can be found in [15].
Other factorization methods used are Vector Quantization (VQ), Principal Component Analysis (PCA) and Independent Component Analysis (ICA). These can also be written in the form $V\approx WH$. The differences between these methods and (NMF) are due to the different constraints placed on the factoring matrices. In the (VQ) method the columns of $H$ are constrained to be unary vectors (i.e., all components are zero except for one element equal to 1). In the (PCA) procedure the columns of $W$ and the rows of $H$ must be orthogonal. In the (ICA) procedure the rows of $H$ are maximally statistically independent.
A major problem with the (PCA) procedure is that it allows the basis vectors to have both positive and negative components, and the data are represented by linear combinations of these vectors. In some applications, the presence of negative components contradicts the physical reality. For example, the pixels in a gray-scale image must have nonnegative entries, so any image with negative intensities would have no reasonable interpretation; also, the STFT magnitude is given by nonnegative quantities. The (NMF) procedure was developed in an attempt to address this problem (see [10]).
Two iterative (NMF) methods are in use: Alternating Least Squares (ALS) and that of Lee-Seung (LS); the reader may consult [8-13].
1.2. Multiplicative rule. The pointwise product
Consider two functions $F$ and $G$.
Definition 1. [9] One says that $G$ is an auxiliary function for $F$ if the conditions
$$G(h,h')\ge F(h),\qquad G(h,h)=F(h) \qquad (2)$$
are satisfied.
In [9] the following quantity is considered as the cost function:
$$F(h)=\frac12\sum_{i=1}^{n}\bigl(v_i-(Wh)_i\bigr)^{2}, \qquad (3)$$
where $v$ is the $j$th ($n$-dimensional) column from $V$, $h$ is the $j$th ($r$-dimensional) column from $H$, and $W\in\mathbb{R}_+^{n\times r}$.
Lemma 2. [9] If $G$ is an auxiliary function for $F$, then $F$ is nonincreasing under the update
$$h^{t+1}=\arg\min_{h} G(h,h^{t}).$$
The Taylor expansion for $F$ in (3) leads to
$$F(h)=F(h^{t})+(h-h^{t})^{T}\nabla F(h^{t})+\tfrac12\,(h-h^{t})^{T}\bigl(W^{T}W\bigr)(h-h^{t}).$$
Leaving both $h$ and $h'$ as variables, one obtains as an auxiliary function for $F$,
$$G(h,h')=F(h')+(h-h')^{T}\nabla F(h')+\tfrac12\,(h-h')^{T}K(h')\,(h-h'), \qquad (4)$$
where $K(h')$ is the diagonal matrix defined in (5) below.
The quantity $W^{T}W$ is a positive semidefinite matrix. Moreover, as noted in [9], the difference matrix between
$$K(h')=\operatorname{diag}\bigl((W^{T}Wh')\ ./\ h'\bigr) \qquad (5)$$
and $W^{T}W$ remains positive semidefinite. (In Matlab, "./" denotes the pointwise division between matrices, and "diag" maps between the diagonal of a matrix, seen as a vector, and the corresponding diagonal matrix; note that one needs it twice, as in diag(diag(·)), in order to keep the dimensionality right.) Hence, a quantity similar to the above Taylor expansion (4),
$$G(h,h^{t})=F(h^{t})+(h-h^{t})^{T}\nabla F(h^{t})+\tfrac12\,(h-h^{t})^{T}K(h^{t})\,(h-h^{t}), \qquad (6)$$
satisfies the conditions of an auxiliary function for the above $F$; see [9]. In order to update $h$ at each step, the $\arg\min$ of Lemma 2 is involved. It is known that, under appropriate conditions, the quadratic form
$$Q(h)=\tfrac12\,h^{T}Ah-b^{T}h \qquad (7)$$
(with $A$ a positive definite matrix) attains its minimum at $h=A^{-1}b$.
Thus, by setting (6) into (7), we obtain $h^{t+1}=h^{t}-K(h^{t})^{-1}\nabla F(h^{t})$, and therefore the new $h$ is
$$h_a^{t+1}=h_a^{t}\,\frac{(W^{T}v)_a}{(W^{T}Wh^{t})_a},\qquad a=1,\dots,r. \qquad (8)$$
This leads to the Lee-Seung iterative (multiplicative) method.
1.3. The Lee-Seung multiplicative rule (LS)
More specifically, for (8), the matrix $K(h^{t})$ is given by
$$K_{ab}(h^{t})=\delta_{ab}\,\frac{(W^{T}Wh^{t})_a}{h_a^{t}}.$$
Basically, the rule is to reduce the "distance" (or the cost function) $\|V-W_kH_k\|$ by choosing, at each step, appropriate $P_k$, respectively $Q_k$:
$$H_{k+1}=H_k\,.*\,P_k,\qquad W_{k+1}=W_k\,.*\,Q_k. \qquad (9)$$
The heart of each iterative method consists in the choice of such $P_k$ and $Q_k$. Specifically, at step $k$, in [9], each matrix is given pointwise by
$$(P_k)_{ij}=\frac{(W_k^{T}V)_{ij}}{(W_k^{T}W_kH_k)_{ij}},\qquad (Q_k)_{ij}=\frac{(VH_{k+1}^{T})_{ij}}{(W_kH_{k+1}H_{k+1}^{T})_{ij}}, \qquad (10)$$
where $(i,j)$ gives the location (row and column) within the matrix. In (9), set (10); this leads to the Hadamard, or pointwise, multiplication, denoted by ".*". Eventually, the following iteration method is obtained:
$$H_{k+1}=H_k\,.*\,(W_k^{T}V)\,./\,(W_k^{T}W_kH_k),\qquad W_{k+1}=W_k\,.*\,(VH_{k+1}^{T})\,./\,(W_kH_{k+1}H_{k+1}^{T}). \qquad (11)$$
It was reported in [11,13] that the convergence result from [9] does not actually provide sufficient conditions for the convergence of (11). A new, more stable iteration based on (11) was presented there, and its convergence was studied.
Remark 3. "Lin's modification" for (11), from [11], consists in adding a "small" positive quantity $\varepsilon>0$ to the denominators, such that the Lee-Seung iteration becomes:
$$H_{k+1}=H_k\,.*\,(W_k^{T}V)\,./\,(W_k^{T}W_kH_k+\varepsilon),\qquad W_{k+1}=W_k\,.*\,(VH_{k+1}^{T})\,./\,(W_kH_{k+1}H_{k+1}^{T}+\varepsilon). \qquad (12)$$
For this new method, $P_k$ and $Q_k$ were set to be:
$$(P_k)_{ij}=\frac{(W_k^{T}V)_{ij}}{(W_k^{T}W_kH_k)_{ij}+\varepsilon},\qquad (Q_k)_{ij}=\frac{(VH_{k+1}^{T})_{ij}}{(W_kH_{k+1}H_{k+1}^{T})_{ij}+\varepsilon}.$$
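For illustration, a minimal Matlab sketch of the multiplicative updates (11) and (12); the data sizes, the iteration count, and the value of $\varepsilon$ are illustrative choices, not taken from the paper.

```matlab
% Lee-Seung multiplicative updates with Lin's epsilon safeguard, cf. (11)-(12).
n = 100; m = 80; r = 7;            % illustrative sizes
V = abs(randn(n, m));              % nonnegative data matrix
W = rand(n, r); H = rand(r, m);    % random nonnegative initialization
epsil = 1e-9;                      % Lin's small positive quantity
for k = 1:500
    H = H .* (W'*V) ./ (W'*W*H + epsil);    % H-update, cf. (12)
    W = W .* (V*H') ./ (W*(H*H') + epsil);  % then the W-update
end
residual = max(max(abs(V - W*H)));  % sup-norm residual, cf. Section 2
```

Every iterate stays nonnegative, since each update only multiplies nonnegative entries by nonnegative ratios; this is the source of the stability of (LS).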
1.4. The Alternating Least Squares method (ALS)
In (9) set
$$(P_k)_{ij}=\frac{\bigl((W_k^{T}W_k)^{-1}W_k^{T}V\bigr)_{ij}}{(H_k)_{ij}},$$
and, alternatively,
$$(Q_k)_{ij}=\frac{\bigl(VH_{k+1}^{T}(H_{k+1}H_{k+1}^{T})^{-1}\bigr)_{ij}}{(W_k)_{ij}},$$
to obtain
$$H_{k+1}=(W_k^{T}W_k)^{-1}W_k^{T}V,\qquad W_{k+1}=VH_{k+1}^{T}(H_{k+1}H_{k+1}^{T})^{-1}. \qquad (13)$$
The problem with such an iteration is that one or both of $H_{k+1}$ and $W_{k+1}$ can have negative entries. In order to solve this problem one can consider the projection onto the nonnegative orthant, denoted by $P_+[\cdot]$. The above Alternating Least Squares iteration becomes
$$H_{k+1}=P_+\bigl[(W_k^{T}W_k)^{-1}W_k^{T}V\bigr],\qquad W_{k+1}=P_+\bigl[VH_{k+1}^{T}(H_{k+1}H_{k+1}^{T})^{-1}\bigr]. \qquad (14)$$
The convergence of such an iteration is described in [7]. We remark that a quantity such as $(W^{T}W)^{-1}W^{T}$ is similar to the projection operator obtained for the least squares method (see, for example, [14]). This method was reported to be very fast but unstable (see also our experiments and [17]). In [17], as well as here, the aim is to obtain a new iteration which has the convergence speed of (ALS) and the stability of (LS).
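A matching Matlab sketch of the projected (ALS) step (14), reusing $V$, $W$, $H$ from the previous fragment; here max(·, 0) plays the role of the projection $P_+$ (an illustration, not the authors' code).

```matlab
% Projected alternating least squares, cf. (14).
for k = 1:500
    H = max((W'*W) \ (W'*V), 0);     % solve the normal equations for H, then project
    W = max(((H*H') \ (H*V'))', 0);  % same for W, via the transposed system
end
```

Each half-step is an exact least-squares solve, which explains the reported speed; the projection, however, can throw the iterate far from the unconstrained minimizer, which explains the instability.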
Typical convergence and divergence behaviors are indicated in Fig. 1. Here $V$ is the spectrogram of the audio signal A3_whistle.wav, with the frame length of the FFT set to 512, which determines the dimension of $V$. Both $W$ and $H$ were randomly initialized. The rank was set to $r=7$, and overlapping windows were used for generating the spectrogram. All three algorithms were applied to decompose the music notes from the audio signal.
1.5. The hybrid method
In (9) insert
$$(P_k)_{ij}=\frac{\bigl(H_k+\Lambda_k(W_k^{T}V-W_k^{T}W_kH_k)\bigr)_{ij}}{(H_k)_{ij}},$$
to obtain the iteration
$$H_{k+1}=H_k+\Lambda_k\bigl(W_k^{T}V-W_k^{T}W_kH_k\bigr). \qquad (15)$$
A "general" iteration method for such " " as in (9) would have the following structure
| (16) | |||
or,
| (17) | |||
where $\tilde H_k$, $\tilde W_k$ are surrogates for $H_k$, $W_k$ which may be different at each step. We introduce $\tilde H_k$ to be a nonnegative matrix "close enough" to $H_k$, so as to avoid the errors introduced by using a projection onto the positive orthant. Note that $\tilde H_k$ is different from $H_k$ at each step; but, if $\tilde H_k$ is close enough to $H_k$, then (17) behaves analogously to (16). We have introduced within (16) and (17) two "general" iterations. The study will cover, in this manner, all possible choices for the gain matrices $\Lambda_k$ and $\Gamma_k$ within (16) and (17). Appropriate settings for $\Lambda_k$ and $\Gamma_k$ in (16) will lead to the (LS) iteration; eventually we will compare it with (ALS), obtained when $\Lambda_k$ and $\Gamma_k$ are suitably chosen within (17).
Remark 4. Set $\Lambda_k=\operatorname{diag}\bigl(h_k\ ./\ (W_k^{T}W_kh_k)\bigr)$ (columnwise) and the corresponding $\Gamma_k$ to obtain the Lee-Seung iteration (11); as we can see, this gain matrix is changing at each step, even within (15). Set $\Lambda_k=(W_k^{T}W_k)^{-1}$ and $\Gamma_k=(H_{k+1}H_{k+1}^{T})^{-1}$ to obtain the (ALS) iteration.
Our main purposes are the following: first, to be able to mix the (LS) and (ALS) iterations in order to obtain a better one; second, to show that "structurally" the two algorithms are not very different, and to provide the mathematical background for such a hybrid to converge.
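The gain-matrix reading of (15)-(17) used above can be made concrete in Matlab; the function name general_step, the mode switch, and the Jacobi-like choice inv(diag(diag(A))) are illustrative assumptions of this sketch, with the (LS) and (ALS) branches matching Remark 4.

```matlab
% One H-update of the general form H <- H + Lambda*(W'*V - W'*W*H),
% applied column by column; 'mode' selects the gain matrix Lambda.
function H = general_step(V, W, H, mode)
    A = W'*W; B = W'*V;
    for j = 1:size(H, 2)
        h = H(:, j); g = B(:, j) - A*h;    % residual of the normal equations
        switch mode
            case 'ls'                      % Lee-Seung gain: diag(h ./ (A*h))
                h = h + (h ./ (A*h)) .* g; % equals h .* B(:,j) ./ (A*h)
            case 'als'                     % ALS gain: inv(A), an exact solve
                h = max(h + A \ g, 0);     % projected onto the orthant
            case 'hybrid'                  % Jacobi-like gain: inv(diag(diag(A)))
                h = max(h + g ./ diag(A), 0);
        end
        H(:, j) = h;
    end
end
```

Choosing the gain "in between" the two extremes is exactly the freedom of choice the hybrid exploits.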
2. Main results
2.1. Convergence of the hybrid method
Recall the following lemma.
Lemma 5. [16] Let $(\lambda_k)_{k\ge0}$ be a nonnegative sequence that satisfies
$$\lambda_{k+1}\le\delta\,\lambda_k+b_k,\qquad k\ge0,$$
where $\delta\in(0,1)$ is a fixed number and $(b_k)_{k\ge0}$ is a nonnegative sequence which converges to zero. Then $\lim_{k\to\infty}\lambda_k=0$.
The result remains true provided that the coefficient of $\lambda_k$ stays within an interval contained in $(0,1)$.
Proposition 6. Let $(\lambda_k)_{k\ge0}$ be a nonnegative sequence that satisfies
$$\lambda_{k+1}\le\delta_k\,\lambda_k+b_k,\qquad k\ge0,$$
where $\delta\in(0,1)$ is a fixed number and $(\delta_k)_{k\ge0}$, $(b_k)_{k\ge0}$ are nonnegative sequences such that $\delta_k\le\delta$ for all $k\ge k_0$, for some $k_0\in\mathbb{N}$, and $\lim_{k\to\infty}b_k=0$. Then $\lim_{k\to\infty}\lambda_k=0$.
Proof. Note that $\lambda_{k+1}\le\delta\,\lambda_k+b_k$ for each $k\ge k_0$. By defining $\mu_k=\lambda_{k_0+k}$, the result follows from Lemma 5.
Remark 7. If $\delta_k=1$ for each $k$, then Proposition 6 fails. As an example, choose $\lambda_k=1$ and $b_k=1/(k+1)$ for each $k$; then the hypothesis $\lambda_{k+1}\le\delta_k\lambda_k+b_k$ holds, and it follows that $\lambda_k\to1\ne0$. Proposition 6 remains true if $\delta_k>\delta$ holds for only a finite subset of the indices $k$.
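A quick numerical illustration of Proposition 6 and Remark 7; the particular sequences below are illustrative choices.

```matlab
% lambda_{k+1} <= delta_k*lambda_k + b_k with two choices of delta_k.
K = 2000; lam = ones(2, 1);
for k = 1:K
    lam(1) = 0.9 * lam(1) + 1/k;   % delta_k = 0.9 < 1, b_k -> 0: lambda -> 0
    lam(2) = 1.0 * lam(2) + 0;     % delta_k = 1 (Remark 7): lambda stays at 1
end
disp(lam')   % first entry is close to 0 (it decays like 10/k), second is 1
```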
For the sake of simplicity, throughout this paper we shall consider the sup-norm for all matrices involved; within Matlab one uses the max(max(...)) command.
Theorem 8. If iteration (17) converges, i.e., $\tilde H_k\to H^{*}$ and $\tilde W_k\to W^{*}$, and there exist $\delta\in(0,1)$ and $L>0$ such that for each step $k$ we have the following relations satisfied
$$\|I-\Lambda_kW_k^{T}W_k\|\le\delta<1,\qquad \|\Lambda_k\|\le L, \qquad (18)$$
then iteration (16) is also convergent, i.e., $H_k\to H^{*}$ and $W_k\to W^{*}$.
Proof. Define $\Delta_k=\|H_k-\tilde H_k\|$. Note that,
$$H_{k+1}-\tilde H_{k+1}=\bigl(I-\Lambda_kW_k^{T}W_k\bigr)\bigl(H_k-\tilde H_k\bigr)+E_k, \qquad (19)$$
where $E_k$ collects the terms supplied by the surrogate iterates $\tilde H_k$, $\tilde W_k$ obtained from (17). Using (17) and (19) one obtains
$$\|H_{k+1}-\tilde H_{k+1}\|\le\|I-\Lambda_kW_k^{T}W_k\|\,\|H_k-\tilde H_k\|+\|E_k\|.$$
Thus,
$$\Delta_{k+1}\le\delta\,\Delta_k+\|E_k\|.$$
We shall consider here the sup-norm, such that $\|AB\|\le\|A\|\,\|B\|$. Denote $b_k=\|E_k\|$. Note that one has $b_k\to0$ (by the convergence of (17) and the bound $\|\Lambda_k\|\le L$) and $\delta<1$; therefore, from Proposition 6, we obtain $\lim_{k\to\infty}\Delta_k=0$, that is, $\|H_k-\tilde H_k\|\to0$. Using the inequality
$$\|H_k-H^{*}\|\le\|H_k-\tilde H_k\|+\|\tilde H_k-H^{*}\|,$$
it follows that $H_k\to H^{*}$. $\square$
Remark 9. Using Matlab, it is easy to verify at each step the first condition of (18), by using max(max(eye(r,r) - inv(diag(diag(W'*W)))*(W'*W))), where size(W) = (n, r) and size(H) = (r, m). The second condition of (18) simply demands an upper bound for the matrices involved in the process.
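Expanded into a runnable fragment at the current iterate $W_k$ (a sketch; the gain matrix follows the diag(diag(·)) construction of the remark):

```matlab
% Checking the first condition of (18) at the current step.
A = W'*W;                            % r-by-r Gram matrix
Lambda = inv(diag(diag(A)));         % Jacobi-like gain, cf. Remark 9
delta_k = max(max(abs(eye(r) - Lambda*A)));   % sup-norm of I - Lambda*A
if delta_k < 1
    fprintf('condition (18) holds at this step: delta_k = %g\n', delta_k);
end
```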
2.2. Further results
In (17), set $\Lambda_k=(W_k^{T}W_k)^{-1}$ (respectively, $\Gamma_k=(H_{k+1}H_{k+1}^{T})^{-1}$) to obtain the (ALS) method. The above theorem leads to the following result.
Corollary 10. If the iteration (15) converges, i.e., $\tilde H_k\to H^{*}$ and $\tilde W_k\to W^{*}$, and there exist a $\delta\in(0,1)$ and an $L>0$ such that for each step $k$ the relations (18) are satisfied, then the iteration (16) is also convergent, i.e., $H_k\to H^{*}$ and $W_k\to W^{*}$.
Remark 11.
(a) In other words, this corollary claims that a hybrid is allowed, provided that appropriate assumptions are satisfied.
(b) Note that, by considering the diagonal matrix
$$\Lambda_k=\operatorname{diag}\bigl(h_k\ ./\ (W_k^{T}W_kh_k)\bigr)$$
(columnwise), (17) becomes the Lee-Seung iteration (with the Hadamard product).
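The identity in (b) is easy to check numerically; the sketch below (written in the additive form used above) shows the two expressions agreeing to machine precision for one column $h$.

```matlab
% Remark 11(b): the additive step with gain diag(h./(A*h)) coincides
% with the multiplicative Lee-Seung step.
r = 5; W = rand(20, r); h = rand(r, 1); v = rand(20, 1);
A = W'*W; b = W'*v;
h_add = h + diag(h ./ (A*h)) * (b - A*h);  % additive form, cf. (16)-(17)
h_mul = h .* b ./ (A*h);                   % multiplicative form, cf. (11)
disp(max(abs(h_add - h_mul)))              % of the order of 1e-16
```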
Proposition 12. In Theorem 8 one can replace the first condition of (18),
$$\|I-\Lambda_kW_k^{T}W_k\|\le\delta<1\quad\text{for each }k,$$
with
$$\|I-\Lambda_kW_k^{T}W_k\|\le\delta_k,\qquad \delta_k\le\delta<1\ \text{for all but finitely many }k.$$
Proof. Note that the proof of Theorem 8 rests on Proposition 6, which tolerates finitely many exceptional steps (see Remark 7), to obtain the conclusion.
By duality, i.e., interchanging the roles of $W$ and $H$, one can consider the case in which the conditions are imposed on $\Gamma_k$, to obtain:
Corollary 13. If the iteration (17) converges (i.e., $\tilde H_k\to H^{*}$ and $\tilde W_k\to W^{*}$), and there exist a $\delta\in(0,1)$ and an $L>0$ such that for each step $k$ we have the relations
$$\|I-(H_{k+1}H_{k+1}^{T})\Gamma_k\|\le\delta<1,\qquad \|\Gamma_k\|\le L$$
satisfied, then the iteration (16) is also convergent, i.e., $W_k\to W^{*}$.
As in [9], the next step is to consider the second part of (9), with $Q_k$, i.e.,
$$W_{k+1}=W_k\,.*\,Q_k,$$
and the following quantity from (16),
$$W_{k+1}=W_k+\bigl(VH_{k+1}^{T}-W_kH_{k+1}H_{k+1}^{T}\bigr)\,\Gamma_k,$$
to obtain a result similar to Corollary 10.
Corollary 14. If the iteration (15) converges (i.e., $\tilde H_k\to H^{*}$ and $\tilde W_k\to W^{*}$) and there exist a $\delta\in(0,1)$ and an $L>0$ such that for each step $k$ the analogues of the relations (18), written for $\Gamma_k$ and $H_{k+1}H_{k+1}^{T}$, are satisfied, then the iteration (16) is also convergent, i.e., $W_k\to W^{*}$.
A result similar to Proposition 12 also holds; for practitioners, the changed condition may be more useful.
Remark 15. Analogously, one can replace at each step the condition
$$\|I-(H_{k+1}H_{k+1}^{T})\Gamma_k\|\le\delta<1\quad\text{for each }k$$
by
$$\|I-(H_{k+1}H_{k+1}^{T})\Gamma_k\|\le\delta_k,\qquad \delta_k\le\delta<1\ \text{for all but finitely many }k.$$
3. Numerical examples. Convergence and divergence behaviors
In the first experiment we let $V$ denote the spectrogram of the audio signal C6_frenchhorn.wav, where the frame length of the FFT was set at 512, which determines the dimensions of $V$, $W$, and $H$. Both $W$ and $H$ were randomly initialized. The rank was set to $r=7$, with overlap between the windows for generating the spectrogram. All three algorithms were applied to decompose the music notes from the audio signal. The convergence curves were averaged over 20 independent tests. The results are shown in Fig. 2.
In our second experiment, we generated $V$ synthetically, as the absolute value of a zero-mean Gaussian distributed random variable, and initialized $W$ and $H$ in the same way; the dimensions of these matrices were kept fixed across tests. The Matlab program performed 20 independent random tests in which both $W$ and $H$ were kept the same for all three algorithms, and the evolution of the cost function was averaged over the 20 tests. Fig. 3 shows the behaviors of the proposed algorithm, as well as of the (LS) and (ALS) algorithms. When the rank $r$ is increased to 13 or higher, the (ALS) algorithm becomes unstable, while the proposed algorithm still converges, even though its rate of convergence becomes slower than that of the (LS) algorithm.
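A sketch of the synthetic experiment, built on the general_step function introduced in Section 1.5; the dimensions and iteration count are illustrative, since the original values are elided in this version.

```matlab
% Synthetic comparison of (LS), (ALS), and the hybrid step on V = |N(0,1)|.
n = 100; m = 80; r = 7; K = 300;               % illustrative sizes
V = abs(randn(n, m));
W0 = abs(randn(n, r)); H0 = abs(randn(r, m));  % shared initialization
for mode = {'ls', 'als', 'hybrid'}
    W = W0; H = H0;
    for k = 1:K
        H = general_step(V,  W,  H,  mode{1});   % H-update
        W = general_step(V', H', W', mode{1})';  % W-update, by duality
    end
    fprintf('%6s: residual = %g\n', mode{1}, max(max(abs(V - W*H))));
end
```

Raising $r$ in this sketch is where the instability of the (ALS) branch described above can be observed, while the other two branches keep decreasing the residual.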
Acknowledgement
The first author is indebted to Ioana C. Soltuz and Maria Soltuz for their constant support throughout this journey made together. The authors are also indebted to a referee for carefully reading the paper and for making useful suggestions.
References
[1] R. Albright, J. Cox, D. Duling, A. Langville, C. Meyer, Algorithms, initializations and convergence for the nonnegative matrix factorization, NCSU Technical Report Math 81706, 2006, submitted for publication, URL http://meyer.math.ncsu.edu/Meyer/Abstracts/Publications.html.
[2] R. Balan, J. Rosca, A spectral power factorization, Siemens Corporate Research, Princeton, NJ, Tech. Rep. SCR-01-TR-703, September 2001.
[3] R. Balan, A. Jourjine, J. Rosca, AR processes and sources can be reconstructed from degenerate mixtures, in: Proceedings of the First International Conference on Independent Component Analysis and Blind Signal Separation (ICA '99), January 1999, pp. 467-472.
[4] P.J. Brockwell, R. Davis, Time Series: Theory and Methods, Springer, 1991.
[5] P.J. Brockwell, R. Davis, Introduction to Time Series and Forecasting, Springer, 1996.
[6] A. Ben Hamza, D.J. Brady, Reconstruction of reflectance spectra using robust nonnegative matrix factorization, IEEE Trans. Signal Process. 54 (9) (2006).
[7] M. Berry, M. Browne, A. Langville, P. Pauca, R.J. Plemmons, Algorithms and applications for approximate nonnegative matrix factorization, Computational Statistics and Data Analysis 52 (1) (2007) 155-173.
[8] D. Donoho, V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts?, in: Advances in Neural Information Processing Systems (NIPS), vol. 17, 2004.
[9] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in: Advances in Neural Information Processing Systems, vol. 13, 2001, pp. 556-562.
[10] Daniel D. Lee, H. Sebastian Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (6755) (1999) 788-791.
[11] C.J. Lin, On the convergence of multiplicative update algorithms for nonnegative matrix factorization, IEEE Trans. Neural Networks 18 (6) (2007) 1589-1596.
[12] C.J. Lin, Projected gradient methods for non-negative matrix factorization, Neural Comput. 19 (10) (2007) 2756-2779.
[13] E.F. Gonzalez, Y. Zhang, Accelerating the Lee-Seung algorithm for non-negative matrix factorization, Tech. Rep., Dept. of Computational and Applied Mathematics, Rice University, Houston, TX, 2005.
[14] G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, 1986.
[15] M.N. Schmidt, Single-channel source separation using non-negative matrix factorization, Ph. D. Thesis, Technical University of Denmark, 2008.
[16] Ştefan M. Şoltuz, Sequences supplied by inequalities and applications, Revue d'Analyse Numérique et de Théorie de l'Approximation 29 (2) (2001) 207-212.
[17] Ştefan M. Şoltuz, W. Wang, P. Jackson, A hybrid iterative algorithm for nonnegative matrix factorization, in: Proceedings of the 15th IEEE/SP Workshop on Statistical Signal Processing (SSP '09), Cardiff, 2009.
[18] W. Wang, X. Zou, Nonnegative matrix factorization based on projected nonlinear conjugate gradient algorithm, in: Proceedings of the ICA Research Network International Workshop (ICARN 2008), 2008.
[19] R. Zdunek, A. Cichocki, Nonnegative matrix factorization with quadratic programming, Neurocomputing 71 (10-12) (2008) 2309-2320.
