# Low-rank matrix approximations over canonical subspaces

September 15, 2019; accepted: December 7, 2019; published online: August 11, 2020.

In this paper we derive closed form expressions for the nearest rank-\(k\) matrix on canonical subspaces. We start by studying three kinds of subspaces. Let \(X\) and \(Y\) be a pair of given matrices. The first subspace contains all the \(m\times n\) matrices \(A\) that satisfy \(AX=O\). The second subspace contains all the \(m \times n\) matrices \(A\) that satisfy \(Y^TA = O\), while the matrices in the third subspace satisfy both \(AX = O\) and \(Y^TA = O\).

The second part of the paper considers a subspace that contains all the symmetric matrices \(S\) that satisfy \(SX =O\). In this case, in addition to the nearest rank-\(k\) matrix we also provide the nearest rank-\(k\) positive approximant on that subspace. A further insight is gained by showing that the related cones of positive semidefinite matrices, and negative semidefinite matrices, constitute a polar decomposition of this subspace.

The paper ends with two examples of applications. The first one regards the problem of computing the nearest rank-\(k\) centered matrix, and adds new insight into the PCA of a matrix.

The second application comes from the field of Euclidean distance matrices. The new results on low-rank positive approximants are used to derive an explicit expression for the nearest source matrix. This opens a direct way for computing the related positions matrix.

**MSC.** 15A03, 15A18, 15A21, 15A42, 15A60, 65F99.

**Keywords.** Canonical subspaces, Low-rank positive approximants, The nearest rank-\(k\) centered matrix, The nearest source matrix.

\(^\ast \)Hydrological Service, P.O.B. 36118, Jerusalem 91360, Israel, e-mail: dax20@water.gov.il.

# 1 Introduction

In this paper we study matrix approximation problems that involve subspaces of matrices. Let \(X \in \mathbb {R}^{n\times p}\) and \(Y\in \mathbb {R}^{m\times q}\) be a pair of given matrices. Then the term “canonical subspace" refers to the following types of matrix subspaces:
\[ \mathbb{X} = \{A\in\mathbb{R}^{m\times n} : AX = O\}, \tag{1.1} \]
\[ \mathbb{Y} = \{A\in\mathbb{R}^{m\times n} : Y^TA = O\}, \tag{1.2} \]
\[ \mathbb{Z} = \{A\in\mathbb{R}^{m\times n} : AX = O \ \hbox{ and }\ Y^TA = O\}, \tag{1.3} \]
and
\[ \mathbb{S} = \{S\in\mathbb{R}^{n\times n} : S = S^T \ \hbox{ and }\ SX = O\}. \tag{1.4} \]
Note that \(S \in \mathbb {S}\) implies \(X^TS = O\). The matrix \(O\) denotes a null matrix with appropriate dimensions.

The plan of the paper is as follows. It starts by deriving explicit solutions to matrix nearness problems of the form
\[ \hbox{minimize} \ \|A - B\| \quad \hbox{subject to} \quad B\in\mathbb{B}, \tag{1.5} \]
where \(A \in \mathbb {R}^{m\times n}\) is a given matrix, \(\mathbb {B}\) denotes one of the subspaces, \(\mathbb {X}, \mathbb {Y}\) or \(\mathbb {Z}\), and \(\| \cdot \| \) denotes a unitarily invariant norm. In the third section we show that the SVD of a matrix \(B \in \mathbb {B}\) has a special structure. This observation paves the way for solving low-rank approximation problems of the form
\[ \hbox{minimize} \ \|A - B\| \quad \hbox{subject to} \quad B\in\mathbb{B} \ \hbox{ and }\ \hbox{rank}(B)\le k, \tag{1.6} \]
where \(k\) denotes the desired matrix rank.

The second part of the paper concentrates on symmetric matrices. It starts by solving 1.5 and 1.6 when \(\mathbb {B}= \mathbb {S}\). Then we move to matrix nearness problems that seek (low-rank) positive semidefinite matrices in \(\mathbb {S}\). It is shown that the sets
\[ \mathbb{S}_+ = \{S\in\mathbb{S} : S \ge 0\} \tag{1.7} \]
and
\[ \mathbb{S}_- = \{S\in\mathbb{S} : S \le 0\} \tag{1.8} \]
are convex cones in \(\mathbb {S}\). The notation \(S \ge 0\) means that the symmetric matrix \(S\) is positive semidefinite. Similarly \(S \le 0\) means that \(S\) is negative semidefinite. With these notations at hand the problems that we solve are
\[ \hbox{minimize} \ \|A - S\|_F \quad \hbox{subject to} \quad S\in\mathbb{S}_+ \tag{1.9} \]
and
\[ \hbox{minimize} \ \|A - S\|_F \quad \hbox{subject to} \quad S\in\mathbb{S}_+ \ \hbox{ and }\ \hbox{rank}(S)\le k, \tag{1.10} \]
where here \(A\) is an arbitrary matrix from \(\mathbb {R}^{n\times n}\). It is shown that these problems have closed form solutions, and if \(A\) belongs to \(\mathbb {S}\) then the derived solution remains valid in any unitarily invariant norm. The role of \(\mathbb {S}_+\) and \(\mathbb {S}_-\) is clarified by showing that this pair of convex cones constitutes a polar decomposition of \(\mathbb {S}\).

The paper ends with two examples of applications. In Section 7 the properties of canonical subspaces are used to compute the nearest rank-\(k\) centered matrix. The results add new insight into the PCA of a matrix.

In Section 8 we solve matrix nearness problems from the field of *Euclidean Distance* (ED) matrices. Given a “predistance" matrix, \(A\), it is desired to compute the nearest “source" matrix and the related “positions" matrix. As we shall see, the results on positive approximants enable us to derive closed form solutions to these problems.

# 2 The nearest matrix on a subspace

We shall start by deriving equivalent presentations of the subspaces \(\mathbb {X}\), \(\mathbb {Y}\), and \(\mathbb {Z}\), which are defined in 1.1–1.3. Using an SVD of \(X\), or a QR factorization, it is possible to compute a matrix with orthonormal columns, \(\hat X\), that has the following property: A vector \({\boldsymbol {v}}\in \mathbb {R}^n\) satisfies \({\boldsymbol {v}}^TX = {\boldsymbol {o}}\) if and only if \({\boldsymbol {v}}^T\hat X = {\boldsymbol {o}}\). The number of columns in \(\hat X\) equals rank\((X)\). Yet, for the sake of simplicity, we may assume that rank\((X) = p\). Replacing \(X\) with \(\hat X\) turns 1.1 into the form
\[ \mathbb{X} = \{A\in\mathbb{R}^{m\times n} : A\hat X = O\}. \tag{2.1} \]
Similar arguments allow us to replace \(Y\) with a matrix \(\hat Y \in \mathbb {R}^{m\times q}\) that has orthonormal columns. This turns 1.2 and 1.3 into the forms
\[ \mathbb{Y} = \{A\in\mathbb{R}^{m\times n} : \hat Y^TA = O\} \tag{2.2} \]
and
\[ \mathbb{Z} = \{A\in\mathbb{R}^{m\times n} : A\hat X = O \ \hbox{ and }\ \hat Y^TA = O\}, \tag{2.3} \]
respectively.

Another equivalent presentation is derived in the following way. Let the \(n\times (n-p)\) matrix \(\tilde X\) be obtained by completing the columns of \(\hat X\) to an orthonormal basis of \(\mathbb {R}^n\). Then the \(n\times n\) matrix \([\tilde X, \hat X]\) satisfies
\[ [\tilde X, \hat X]^T[\tilde X, \hat X] = [\tilde X, \hat X][\tilde X, \hat X]^T = I. \tag{2.4} \]

Observe that the first equality in 2.4 means
\[ \tilde X^T\tilde X = I, \quad \hat X^T\hat X = I, \quad\hbox{and}\quad \tilde X^T\hat X = O, \tag{2.5} \]

while the second equality implies
\[ \tilde X\tilde X^T + \hat X\hat X^T = I \tag{2.6} \]
and

(The dimensions of the unit matrix, \(I\), and the null matrix, \(O\), depend on the context.) Now let \(B\) be some matrix in \(\mathbb {X}\). Then from 2.6 we see that
\[ B = B(\tilde X\tilde X^T + \hat X\hat X^T) = B\tilde X\tilde X^T = R\tilde X^T, \]
where \(R = B\tilde X\in \mathbb {R}^{m \times (n-p)}\). Conversely, given a matrix \(R \in \mathbb {R}^{m \times (n-p)}\), the equality \(\tilde X^T \hat X = O\) implies that the matrix \(R\tilde X^T\) belongs to \(\mathbb {X}\). This enables us to rewrite \(\mathbb {X}\) in the form
\[ \mathbb{X} = \{R\tilde X^T : R\in\mathbb{R}^{m\times(n-p)}\}. \]
That is, \(\mathbb {X}\) is essentially a subspace which contains all the \(m\times n\) matrices whose rows belong to \(\hbox{\rm Range}(\tilde X)\).
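This construction is easy to try out numerically. The NumPy sketch below (the names `X_hat` and `X_tilde`, and the use of QR factorizations to build them, are our choices, not prescribed by the text) forms \(\hat X\) and its orthonormal complement \(\tilde X\) from a random \(X\), and checks that a matrix of the form \(R\tilde X^T\) indeed satisfies \(AX = O\):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
X = rng.standard_normal((n, p))        # a given n x p matrix with rank(X) = p

# X_hat: orthonormal basis of Range(X), via a thin QR factorization
X_hat, _ = np.linalg.qr(X)

# X_tilde: completion of X_hat to an orthonormal basis of R^n;
# the trailing n - p columns of a full QR factorization span the complement
Q, _ = np.linalg.qr(X_hat, mode="complete")
X_tilde = Q[:, p:]

# Any matrix of the form R X_tilde^T lies in the subspace {A : A X = O}
R = rng.standard_normal((m, n - p))
B = R @ X_tilde.T
print(np.abs(B @ X).max())             # ~ 0 up to rounding
```

The same `mode="complete"` trick is reused below whenever an orthonormal complement is needed.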

Similarly, let the \(m \times (m-q)\) matrix \(\tilde Y\) be obtained by completing the columns of \(\hat Y\) to an orthonormal basis of \(\mathbb {R}^m\). Then the subspaces \(\mathbb {Y}\) and \(\mathbb {Z}\) have equivalent presentations in the forms
\[ \mathbb{Y} = \{\tilde YR : R\in\mathbb{R}^{(m-q)\times n}\} \]
and
\[ \mathbb{Z} = \{\tilde YR\tilde X^T : R\in\mathbb{R}^{(m-q)\times(n-p)}\}. \]
Let \(\| \cdot \| \) denote any unitarily invariant norm on \(\mathbb {R}^{m \times n}\). To find the nearest matrix on \(\mathbb {X}\), or \(\mathbb {Y}\), we need the following observation.

Let \(H\in \mathbb {R}^{m \times n}\) be a given matrix, and let the matrix \(H_\ell \in \mathbb {R}^{m \times n}\) be obtained from \(H\) by replacing the last \(n-\ell \) columns of \(H\) with zero columns. Then
\[ \|H_\ell\| \le \|H\|. \]
Let \(A\) be a given matrix in \(\mathbb {R}^{m \times n}\) and consider the problem
\[ \hbox{minimize} \ \|A - B\| \quad \hbox{subject to} \quad B\in\mathbb{X}, \]
where \(\| \cdot \| \) is a unitarily invariant norm on \(\mathbb {R}^{m \times n}\). Then the matrix
\[ \hat B = A\tilde X\tilde X^T \]
solves this problem.

where the last equality is due to lemma 2.1. Consequently the choice \(R = A\tilde X\) gives the minimal value.

The proof of the next theorem is derived in a similar way.

Let \(A\) and \(\| \cdot \| \) be as above and consider the problem
\[ \hbox{minimize} \ \|A - B\| \quad \hbox{subject to} \quad B\in\mathbb{Y}. \]
Then the matrix
\[ \hat B = \tilde Y\tilde Y^TA \]
solves this problem.

The structure of \(\mathbb {Z}\) is slightly more complicated and, therefore, the nearest matrix is computed with respect to one norm, the Frobenius matrix norm \(\| \cdot \| _F\). (The reasons are explained after the proof of theorem 2.4.) For the sake of clarity we mention that for any matrix \(A = (a_{ij}) \in \mathbb {R}^{m \times n}\) the Frobenius norm of \(A\) is defined as
\[ \|A\|_F = \Big(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\Big)^{1/2} \]
and
\[ \|A\|_F^2 = \hbox{trace}(A^TA) = \hbox{trace}(AA^T). \]
Let \(A\) be a given matrix in \(\mathbb {R}^{m \times n}\) and consider the problem
\[ \hbox{minimize} \ \|A - B\|_F \quad \hbox{subject to} \quad B\in\mathbb{Z}. \]
Then the matrix
\[ \hat B = \tilde Y\tilde Y^TA\tilde X\tilde X^T \]
solves this problem.

Then, since the Frobenius norm is unitarily invariant,

Now comparing the matrices

and

shows that the optimal choice of \(R\) is

It is tempting to think that the choice 2.19 is also optimal for any other unitarily invariant norm. However, as the following example shows, setting to zero the north-west corner of a matrix does not necessarily reduce the matrix norm. To see this point we consider the matrices

Then the trace norm of \(G\) is 2 while the trace norm of \(H\) is \(\sqrt{5}\). That is, “punching" \(G\) increases its trace norm. (Recall that the trace norm of a matrix is the sum of its singular values.) □
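The three closed form solutions above can be exercised in a few lines of NumPy. In this sketch (all variable names are ours) the nearest matrices on \(\mathbb{X}\), \(\mathbb{Y}\), and \(\mathbb{Z}\) are formed explicitly, and the residual of the first projection is checked to be orthogonal, in the Frobenius inner product, to an arbitrary member of \(\mathbb{X}\):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, q = 7, 6, 2, 3
A = rng.standard_normal((m, n))

# Orthonormal X_hat, Y_hat and their orthonormal complements X_tilde, Y_tilde
X_hat, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y_hat, _ = np.linalg.qr(rng.standard_normal((m, q)))
X_tilde = np.linalg.qr(X_hat, mode="complete")[0][:, p:]
Y_tilde = np.linalg.qr(Y_hat, mode="complete")[0][:, q:]

B_X = A @ X_tilde @ X_tilde.T                        # nearest matrix in X
B_Y = Y_tilde @ Y_tilde.T @ A                        # nearest matrix in Y
B_Z = Y_tilde @ Y_tilde.T @ A @ X_tilde @ X_tilde.T  # nearest matrix in Z

# The residual A - B_X is orthogonal to every member R X_tilde^T of X
R = rng.standard_normal((m, n - p))
inner = np.trace((A - B_X).T @ (R @ X_tilde.T))
```

The vanishing of `inner` is exactly the orthogonal projection property discussed at the end of this section.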

We shall finish this section by showing that the nearest matrices that we have found are essentially orthogonal projections. Let \(A = (a_{ij})\) and \(B = (b_{ij})\) be two matrices in \(\mathbb {R}^{m\times n}\). Then it is well known that
\[ \langle A, B\rangle = \sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ij} = \hbox{trace}(A^TB) \tag{2.20} \]
is an inner product on \(\mathbb {R}^{m \times n}\), and the related matrix norm is the Frobenius norm
\[ \|A\|_F = (\langle A, A\rangle)^{1/2}. \]
Recall also that a matrix \(A\) is “orthogonal" to \(B\) (and vice versa) if
\[ \langle A, B\rangle = 0. \]
The inner product 2.20 and the Frobenius norm turn \(\mathbb {R}^{m \times n}\) into a Hilbert space, in which the solutions that we have found are called “orthogonal projections". The next lemma clarifies this feature. It is a general property that holds for any subspace of a Hilbert space.

Let \(\mathbb {B}\) denote one of the subspaces \(\mathbb {X}, \mathbb {Y}\), or \(\mathbb {Z}\). Let \(A\) be an arbitrary matrix in \(\mathbb {R}^{m\times n}\). Then there exists a unique matrix \(\hat B \in \mathbb {B}\) that solves the problem
\[ \hbox{minimize} \ \|A - B\|_F \quad \hbox{subject to} \quad B\in\mathbb{B}. \]
Moreover, the matrix \(A - \hat B\) is orthogonal to any matrix \(B \in \mathbb {B}\). That is,
\[ \langle A - \hat B,\, B\rangle = 0 \quad\hbox{for all}\quad B\in\mathbb{B}. \]
It is instructive, however, to see why 2.24 holds on the specific subspaces \(\mathbb {X}, \mathbb {Y}\) and \(\mathbb {Z}\). Let us consider for example the subspace \(\mathbb {X}\). Then here 2.24 is reduced to

or

Recall that a rank-one matrix in \(\mathbb {R}^{m\times n}\) has the form \({\boldsymbol {u}}{\boldsymbol {v}}^T\in \mathbb {R}^{m \times n}\), where \({\boldsymbol {u}}\in \mathbb {R}^m\) and \({\boldsymbol {v}}\in \mathbb {R}^n\). Now it is easy to verify that the inner product of a rank-one matrix satisfies

Note also that

where \({\boldsymbol {r}}_j\) and \(\tilde{\boldsymbol {x}}_j\) denote the \(j\)th columns of \(R\) and \(\tilde X\). Combining these relations gives

where the last equality follows from the orthogonality relation \(\hat X^T \tilde X = O\). The verification of 2.23 for \(\mathbb {Y}\) and \(\mathbb {Z}\) is done in a similar way. The next result is another well known property of orthogonal projections in Hilbert spaces.

Let \(\mathbb {B}\), \(A\), and \(\hat B\) be as in lemma 2.5. Then for any matrix \(H \in \mathbb {B}\) we have the equality
\[ \|A - H\|_F^2 = \|A - \hat B\|_F^2 + \|\hat B - H\|_F^2. \tag{2.26} \]
In the next section we will use this observation to compute the nearest rank-\(k\) matrix on \(\mathbb {B}\).
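The Pythagorean identity 2.26 can be confirmed numerically for the subspace \(\mathbb{X}\). A minimal NumPy sketch, under our naming conventions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 7, 6, 2
A = rng.standard_normal((m, n))
X_hat, _ = np.linalg.qr(rng.standard_normal((n, p)))
X_tilde = np.linalg.qr(X_hat, mode="complete")[0][:, p:]

B_hat = A @ X_tilde @ X_tilde.T                  # orthogonal projection of A onto X
H = rng.standard_normal((m, n - p)) @ X_tilde.T  # an arbitrary member of X

lhs = np.linalg.norm(A - H, "fro") ** 2
rhs = np.linalg.norm(A - B_hat, "fro") ** 2 + np.linalg.norm(B_hat - H, "fro") ** 2
```

The two sides agree up to rounding, which is the mechanism used in the next section to reduce constrained low-rank problems to projections.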

# 3 Low-rank approximations over canonical subspaces

In this section we solve low-rank approximation problems of the form 1.6. Using 2.26 it is possible to rewrite 1.6 in the form
\[ \hbox{minimize} \ \|\hat B - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{B} \ \hbox{ and }\ \hbox{rank}(H)\le k. \tag{3.1} \]
The key for solving this problem lies in the following observations.

Let \(B\) be a given matrix in \(\mathbb {R}^{m \times n}\) and let \(r\) denote the rank of \(B\). Then it is well known that \(B\) has a “compact" SVD of the form
\[ B = U\Sigma V^T, \tag{3.2} \]
where the matrices \(U\) and \(V\) have \(r\) orthonormal columns and
\[ \Sigma = \hbox{\rm diag}\{\sigma_1, \sigma_2, \dots, \sigma_r\} \tag{3.3} \]
is a diagonal \(r\times r\) matrix. The diagonal entries of \(\Sigma \) are the nonzero singular values of \(B\). These entries are assumed to be positive and sorted to satisfy
\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0. \tag{3.4} \]
Assume that \(B \in \mathbb {B}\). In this case the matrices \(U\) and \(V\) satisfy the following conditions:

a) If \(B \in \mathbb {X}\) then \(V^T\hat X = O\).

b) If \(B \in \mathbb {Y}\) then \(\hat Y^T U = O\).

c) If \(B \in \mathbb {Z}\) then \(\hat Y^T U = O\) and \(V^T\hat X = O\).

Let \({\boldsymbol {w}}_j \in \mathbb {R}^r\) denote the \(j\)th column of the matrix \(V^T\hat X\). Then 3.5 implies that
\[ U\Sigma{\boldsymbol w}_j = {\boldsymbol o}, \quad j = 1,\dots,p. \tag{3.6} \]
Therefore, since \(U\Sigma \) is an \(m\times r\) matrix that has full column rank, the linear system 3.6 has a unique solution \({\boldsymbol {w}}_j = {\boldsymbol {o}}\). This proves the first claim. The other claims are proved in similar ways.
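The special SVD structure is easy to observe in practice. The NumPy sketch below (our notation) builds a member of \(\mathbb{Z}\), extracts its compact SVD, and checks that \(\hat Y\) annihilates the left singular vectors while \(\hat X\) annihilates the right ones:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p, q = 7, 6, 2, 2
X_hat, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y_hat, _ = np.linalg.qr(rng.standard_normal((m, q)))
X_tilde = np.linalg.qr(X_hat, mode="complete")[0][:, p:]
Y_tilde = np.linalg.qr(Y_hat, mode="complete")[0][:, q:]

# A generic member of Z
B = Y_tilde @ Y_tilde.T @ rng.standard_normal((m, n)) @ X_tilde @ X_tilde.T

# Compact SVD: keep only the r factors that match nonzero singular values
U, s, Vt = np.linalg.svd(B)
r = int(np.sum(s > 1e-10))
U, V = U[:, :r], Vt[:r].T
```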

An equivalent way to write 3.2 is
\[ B = \sum_{j=1}^r \sigma_j{\boldsymbol u}_j{\boldsymbol v}_j^T, \tag{3.7} \]
where \({\boldsymbol {u}}_j \) and \({\boldsymbol {v}}_j\) denote the \(j\)th columns of \(U\) and \(V\), respectively. Let \(k\) be a positive integer that is smaller than \(r\), and let the matrices \(U_k \in \mathbb {R}^{m\times k}\) and \(V_k\in \mathbb {R}^{n\times k}\) be composed from the first \(k\) columns of \(U\) and \(V\), respectively. Similarly, let \(\Sigma _k = \hbox{\rm diag}\{ \sigma _1, \sigma _2,\dots , \sigma _k\} \) denote the related \(k\times k\) diagonal matrix. Then the matrix
\[ T_k(B) = U_k\Sigma_kV_k^T = \sum_{j=1}^k \sigma_j{\boldsymbol u}_j{\boldsymbol v}_j^T \tag{3.8} \]
is called a rank-\(k\) truncated SVD of \(B\). The importance of this matrix lies in the following observations. As before, \(\| \cdot \| \) denotes a unitarily invariant norm on \(\mathbb {R}^{m \times n}\), \(A\) is some matrix in \(\mathbb {R}^{m\times n}\), and \(T_k(A)\) denotes a rank-\(k\) truncated SVD of \(A\). Then \(T_k(A)\) solves the least norm problem
\[ \hbox{minimize} \ \|A - H\| \quad \hbox{subject to} \quad \hbox{rank}(H)\le k. \tag{3.9} \]
This is the well known Eckart–Young–Mirsky theorem [14, 26]. For a recent discussion of this observation see [12]. The next theorem sharpens this result by forcing \(A\) and \(H\) to stay in \(\mathbb {B}\).

Assume that \(B \in \mathbb {B}\). In this case \(T_k(B) \in \mathbb {B}\). Consequently \(T_k(B)\) solves the least norm problem
\[ \hbox{minimize} \ \|B - H\| \quad \hbox{subject to} \quad H\in\mathbb{B} \ \hbox{ and }\ \hbox{rank}(H)\le k. \tag{3.10} \]

Let \(A, \mathbb {B}\), and \(\hat B\) be as in lemma 2.5. Then the matrix \(T_k(\hat B)\) solves the least norm problem
\[ \hbox{minimize} \ \|A - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{B} \ \hbox{ and }\ \hbox{rank}(H)\le k, \tag{3.11} \]

which leads to the following conclusions:

a) The matrix \(T_k(A\tilde X \tilde X^T)\) solves the problem
\[ \hbox{minimize} \ \|A - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{X} \ \hbox{ and }\ \hbox{rank}(H)\le k. \tag{3.12} \]

b) The matrix \(T_k(\tilde Y \tilde Y^TA)\) solves the problem
\[ \hbox{minimize} \ \|A - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{Y} \ \hbox{ and }\ \hbox{rank}(H)\le k. \tag{3.13} \]

c) The matrix \(T_k(\tilde Y \tilde Y^TA \tilde X \tilde X^T)\) solves the problem
\[ \hbox{minimize} \ \|A - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{Z} \ \hbox{ and }\ \hbox{rank}(H)\le k. \tag{3.14} \]
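The conclusions a)–c) translate directly into code: project, then truncate. A NumPy sketch (the helper name `T_k` is ours) for the most constrained case, the subspace \(\mathbb{Z}\):

```python
import numpy as np

def T_k(A, k):
    """Rank-k truncated SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

rng = np.random.default_rng(4)
m, n, p, q, k = 8, 7, 2, 2, 2
A = rng.standard_normal((m, n))
X_hat, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y_hat, _ = np.linalg.qr(rng.standard_normal((m, q)))
X_tilde = np.linalg.qr(X_hat, mode="complete")[0][:, p:]
Y_tilde = np.linalg.qr(Y_hat, mode="complete")[0][:, q:]

# Nearest rank-k matrix in Z: truncate the projection of A onto Z
B_k = T_k(Y_tilde @ Y_tilde.T @ A @ X_tilde @ X_tilde.T, k)
```

Note that the truncation does not leave the subspace: the rank-\(k\) matrix `B_k` still satisfies the two annihilation constraints.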

# 4 Symmetric matrices over a subspace

In this section we turn to consider matrix approximations over the subspace \(\mathbb {S}\), which is defined in (1.4). As before, there is no loss of generality in replacing \(X\) with an \(n\times p\) matrix, \(\hat X\), that has orthonormal columns and satisfies \(\hbox{\rm Range}(\hat X) = \hbox{\rm Range}(X)\). This convention enables us to present \(\mathbb {S}\) in the form
\[ \mathbb{S} = \{S\in\mathbb{S}^n : S\hat X = O\}, \tag{4.1} \]
where \(\mathbb {S}^n\) denotes the set of all real symmetric matrices of order \(n\). That is,
\[ \mathbb{S}^n = \{S\in\mathbb{R}^{n\times n} : S = S^T\}. \tag{4.2} \]
Let the \(n\times (n-p)\) matrix \(\tilde X\) be obtained from \(\hat X\) as in Section 2, satisfying 2.4–2.7. Then, as we have seen, \(\mathbb {S}\) has an equivalent presentation in the form
\[ \mathbb{S} = \{\tilde XG\tilde X^T : G\in\mathbb{S}^{n-p}\}. \tag{4.3} \]
Moreover, following the proofs of theorem 2.4 and lemma 2.5 we obtain the following results.

Let \(S\) be some matrix in \(\mathbb {S}^n\) and consider the problem
\[ \hbox{minimize} \ \|S - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{S}. \tag{4.4} \]
Then the matrix
\[ \hat S = \tilde X\tilde X^TS\tilde X\tilde X^T \tag{4.5} \]
solves this problem. In other words, \(\hat S\) is the orthogonal projection of \(S\) onto \(\mathbb {S}\).

The matrix \(S - \hat S\) is orthogonal to any matrix \(H\in \mathbb {S}\). That is,
\[ \langle S - \hat S,\, H\rangle = 0 \quad\hbox{for all}\quad H\in\mathbb{S}. \tag{4.6} \]
Consequently the equality
\[ \|S - H\|_F^2 = \|S - \hat S\|_F^2 + \|\hat S - H\|_F^2 \tag{4.7} \]
holds for any matrix \(H \in \mathbb {S}\).

The last equality enables us to compute low-rank approximations over \(\mathbb {S}\). Following the proofs of the corresponding results in Section 3 we obtain analogous results for \(\mathbb {S}\).

Let \(S\) and \(\hat S\) be as above, and let \(T_k(\hat S)\) be a rank-\(k\) truncated SVD of \(\hat S\). Then \(T_k(\hat S) \in \mathbb {S}\) and this matrix solves the least norm problem
\[ \hbox{minimize} \ \|S - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{S} \ \hbox{ and }\ \hbox{rank}(H)\le k. \tag{4.8} \]
The results of this section can be extended by replacing \(S\) with an arbitrary matrix \(A\in \mathbb {R}^{n\times n}\). In this case problem 4.4 takes the form
\[ \hbox{minimize} \ \|A - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{S}, \tag{4.9} \]
where \(A\) is a given matrix in \(\mathbb {R}^{n\times n}\). The solution of the last problem is based on the following well known observations. Recall that \(A\) has a unique Cartesian decomposition of the form \(A = S+T\) where \(S = (A+A^T)/2\) is symmetric and \(T = (A - A^T)/2\) is skew-symmetric. It is also easy to verify that the equality
\[ \|S + T\|_F^2 = \|S\|_F^2 + \|T\|_F^2 \tag{4.10} \]
holds for any sum of a symmetric matrix, \(S\), plus a skew-symmetric matrix, \(T\). Hence for any symmetric matrix, \(H \in \mathbb {S}^n\), we have the equality
\[ \|A - H\|_F^2 = \|S - H\|_F^2 + \|T\|_F^2. \tag{4.11} \]
Therefore a solution for 4.4 provides a solution for 4.9.
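The norm splitting behind this reduction rests on the Frobenius-orthogonality of symmetric and skew-symmetric matrices, which is easy to confirm numerically (NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = rng.standard_normal((n, n))
S = (A + A.T) / 2            # symmetric part of A
T = (A - A.T) / 2            # skew-symmetric part of A

H = rng.standard_normal((n, n))
H = (H + H.T) / 2            # an arbitrary symmetric matrix

# A - H = (S - H) + T, and the two terms are orthogonal, hence Pythagoras
lhs = np.linalg.norm(A - H, "fro") ** 2
rhs = np.linalg.norm(S - H, "fro") ** 2 + np.linalg.norm(T, "fro") ** 2
```

So minimizing over symmetric \(H\) only ever sees the symmetric part of \(A\); the skew part contributes a constant offset.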

# 5 Low-rank positive approximants

In this section we derive explicit solutions for problems 1.9 and 1.10. We start by introducing the tools for solving these problems. Let \(\hat S \in \mathbb {S}^n\) be a given symmetric matrix, and let \(r\) denote its rank. Then \(\hat S\) has a “compact" spectral decomposition of the form
\[ \hat S = VDV^T, \tag{5.1} \]
where \(V \in \mathbb {R}^{n\times r}\) has orthonormal columns, and
\[ D = \hbox{\rm diag}\{\lambda_1, \lambda_2, \dots, \lambda_r\} \tag{5.2} \]
is a diagonal matrix. The diagonal entries of \(D\) are the non-zero eigenvalues of \(\hat S.\) It is assumed for simplicity that these eigenvalues are sorted in decreasing order. That is,
\[ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r. \tag{5.3} \]
In addition to \(r\) we use a non-negative integer, \(\ell \), that counts the number of positive eigenvalues of \(\hat S\). The definition of \(\ell \) implies that \(0 \le \ell \le r\), and if \(1 \le \ell {\lt} r\) then the non-zero eigenvalues satisfy

Moreover, if \(1 {\lt} \ell {\lt} r \) then 5.2 and 5.3 imply

Recall also that 5.1 can be rewritten in the form
\[ \hat S = \sum_{j=1}^r \lambda_j{\boldsymbol v}_j{\boldsymbol v}_j^T, \]
where \({\boldsymbol {v}}_1,\dots , {\boldsymbol {v}}_r\), are the columns of \(V\). These notations enable us to split between the “positive" part of \(\hat S\) and its “negative" part. Let the matrix
\[ P(\hat S) = \sum_{j=1}^{\ell} \lambda_j{\boldsymbol v}_j{\boldsymbol v}_j^T \tag{5.6} \]
denote the positive definite part of \(\hat S\), and let the matrix
\[ N(\hat S) = \sum_{j=\ell+1}^{r} \lambda_j{\boldsymbol v}_j{\boldsymbol v}_j^T \tag{5.7} \]
denote its negative definite part. Then, clearly,
\[ \hat S = P(\hat S) + N(\hat S). \tag{5.8} \]
This decomposition is sometimes called the Jordan decomposition of \(\hat S\). (If \(\ell = 0\) then \(P (\hat S) = O\) and \(N(\hat S) = \hat S\). Similarly, if \(\ell = r \) then \(P(\hat S) = \hat S\) and \(N(\hat S) = O\).) Another useful matrix operator is
\[ P_k(\hat S) = \sum_{j=1}^{\nu} \lambda_j{\boldsymbol v}_j{\boldsymbol v}_j^T, \tag{5.9} \]
where \(k\) is a positive integer and \(\nu = \operatorname {min}\{ k, \ell \} \). (If \(\ell = 0\) then \(P_k (\hat S) = O\).) Note also that
\[ P_k(\hat S) = T_k(P(\hat S)), \tag{5.10} \]
where, as before, \(T_k(\cdot )\) is the rank-\(k\) truncated SVD operator. The importance of \(P(\hat S)\) and \(P_k(\hat S)\) emerges from the following observations. As before, \(\| \cdot \| \) denotes a unitarily invariant norm on \(\mathbb {R}^{n\times n}\), and \(H\ge 0 \) means that \(H\) is a symmetric positive semidefinite matrix.

Let \(\hat S\) be some matrix in \(\mathbb {S}^n\). Then \(P(\hat S)\) solves the problem
\[ \hbox{minimize} \ \|\hat S - H\| \quad \hbox{subject to} \quad H \ge 0. \tag{5.11} \]

Let \(\hat S\) be some matrix in \(\mathbb {S}^n\). Then \(P_k(\hat S)\) solves the problem
\[ \hbox{minimize} \ \|\hat S - H\| \quad \hbox{subject to} \quad H \ge 0 \ \hbox{ and }\ \hbox{rank}(H)\le k. \tag{5.12} \]
A matrix that solves 5.11 is called a “**positive approximant**", *e.g.*, [7, 13, 19, 21, 22, 28]. The current interest in this problem was initiated by Halmos’ paper [19], which considers the solution of 5.11 in the spectral norm. Rogers and Ward [28] consider the Schatten \(p\)-norm, Ando [2] considers the trace norm, and Higham [21] solves 5.11 in the Frobenius norm. Later the solution was extended to any unitarily invariant norm by Bhatia and Kittaneh [5]. See [4, p. 277] and [13] for alternative proofs. Several results on this topic were obtained in the context of linear operators on a Hilbert space, *e.g.*, [4, 5, 7, 19, 28]. The term “**low-rank positive approximants**" refers to matrices that solve 5.12. The last problem was first solved by Mathar [25] for Schatten-\(p\) norms, and recently by Dax [13] for every unitarily invariant norm. The next assertion enables us to apply these results for solving 1.9 and 1.10.

Let the matrix \(\hat S \in \mathbb {S}^n\) have a compact spectral decomposition of the form 5.1. If \(\hat S \in \mathbb {S}\) then
\[ V^T\hat X = O. \]
In other words, let \({\boldsymbol {v}}\) be an eigenvector of \(\hat S\) that corresponds to a non-zero eigenvalue; then \({\boldsymbol {v}}^T\hat X = {\boldsymbol {o}}\).

Let \({\boldsymbol {w}}_j, \; j = 1, \dots , p\), denote the \(j\)th column of the \(r\times p\) matrix \(W = V^T\hat X\). Then an equivalent way to write 5.14 is
\[ VD{\boldsymbol w}_j = {\boldsymbol o}, \quad j = 1,\dots,p. \]
Therefore, since the matrix \(VD\) has full column rank,
\[ {\boldsymbol w}_j = {\boldsymbol o}, \quad j = 1,\dots,p. \]
Theorem 5.3 allows us to apply the above positive approximant theorems on \(\mathbb {S}\). This gives the following stronger results.

If \(\hat S \in \mathbb {S}\) then \(P(\hat S) \in \mathbb {S}_+\), where \(\mathbb {S}_+\) is defined in 1.7. Consequently \(P(\hat S)\) solves the least norm problem
\[ \hbox{minimize} \ \|\hat S - H\| \quad \hbox{subject to} \quad H\in\mathbb{S}_+. \]

If \(\hat S \in \mathbb {S}\) then \(P_k (\hat S) \in \mathbb {S}_+\). Consequently \(P_k(\hat S)\) solves the least norm problem
\[ \hbox{minimize} \ \|\hat S - H\| \quad \hbox{subject to} \quad H\in\mathbb{S}_+ \ \hbox{ and }\ \hbox{rank}(H)\le k. \]
When using the Frobenius norm it is possible to extend these results to any matrix \(A \in \mathbb {R}^{n\times n}\).

Let \(A\) be some matrix in \(\mathbb {R}^{n \times n}\). Define \(S = (A + A^T)/2\) and let \(\hat S\) be obtained from \(S\) by the rule 4.5. Then \(P (\hat S)\) solves the problem
\[ \hbox{minimize} \ \|A - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{S}_+, \]
while \(P_k(\hat S)\) solves the problem
\[ \hbox{minimize} \ \|A - H\|_F \quad \hbox{subject to} \quad H\in\mathbb{S}_+ \ \hbox{ and }\ \hbox{rank}(H)\le k. \]
# 6 Polar cones and polar decompositions

Another feature that distinguishes \(\mathbb {S}\) is that the subsets \(\mathbb {S}_+\) and \(\mathbb {S}_-\) constitute a polar decomposition of \(\mathbb {S}\). To clarify this statement we give a brief overview of the necessary background.

Let \(\mathbb {H}\) be a real Hilbert space with a scalar product, \(\langle {\boldsymbol {u}}, {\boldsymbol {v}}\rangle \), and related norm \(\| {\boldsymbol {u}}\| = (\langle {\boldsymbol {u}}, {\boldsymbol {u}}\rangle )^{1/2}\). Recall that a subset \(\mathbb {K}\) of \(\mathbb {H}\) is called a “convex cone" if it has the following property: Let \({\boldsymbol {u}}\) and \({\boldsymbol {v}}\) be two points in \(\mathbb {K}\) and let \(\alpha \) and \(\beta \) be two nonnegative scalars. Then the point \(\alpha {\boldsymbol {u}}+ \beta {\boldsymbol {v}}\) belongs to \(\mathbb {K}\). Given a convex cone, \(\mathbb {K}\), the set
\[ \mathbb{K}^* = \{{\boldsymbol v}\in\mathbb{H} : \langle {\boldsymbol u}, {\boldsymbol v}\rangle \le 0 \ \hbox{ for all }\ {\boldsymbol u}\in\mathbb{K}\} \tag{6.1} \]
is called the polar cone of \(\mathbb {K}\). It is well known that \(\mathbb {K}^*\) is a closed convex cone, and that \((\mathbb {K}^*)^*\) is the closure of \(\mathbb {K}\), *e.g.*, [3]. Thus, if \(\mathbb {K}\) is a closed convex cone then \((\mathbb {K}^*)^* = \mathbb {K}\). An important feature that characterizes polar cones is **Moreau’s Polar Decomposition** [27]: Let \(\mathbb {K}\) be a closed convex cone in \(\mathbb {H}\) and let \(\mathbb {K}^*\) be its polar cone. Then any vector \({\boldsymbol {w}}\in \mathbb {H}\) has a unique polar decomposition of the form
\[ {\boldsymbol w} = {\boldsymbol u} + {\boldsymbol v}, \quad {\boldsymbol u}\in\mathbb{K}, \quad {\boldsymbol v}\in\mathbb{K}^*, \quad \langle {\boldsymbol u}, {\boldsymbol v}\rangle = 0. \tag{6.2} \]
Moreover, \({\boldsymbol {u}}\) is the projection of \({\boldsymbol {w}}\) onto \(\mathbb {K}\), and \({\boldsymbol {v}}\) is the projection of \({\boldsymbol {w}}\) onto \(\mathbb {K}^*\). (The term “projection" means that these vectors solve the related least norm problems.) The above relations are often summarized by saying that \(\mathbb {K}\) and \(\mathbb {K}^*\) constitute a polar decomposition on \(\mathbb {H}\).

Below we will show that \((\mathbb {S}_+)^* = \mathbb {S}_-\) and \((\mathbb {S}_-)^* = \mathbb {S}_+\). For this purpose we prove the following lemma.

Let \(\mathbb {H}, \mathbb {K}\) and \(\mathbb {K}^*\) be as above, and let the convex cone \(\tilde{\mathbb{K}}\) be contained in \(\mathbb {K}^*\). That is, \(\langle {\boldsymbol {u}}, {\boldsymbol {v}}\rangle \le 0\) whenever \({\boldsymbol {u}}\in \mathbb {K}\) and \({\boldsymbol {v}}\in \tilde{\mathbb{K}}\). If any vector \({\boldsymbol {w}}\in \mathbb {H}\) satisfies
\[ {\boldsymbol w} = {\boldsymbol u} + {\boldsymbol v} \quad\hbox{with}\quad {\boldsymbol u}\in\mathbb{K}, \quad {\boldsymbol v}\in\tilde{\mathbb{K}}, \quad \langle {\boldsymbol u}, {\boldsymbol v}\rangle = 0, \tag{6.3} \]
then \(\tilde{\mathbb{K}} = \mathbb{K}^*\).

Let us return now to consider the sets \(\mathbb {S}_+ \) and \(\mathbb {S}_-\). It is easy to verify that these sets are convex cones in \(\mathbb {S}\). Moreover, as we now show, these cones constitute a polar decomposition of \(\mathbb {S}\). The tools for proving this observation are the matrix operators \(P(\hat S)\) and \(N(\hat S)\) which were introduced in 5.6 and 5.7.

Every matrix \(\hat S \in \mathbb {S}\) has a unique polar decomposition of the form
\[ \hat S = P(\hat S) + N(\hat S) \tag{6.4} \]
and
\[ \langle P(\hat S),\, N(\hat S)\rangle = 0. \tag{6.5} \]
Furthermore, \(\mathbb {S}_-\) is the polar cone of \(\mathbb {S}_+\) and \(\mathbb {S}_+\) is the polar cone of \(\mathbb {S}_-\).

\[ \langle P(\hat S),\, N(\hat S)\rangle = \sum_{i=1}^{\ell}\ \sum_{j=\ell+1}^{r} \lambda_i\lambda_j\,({\boldsymbol v}_i^T{\boldsymbol v}_j)^2 = 0, \tag{6.6} \]
where the last equality holds since \({\boldsymbol {v}}^T_i{\boldsymbol {v}}_j = 0\) whenever \(\lambda _i \neq \lambda _j\).

Next we show that \(\mathbb {S}_-\) is contained in \((\mathbb {S}_+)^*\), and that \(\mathbb {S}_+\) is contained in \((\mathbb {S}_-)^*\). Let \(H_+\) be any matrix in \(\mathbb {S}_+\), and let \(H_-\) be any matrix in \(\mathbb {S}_-\). Then it is sufficient to show that
\[ \langle H_+, H_-\rangle \le 0. \tag{6.7} \]
Let \(\eta _1, \dots , \eta _t\), denote the nonzero eigenvalues of \(H_-\), and let \({\boldsymbol {h}}_1, \dots , {\boldsymbol {h}}_t\), denote the corresponding eigenvectors of \(H_-\). That is,
\[ H_- = \sum_{i=1}^t \eta_i{\boldsymbol h}_i{\boldsymbol h}_i^T, \tag{6.8} \]
where
\[ \eta_i < 0, \quad i = 1,\dots,t. \tag{6.9} \]
From this presentation we see that
\[ \langle H_+, H_-\rangle = \sum_{i=1}^t \eta_i\,{\boldsymbol h}_i^TH_+{\boldsymbol h}_i \le 0, \tag{6.10} \]
where the last inequality is concluded from 6.9 and the fact that \(H_+\) is a positive semidefinite matrix. Now lemma 6.1 implies that \(\mathbb {S}_-\) is the polar cone of \(\mathbb {S}_+\), and that \(\mathbb {S}_+ \) is the polar cone of \(\mathbb {S}_-\).
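The polar decomposition of \(\mathbb{S}\) is straightforward to verify numerically: split a member of \(\mathbb{S}\) into its positive and negative parts and check that they are orthogonal and stay inside the subspace (NumPy sketch, our notation):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 6, 2
X_hat, _ = np.linalg.qr(rng.standard_normal((n, p)))
X_tilde = np.linalg.qr(X_hat, mode="complete")[0][:, p:]

# A generic member of S, and its Jordan split S_hat = P + N
M = rng.standard_normal((n, n))
S_hat = X_tilde @ X_tilde.T @ (M + M.T) @ X_tilde @ X_tilde.T / 2

lam, V = np.linalg.eigh(S_hat)
pos, neg = lam > 0, lam < 0
P = (V[:, pos] * lam[pos]) @ V[:, pos].T    # projection onto S_+
N = (V[:, neg] * lam[neg]) @ V[:, neg].T    # projection onto S_-

inner = np.trace(P.T @ N)                   # <P, N>, zero as required
```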

# 7 Principal Component Analysis (PCA) and the nearest rank-\(k\) centered matrix

Let \(A \in \mathbb {R}^{m\times n}, \; m \ge n\), be a given data matrix. In this section we derive new observations about the principal component analysis (PCA) of \(A\). The first step in constructing the PCA is to “center" the columns of \(A\). Hence we shall start by explaining this concept and showing how to compute a rank-\(k\) centered matrix that is nearest to \(A\).

A matrix \(B \in \mathbb {R}^{m\times n}\) is called **row-centered** if \(B\hat{{\boldsymbol {e}}} = {\boldsymbol {o}}\) where \(\hat{\boldsymbol {e}}= (1,\dots ,1)^T \) \(\in \mathbb {R}^n\). Similarly \(B\) is called **column-centered** if \(\tilde{\boldsymbol {e}}^T B = {\boldsymbol {o}}\) where \(\tilde{\boldsymbol {e}}= (1,1,\dots , 1)^T \in \mathbb {R}^m\). If \(B\) satisfies both \(B \hat{\boldsymbol {e}}= {\boldsymbol {o}}\) and \(\tilde{\boldsymbol {e}}^T B = {\boldsymbol {o}}\) then it is called **doubly-centered**. Observe that the corresponding canonical subspaces have the following forms:
\[ \mathbb{X} = \{B\in\mathbb{R}^{m\times n} : B\hat{\boldsymbol e} = {\boldsymbol o}\}, \quad \mathbb{Y} = \{B\in\mathbb{R}^{m\times n} : \tilde{\boldsymbol e}^TB = {\boldsymbol o}\}, \quad \mathbb{Z} = \mathbb{X}\cap\mathbb{Y}. \]
Now from theorem 3.1 we obtain the following somewhat surprising results.

Let the matrix \(B \in \mathbb {R}^{m\times n}\) have the compact SVD 3.2–3.4.

If \(B \in \mathbb {X}\) then \(V^T \hat{\boldsymbol {e}}= {\boldsymbol {o}}\). That is, the right singular vectors are centered.

If \(B \in \mathbb {Y}\) then \(\tilde{\boldsymbol {e}}^T U = {\boldsymbol {o}}\). That is, the left singular vectors are centered.

If \(B \in \mathbb {Z}\) then \(\tilde{\boldsymbol {e}}^T U = {\boldsymbol {o}}\) and \(V^T\hat{\boldsymbol {e}}= {\boldsymbol {o}}\). In other words, in this case both the left singular vectors and the right singular vectors are centered.

The centering of a matrix, or a vector, is done by applying the following matrices:
\[ \hat C = I - \tfrac{1}{n}\hat{\boldsymbol e}\hat{\boldsymbol e}^T\in\mathbb{R}^{n\times n} \quad\hbox{and}\quad \tilde C = I - \tfrac{1}{m}\tilde{\boldsymbol e}\tilde{\boldsymbol e}^T\in\mathbb{R}^{m\times m}. \]
With these notations at hand it is easy to verify that the matrix \(\tilde C A\) is column-centered. That is, the mean of each column equals zero. Similarly, the matrix \(A\hat C\) is row-centered and the mean of each row equals zero. Now from the theorems of Section 2 we obtain the following conclusions.

The matrix \(A\hat C\) is an orthogonal projection of \(A\) on \(\mathbb {X}\). The matrix \(\tilde CA\) is an orthogonal projection of \(A\) on \(\mathbb {Y}\). The matrix \(\tilde C A \hat C\) is an orthogonal projection of \(A\) on \(\mathbb {Z}\). Moreover, let \(B\) be some matrix in \(\mathbb {Y}\); then equality 2.26 gives
\[ \|A - B\|_F^2 = \|A - \tilde CA\|_F^2 + \|\tilde CA - B\|_F^2. \]
(Similar equalities hold in \(\mathbb {X}\) and \(\mathbb {Z}\).)

The norm equalities that we obtained enable us to derive the following conclusions.

**The nearest rank-\(k\) centered matrix**

Let \(T_k(B)\) denote the truncated SVD operator 3.8.

The matrix \(T_k(A \hat{C})\) solves 3.12 and the right singular vectors of this matrix are centered.

The matrix \(T_k(\tilde CA)\) solves 3.13 and the left singular vectors of this matrix are centered.

The matrix \(T_k(\tilde CA\hat C)\) solves 3.14 and the singular vectors of this matrix are centered.
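These conclusions can be tested directly. The NumPy sketch below uses the standard centering matrix \(\tilde C = I - \tilde{\boldsymbol e}\tilde{\boldsymbol e}^T/m\) (consistent with the discussion above; the helper name `T_k` is ours) and checks that the left singular vectors of the nearest rank-\(k\) column-centered matrix are themselves centered:

```python
import numpy as np

def T_k(A, k):
    """Rank-k truncated SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

rng = np.random.default_rng(9)
m, n, k = 20, 5, 2
A = rng.standard_normal((m, n))

C_tilde = np.eye(m) - np.ones((m, m)) / m   # column-centering matrix
B_k = T_k(C_tilde @ A, k)                   # nearest rank-k column-centered matrix

U_k, s_k, _ = np.linalg.svd(B_k, full_matrices=False)
U_k = U_k[:, :k]                            # left singular vectors of B_k
```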

Let us turn now to inspect the effect of these results on the PCA of \(A\). For a detailed description of the PCA and its properties, see [1] and the references therein. As noted in [1], usually the data matrix, \(A\), is pre-processed before the analysis. This is done by centering the columns of \(A\). That is, \(A\) is replaced by \(\tilde CA\). Then the SVD of \(\tilde C A\) is computed. In other words, the PCA of \(A\) is the SVD of \(\tilde CA\). Now these observations lead to the following conclusions.

The matrix \(\tilde CA\) is the orthogonal projection of \(A\) on the subspace
\[ \mathbb{Y} = \{B\in\mathbb{R}^{m\times n} : \tilde{\boldsymbol e}^TB = {\boldsymbol o}\}. \]
Moreover, let \(r\) denote the rank of \(\tilde CA\). Then for each \(k,\; k = 1, \dots , r\), the matrix \(T_k(\tilde C A)\) solves the problem
\[ \hbox{minimize} \ \|A - B\|_F \quad \hbox{subject to} \quad B\in\mathbb{Y} \ \hbox{ and }\ \hbox{rank}(B)\le k, \]
and the left singular vectors of \(T_k (\tilde CA) \) are centered.

The last theorem brings two innovations about the PCA. It is well known that \(T_k (\tilde C A)\) is a rank-\(k\) matrix that is nearest to \(\tilde CA\) in any unitarily invariant norm. The results of theorem 7.4 extend this observation in the following way. Here \(T_k(\tilde C A)\) is nearest to \(A\) when using the Frobenius matrix norm. The second innovation is the fact that the left singular vectors of the matrices \(T_k(\tilde CA)\) are centered.

Another way to carry out PCA is called **covariance PCA**. In this case one computes the SVD of the matrix \(\tilde CA/\sqrt{m}\) (or \(\tilde CA/\sqrt{m-1}\)). The name comes from the fact that the matrix \((\tilde CA/\sqrt{m})^T(\tilde CA/\sqrt{m})\) is a covariance matrix. Note that theorem 7.4 is easily modified to include this case.

A third way to compute PCA is called **correlation PCA**. In this version the columns of the centered matrix \(\tilde CA\) are normalized to have unit length. Let the columns of \(\tilde CA\) be denoted as \({\hbox{\bf c}}_j,\; j = 1,\dots , n\), and let the diagonal matrix
\[ D = \hbox{\rm diag}\{d_1, d_2, \dots, d_n\} \]
be defined by the equalities \(d_j = \| {\hbox{\bf c}}_j\| _2, \; j = 1,\dots , n\). Then here one computes the SVD of the matrix \(\tilde C A D^{-1}\). The name comes from the fact that \((\tilde C A D^{-1})^T (\tilde C A D^{-1})\) is a correlation matrix. As before, since \(\tilde C A D^{-1}\) is a column-centered matrix, the left singular vectors of this matrix are also centered.

# 8 Euclidean Distance matrices: The nearest rank-\(k\) source matrix

In this section we briefly describe an application that arises in Euclidean Distance (ED) matrices. A matrix \(A = (a_{ij}) \in \mathbb {R}^{n\times n}\) is said to be a \(k\)-dimensional ED matrix if there exist \(n\) points in \(\mathbb {R}^k\), say \({\boldsymbol {x}}_1, {\boldsymbol {x}}_2, \dots , {\boldsymbol {x}}_n\), such that
\[ a_{ij} = \|{\boldsymbol x}_i - {\boldsymbol x}_j\|_2^2, \quad i,j = 1,\dots,n. \tag{8.1} \]

Let \(X = [{\boldsymbol {x}}_1,\dots , {\boldsymbol {x}}_n] \in \mathbb {R}^{k \times n}\) denote the related **positions matrix**, and let the column vector \({\boldsymbol {g}}\in \mathbb {R}^n\) be obtained from the diagonal entries of the matrix \(X^T X\). That is,
\[ {\boldsymbol g} = ({\boldsymbol x}_1^T{\boldsymbol x}_1,\, {\boldsymbol x}_2^T{\boldsymbol x}_2,\, \dots,\, {\boldsymbol x}_n^T{\boldsymbol x}_n)^T. \]

Let the matrix operator \(G(X)\) be defined as
\[ G(X) = {\boldsymbol g}{\boldsymbol e}^T + {\boldsymbol e}{\boldsymbol g}^T - 2X^TX, \tag{8.3} \]
where
\[ {\boldsymbol e} = (1, 1, \dots, 1)^T\in\mathbb{R}^n. \]

Then 8.1 can be rewritten as
\[ A = G(X). \]
Note that for any orthonormal matrix \(Q \in \mathbb {R}^{k\times k}\), the matrices \(X\) and \(QX\) generate the same ED matrix, and \(\| X \| _F = \| Q X\| _F\). The Frobenius norm of the positions matrix can be reduced by considering the least squares problem
\[ \hbox{minimize}_{\,{\boldsymbol u}\in\mathbb{R}^k}\ \sum_{j=1}^n \|{\boldsymbol x}_j - {\boldsymbol u}\|_2^2, \]
which has a unique minimizer at the centroid point
\[ \hat{\boldsymbol u} = \tfrac{1}{n}\sum_{j=1}^n {\boldsymbol x}_j. \]
The replacement of \({\boldsymbol {x}}_j\) with \({\boldsymbol {x}}_j - \hat{\boldsymbol {u}}\) is carried out by introducing the centering matrix
\[ C = I - \tfrac{1}{n}{\boldsymbol e}{\boldsymbol e}^T\in\mathbb{R}^{n\times n}, \]
which shifts the origin of \(\mathbb {R}^k\) to the centroid point. The matrix \(XC\) is row-centered and \(\| XC \| ^2_F \le \| X\| ^2_F\). Furthermore, since shifting the origin does not change Euclidean distances, the matrices \(X\) and \(XC\) generate the same ED matrix. Hence there is no loss of generality in assuming that the positions matrix is row-centered. That is,
\[ X{\boldsymbol e} = {\boldsymbol o}. \]
Note also that any other matrix, \(\hat X\), that satisfies \(\hat X^T \hat X = X^TX\) generates the same ED matrix as \(X\).
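A short NumPy sketch of these relations (our variable names): generate a centered positions matrix, form the ED matrix entrywise via \(G(X)\), and recover the Gram matrix \(X^TX\) by double centering:

```python
import numpy as np

rng = np.random.default_rng(10)
k, n = 2, 6
X = rng.standard_normal((k, n))            # positions matrix
X = X - X.mean(axis=1, keepdims=True)      # row-centered: X e = o

# ED matrix: a_ij = ||x_i - x_j||^2, i.e. G(X) = g e^T + e g^T - 2 X^T X
g = np.diag(X.T @ X)
e = np.ones(n)
A = np.outer(g, e) + np.outer(e, g) - 2 * X.T @ X

# Double centering recovers the Gram matrix: C (-A/2) C = X^T X
C = np.eye(n) - np.ones((n, n)) / n
S = C @ (-A / 2) @ C
```

The recovered `S` is hollow-compatible in the sense used below: it is symmetric, positive semidefinite, and of rank at most \(k\).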

The use of ED matrices occurs in various applications. One example comes from multidimensional scaling problems in psychometrics and statistics. Here the matrix entries represent similarities (or dissimilarities) between objects and we want to produce a geometric representation of these objects, *e.g.*, [6, 9]. A second example comes from computational chemistry, in which it is desired to determine the structure of a molecule (“molecule conformation") from information about interatomic distances, *e.g.*, [8, 10, 11, 17, 18]. Other important applications arise in the fields of sensor network localization and distance geometry, *e.g.*, [11, 23, 24]. In these applications the ED matrix, \(A\), is obtained from empirical observations, but the positions matrix is not known. Hence it is desired to compute a positions matrix, \(X\), that “fits" the observed ED matrix. Note also that in many cases \(k\) is known in advance.

If \(A\) is an “exact" \(k\)-dimensional ED matrix then \(X\) can be obtained in the following way. From 8.3 we see that:
\[
C(-A/2)C = (XC)^T(XC) = X^TX ,
\]
which means that the symmetric matrix:
\[
S = C(-A/2)C
\]
is positive semidefinite, and \(\operatorname{rank}(S)\le k\). Consequently it is possible to compute a square root of \(S\), say \(X\), such that:
\[
X^TX = S ,
\]
\(X \in \mathbb {R}^{k\times n}\), and \(X = XC\). That is, \(X\) is a centered positions matrix of \(A\).
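This recovery step can be sketched as follows (hypothetical data; numpy is assumed). A centered positions matrix is obtained from an exact ED matrix by an eigendecomposition of \(S = C(-A/2)C\), which yields one valid square root:

```python
import numpy as np

# Hypothetical exact positions and the ED matrix they generate.
X0 = np.array([[0.0, 1.0, 1.0, 4.0],
               [0.0, 0.0, 2.0, 2.0]])
k, n = X0.shape
e = np.ones(n)
C = np.eye(n) - np.outer(e, e) / n
g = np.diag(X0.T @ X0)
A = np.outer(g, e) + np.outer(e, g) - 2.0 * X0.T @ X0   # exact ED matrix

S = C @ (-A / 2.0) @ C                  # source matrix: PSD with rank(S) <= k
w, V = np.linalg.eigh(S)                # eigenvalues in ascending order
w, V = w[-k:], V[:, -k:]                # keep the k largest eigenvalues
X = np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T   # square root: X^T X = S

assert np.allclose(X.T @ X, S)          # X is a square root of S
assert np.allclose(X @ e, 0.0)          # X is row-centered (X = X C)
```

The recovered \(X\) may differ from \(X_0 C\) by an orthogonal transformation, but both generate the same ED matrix \(A\).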

The properties of \(S\) can be summarized by saying that it is the source matrix of \(A\). More precisely, a matrix \(S\) that satisfies:
\[
S = CSC, \qquad S \ \text{is positive semidefinite}, \qquad \operatorname{rank}(S) \le k ,
\]
is called a rank-\(k\) **source matrix**. As we have seen, any matrix of this kind, \(S\), can be used to generate a \(k\times n\) centered positions matrix, \(X\), and a related ED matrix, \(A = G(X)\), such that:
\[
S = C(-A/2)C .
\]
Recall, however, that in practice the entries of \(A\) are obtained from empirical observations, such as physical measurements. Consequently these entries may contain some error, and \(A\) may differ from an “exact" ED matrix. In this case the matrix \(C (-A/2)C\) may fail to be a rank-\(k\) source matrix, and recovering \(X\) requires some amendments.

The problem that we have to solve is, therefore, the following. Given an erroneous ED matrix, \(A\), compute a positions matrix, \(X\), that “fits" \(A\) in a “reasonable" way. Usually the solution process starts by replacing \(A\) with the nearest predistance matrix, \(\hat A\). Recall that a **predistance matrix** is a hollow symmetric matrix with nonnegative entries. The term “hollow" refers to matrices with zero diagonal entries. The conversion of \(A\) into a predistance matrix starts by replacing \(A\) with \((A^T + A)/2\). Then all the negative entries and all the diagonal entries are set to zero. Now it is easy to verify that the resulting predistance matrix, \(\hat A\), is the nearest to \(A\) with respect to the Frobenius norm. Yet it is still possible that \(\hat A\) is not an ED matrix, which means that the matrix \(C(-\hat A/2)C\) is not a rank-\(k\) source matrix.
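As a small illustration, one common realization of this projection, namely symmetrizing and then zeroing out the negative and diagonal entries, can be sketched as follows (hypothetical data; numpy is assumed):

```python
import numpy as np

def nearest_predistance(A):
    """Nearest hollow, symmetric, nonnegative matrix in the Frobenius norm."""
    B = (A + A.T) / 2.0              # symmetrize
    B = np.where(B < 0.0, 0.0, B)    # zero the negative entries
    np.fill_diagonal(B, 0.0)         # zero the diagonal entries (hollow)
    return B

# A hypothetical erroneous "distance" matrix.
A = np.array([[0.3, 1.0, -4.0],
              [1.2, 0.0,  5.0],
              [2.0, 4.8, -0.1]])
A_hat = nearest_predistance(A)

assert np.allclose(A_hat, A_hat.T)          # symmetric
assert np.allclose(np.diag(A_hat), 0.0)     # hollow
assert (A_hat >= 0.0).all()                 # nonnegative
```

For each off-diagonal pair, the projected entry is \(\max(0, (a_{ij}+a_{ji})/2)\), which minimizes the squared deviation over the pair subject to symmetry and nonnegativity.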

A common way to compute a positions matrix \(X\) that “fits" \(\hat A\) is by solving the problem
\[
\operatorname*{minimize}_{X \,\in\, \mathbb{R}^{k\times n}} \; \| \hat A - G(X) \|_F^2 .
\]
The last problem has no explicit solution, and usually it is solved by applying some iterative method, *e.g.*, [8, 17, 24]. In this section we propose a different approach. Let \(S\) be a rank-\(k\) source matrix that is nearest to \(C(-\hat A/2)C\) with respect to the Frobenius norm. Then \(X\) is defined to be the square root of \(S\). That is, a matrix that satisfies \(X^T X = S\). The derivation of \(S\) and the motivation behind the proposed solution are explained below.

Let \(S \in \mathbb {S}^n\) be a given symmetric matrix. Then it is easy to verify that the following three equalities, \(S = CSC\), \(S{\boldsymbol {e}}= {\boldsymbol {o}}\), and \({\boldsymbol {e}}^T S = {\boldsymbol {o}}^T\), are equivalent in the sense that one equality implies the others. These relations show that the subspace
\[
\mathbb {S}_{\boldsymbol {e}} = \{\, S \mid S \in \mathbb {S}^n \ \text{and} \ S = CSC \,\}
\]
equals the canonical subspace
\[
\{\, S \mid S \in \mathbb {S}^n \ \text{and} \ S{\boldsymbol {e}} = {\boldsymbol {o}} \,\}.
\]
Note also that the matrix \(C(-\hat A/2)C\) is the orthogonal projection of \(-\hat A/2\) onto \(\mathbb {S}_{\boldsymbol {e}}\). Similarly, using the results of Section 5 with \(\mathbb {S}= \mathbb {S}_{\boldsymbol {e}}\) shows that \(P_k (C(-\hat A/2)C)\) is a rank-\(k\) source matrix which is nearest to \(C(-\hat A/2)C\) with respect to every unitarily invariant norm. Moreover, when using the Frobenius norm we obtain that \(P_k(C(-\hat A/2)C)\) is a rank-\(k\) source matrix that is nearest to \(- \hat A/2\).

The motivation behind the proposed solution is clarified by considering the following two cases. If \(\hat A\) is an “exact" ED matrix, then the nearest rank-\(k\) source matrix is \(C(-\hat A/2)C\), and the related positions matrix, \(X\in \mathbb {R}^{k\times n}\), is a square root of this matrix. That is:
\[
X^TX = C(-\hat A/2)C .
\]
Otherwise, when \(\hat A\) is not an “exact" ED matrix, the nearest rank-\(k\) source matrix is given by \(P_k(C(-\hat A/2)C)\), and the related positions matrix, \(X\), is defined as the square root of this matrix. That is:
\[
X^TX = P_k(C(-\hat A/2)C) .
\]
Similar semidefinite programming problems were considered in some earlier works, *e.g.*, [17, 20, 25], but these methods compute the positions matrix, \(X\), in entirely different ways. The majority of the former methods compute \(X\) by solving 8.13 *via* an iterative method. In contrast, our method computes \(X\) by using a closed form solution. Recall that least squares solutions are sensitive to large errors in the data. Hence the fact that our solution is not aimed at solving 8.13 does not necessarily mean that we have an “inferior" solution.
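The closed form pipeline can be sketched end to end (numpy is assumed; the data is hypothetical). Here \(P_k\) is realized as spectral truncation, keeping the \(k\) largest eigenvalues clipped at zero, which is one standard way to obtain the nearest rank-\(k\) positive semidefinite matrix:

```python
import numpy as np

def nearest_positions(A_hat, k):
    """Closed-form positions matrix from a (possibly noisy) predistance matrix."""
    n = A_hat.shape[0]
    e = np.ones(n)
    C = np.eye(n) - np.outer(e, e) / n
    M = C @ (-A_hat / 2.0) @ C          # orthogonal projection onto S_e
    w, V = np.linalg.eigh(M)            # ascending eigenvalues
    w, V = w[-k:], V[:, -k:]            # k largest eigenvalues
    w = np.maximum(w, 0.0)              # enforce positive semidefiniteness
    S = V @ np.diag(w) @ V.T            # nearest rank-k source matrix, P_k(M)
    X = np.diag(np.sqrt(w)) @ V.T       # square root: X^T X = S
    return X, S

# A hypothetical noisy ED matrix (already a predistance matrix).
A_hat = np.array([[0.0, 1.1, 5.2],
                  [1.1, 0.0, 2.1],
                  [5.2, 2.1, 0.0]])
X, S = nearest_positions(A_hat, k=2)

assert np.allclose(X.T @ X, S)              # X is a square root of S
assert np.allclose(S @ np.ones(3), 0.0)     # S lies in the subspace S_e
assert np.linalg.matrix_rank(S) <= 2        # rank(S) <= k
```

No iteration is involved: a single symmetric eigendecomposition yields both the nearest rank-\(k\) source matrix and the positions matrix.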

# 9 Concluding remarks

In this paper we study four types of canonical subspaces. The first one, \(\mathbb {X}\), contains the \(m\times n\) matrices whose rows belong to a certain vector space in \(\mathbb {R}^n\). The second type, \(\mathbb {Y}\), contains all the \(m\times n\) matrices whose columns belong to a certain vector space in \(\mathbb {R}^m\). The third type, \(\mathbb {Z}\), contains matrices that satisfy both conditions, while the fourth type, \(\mathbb {S}\), is a symmetric version of \(\mathbb {X}\). Equipping the subspaces with orthonormal bases enables the derivation of orthogonal projections onto these subspaces. Let \(A\) be a given matrix and let \(\hat A\) denote the orthogonal projection of \(A\) onto one of these subspaces. That is, \(\hat A\) is nearest to \(A\) with regard to the Frobenius norm. The subspaces \(\mathbb {X}\) and \(\mathbb {Y}\) have the unique property that \(\hat A\) is also nearest to \(A\) with respect to every other unitarily invariant norm. Yet, as we have seen, orthogonal projections onto \(\mathbb {Z}\) and \(\mathbb {S}\) do not share this feature.

The ability to obtain low-rank approximations of a matrix \(A \in \mathbb {R}^{m\times n}\) on canonical subspaces is based on three key observations. The first is an explicit expression for the orthogonal projection of \(A\) onto the canonical subspace. The second observation is about the singular vectors of a matrix \(B\) that belongs to a certain canonical subspace. Third, the observation that a low-rank approximation of \(B\) belongs to the same canonical subspace as \(B\). It is the combination of these three features that enables us to derive the desired low-rank approximations.

The derivation of rank-\(k\) positive approximants is based on the matrix operator \(P_k(\cdot )\). Let \(S\) be some matrix in \(\mathbb {S}^n\) and let \(\hat S\) denote the orthogonal projection of \(S\) onto \(\mathbb {S}\). Then, as we have seen, \(P_k(\hat S)\) belongs to \(\mathbb {S}\), and \(P_k(\hat S)\) is a rank-\(k\) positive approximant of \(\hat S\) with regard to every unitarily invariant norm. Moreover, when using the Frobenius norm we obtain that \(P_k (\hat S)\) is a rank-\(k\) positive approximant of \(S\) on the subspace \(\mathbb {S}\).

The study of symmetric canonical subspaces reveals a rich geometry which resembles that of \(\mathbb {S}^n\). The computation of positive approximants in \(\mathbb {S}^n\) is closely related to orthogonal projections on the cone of positive semidefinite matrices in \(\mathbb {S}^n\). Similarly, the computation of positive approximants in \(\mathbb {S}\) is equivalent to orthogonal projections on the cone of positive semidefinite matrices in \(\mathbb {S}\). The picture is completed by establishing the related polar decomposition
\[
S = S_+ + S_- , \qquad \langle S_+ , S_- \rangle = 0 ,
\]
on \(\mathbb {S}\), where \(S_+\) and \(S_-\) denote the orthogonal projections of \(S\) onto the cones of positive semidefinite and negative semidefinite matrices in \(\mathbb {S}\), respectively.
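For the subspace \(\mathbb {S}_{\boldsymbol {e}}\) this decomposition can be verified numerically. The sketch below (hypothetical data; numpy is assumed) splits a matrix in \(\mathbb {S}_{\boldsymbol {e}}\) into its positive and negative spectral parts, which realize the two cone projections:

```python
import numpy as np

# A random element of the symmetric canonical subspace S_e (S e = o).
n = 4
e = np.ones(n)
C = np.eye(n) - np.outer(e, e) / n
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
S = C @ (B + B.T) @ C                   # symmetric and centered: S e = o

w, V = np.linalg.eigh(S)
S_plus  = V @ np.diag(np.maximum(w, 0.0)) @ V.T   # projection on the PSD cone
S_minus = V @ np.diag(np.minimum(w, 0.0)) @ V.T   # projection on the NSD cone

assert np.allclose(S, S_plus + S_minus)             # S = S_+ + S_-
assert np.isclose(np.trace(S_plus @ S_minus), 0.0)  # the two parts are orthogonal
assert np.allclose(S_plus @ e, 0.0)                 # both parts stay in S_e
assert np.allclose(S_minus @ e, 0.0)
```

Both parts remain in \(\mathbb {S}_{\boldsymbol {e}}\) because every eigenvector of \(S\) with a nonzero eigenvalue is orthogonal to \(\boldsymbol{e}\).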

The applications described in the last sections illustrate the usefulness of the new results. Let \(A\) be a given data matrix. The first example derives an explicit expression for a rank-\(k\) centered matrix that is nearest to \(A\). The properties of this matrix shed new light on the PCA of \(A\).

The second example comes from the field of Euclidean distance matrices. The fact that we have an explicit formula for \(P_k(\hat S)\) enables us to derive an explicit formula for the nearest source matrix. This provides an effective way for calculating the positions matrix of an erroneous Euclidean distance matrix.

# Bibliography

- 1
- 2
- 3 *D.P. Bertsekas*, *Convex Optimization Theory*, Athena Scientific, 2009.
- 4 *R. Bhatia*, *Matrix Analysis*, Springer, New York, 1997.
- 5
- 6 *I. Borg, P.J.F. Groenen*, *Modern Multidimensional Scaling: Theory and Applications*, 2nd ed., Springer Series in Statistics, Springer, 2005.
- 7 *R. Bouldin*, *Positive approximants*, Trans. Amer. Math. Soc., **177** (1973), pp. 391–403.
- 8 *D.I. Chu, H.C. Brown, M.T. Chu*, *On least squares Euclidean distance matrix approximation and completion*, Dept. of Mathematics, North Carolina State University, Tech. Rep., 2003.
- 9 *T.F. Cox, M.A.A. Cox*, *Multidimensional Scaling*, 2nd ed., Chapman and Hall/CRC, 2001.
- 10 *G. Crippen, T. Havel*, *Distance Geometry and Molecular Conformation*, Wiley, New York, 1988.
- 11 *J. Dattorro*, *Convex Optimization and Euclidean Distance Geometry*, Meboo Publishing, USA, 2005.
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19 *P.R. Halmos*, *Positive approximants of operators*, Indiana Univ. Math. J., **21** (1972), pp. 951–960.
- 20
- 21
- 22 *N.J. Higham*, *Matrix nearness problems and applications*, in M.J.C. Gover and S. Barnett (eds.), Applications of Matrix Theory, Oxford University Press, 1989, pp. 1–27.
- 23
- 24 *N. Krislock, H. Wolkowicz*, *Euclidean distance matrices and applications*, in M. Anjos and J. Lasserre (eds.), Handbook of Semidefinite, Cone and Polynomial Optimization, 2010.
- 25
- 26
- 27 *J.J. Moreau*, *Décomposition orthogonale d'un espace hilbertien selon deux cônes mutuellement polaires*, C.R. Acad. Sci. Paris, **255** (1962), pp. 238–240.
- 28 *D.D. Rogers, J.D. Ward*, *\(C_p\)-minimal positive approximants*, Acta Sci. Math., **43** (1981), pp. 109–115.