3. SVD 3
1. Keywords
- Principal Component Analysis
- Gram Matrix
- Low-Rank Approximation
- Dimension-Reducing Transformation
2. Eigendecomposition in Machine Learning
In machine learning, we often handle symmetric positive (semi-)definite matrices.
Given a (feature-by-data item) matrix \(A \in \mathbb{R}^{m \times n}\), \(A^{T} A\) represents a (data item-by-data item) similarity matrix between all pairs of data items, where the similarity is computed as an inner product.
Likewise, \(A A^{T}\) represents a (feature-by-feature) similarity matrix between all pairs of features, indicating a kind of correlation between features.
- Covariance matrix in principal component analysis
- Gram matrix in style transfer
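As a minimal sketch of the point above (the matrix sizes and random data are illustrative assumptions, not from the source), the following shows that both \(A^{T} A\) and \(A A^{T}\) are symmetric positive semi-definite similarity matrices:

```python
import numpy as np

# Hypothetical small example: m = 4 features, n = 6 data items.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))           # feature-by-data-item matrix

item_sim = A.T @ A                        # (data item)-by-(data item) inner-product similarities
feat_sim = A @ A.T                        # feature-by-feature similarities (correlation-like)

# Both products are symmetric and have non-negative eigenvalues (PSD):
print(np.allclose(item_sim, item_sim.T))               # True
print(np.all(np.linalg.eigvalsh(item_sim) >= -1e-10))  # True
print(np.all(np.linalg.eigvalsh(feat_sim) >= -1e-10))  # True
```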
3. Low-Rank Approximation of a Matrix
Recall that for a rectangular matrix \(A \in \mathbb{R}^{m \times n}\), its SVD can be represented as the sum of outer products \(A=U \Sigma V^{T}=\sum_{i=1}^{n} \sigma_{i} \mathbf{u}_{i} \mathbf{v}_{i}^{T}\), where \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{n}\).
Consider the problem of finding the best rank-\(r\) approximation of \(A\): \(\hat{A}_{r}=\arg \min _{A_{r}: \operatorname{rank}\left(A_{r}\right) \leq r}\left\|A-A_{r}\right\|_{F}\).
The optimal solution is given as \({\hat{A}_{r}}=\sum_{i=1}^{r} \sigma_{i} \mathbf{u}_{i} \mathbf{v}_{i}^{T}\), where \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{{r}}\).
That is, we approximate \(A\) as \(\hat{A}_{r}\) by setting \(\sigma_{i}=0\) for all \(i \geq r+1\).
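A minimal sketch of this truncation (sizes and the choice of \(r\) are illustrative assumptions): keep only the \(r\) largest singular values and check that the Frobenius error equals the square root of the sum of the discarded \(\sigma_{i}^{2}\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))                   # A in R^{m x n}, m = 8, n = 5
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T, s sorted descending

r = 2
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]       # rank-r approximation: keep top-r terms

# The approximation error matches sqrt(sum of the discarded sigma_i^2).
print(np.linalg.norm(A - A_r, 'fro'))
print(np.sqrt(np.sum(s[r:] ** 2)))                # the two numbers agree
```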
4. Dimension-Reducing Transformation
Given a (feature-by-data item) matrix \(X \in \mathbb{R}^{m \times n}\), consider the linear transformation, \(G^{T}: \mathbf{x} \in \mathbb{R}^{m} \mapsto \mathbf{y} \in \mathbb{R}^{r}\), where \(G \in \mathbb{R}^{m \times r}\) and \(r<m\).
Can we find the linear transformation, \(\mathbf{y}_{i}=G^{T} \mathbf{x}_{i}\), where the columns of \(G \in \mathbb{R}^{m \times r}\) are orthonormal, that best preserves the pairwise similarity between data items, \(S=X^{T} X\)?
Writing the transformed data items as \(Y=G^{T} X\), their pairwise similarity is written as \(Y^{T} Y=\left(G^{T} X\right)^{T}\left(G^{T} X\right)=X^{T} G G^{T} X\).
Then, the above problem is written as \({\widehat{G}}=\arg \min _{G}\left\|S-X^{T} G G^{T} X\right\|_{F}\) subject to \(G^{T} G=I_{r}\).
Given \(X=U \Sigma V^{T}=\sum_{i=1}^{n} \sigma_{i} \mathbf{u}_{i} \mathbf{v}_{i}^{T}\), where \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{n}\), the optimal solution is given as \({\widehat{G}}=U_{r}=\left[\begin{array}{llll}\mathbf{u}_{1} & \mathbf{u}_{2} & \cdots & \mathbf{u}_{r}\end{array}\right]\).
In this case, \(Y={\widehat{G}^{T}} X=U_{r}^{T} U \Sigma V^{T}=\Sigma_{r} V_{r}^{T}\).
We can show that this yields the best rank-\(r\) approximation of \(S\): \(Y^{T} Y=V_{r} \Sigma_{r}^{2} V_{r}^{T}=\sum_{i=1}^{r} \sigma_{i}^{2} \mathbf{v}_{i} \mathbf{v}_{i}^{T}\), which is exactly the rank-\(r\) truncation of \(S=X^{T} X=V \Sigma^{2} V^{T}\).
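A minimal sketch of the whole construction (the data and the choice of \(r\) are illustrative assumptions): take \(G=U_{r}\), form \(Y=G^{T} X\), and verify that \(Y^{T} Y\) equals the rank-\(r\) truncation of \(S=X^{T} X\).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 7))                  # feature-by-data-item matrix, m = 10, n = 7
U, s, Vt = np.linalg.svd(X, full_matrices=False)

r = 3
G = U[:, :r]                                      # orthonormal columns: G^T G = I_r
Y = G.T @ X                                       # reduced representation, Y = Sigma_r V_r^T

S = X.T @ X                                       # original pairwise similarities
S_r = Vt[:r, :].T @ np.diag(s[:r] ** 2) @ Vt[:r, :]  # rank-r truncation of S = V Sigma^2 V^T

print(np.allclose(G.T @ G, np.eye(r)))            # True: columns of G are orthonormal
print(np.allclose(Y.T @ Y, S_r))                  # True: similarity of Y is the rank-r truncation of S
print(np.linalg.norm(S - Y.T @ Y, 'fro'))         # residual from the discarded singular values
```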