3. SVD 3
1. Keywords
- Principal Component Analysis
- Gram Matrix
- Low-Rank Approximation
- Dimension-Reducing Transformation
2. Eigendecomposition in Machine Learning
In machine learning, we often handle symmetric positive (semi-)definite matrices.
Given a (feature-by-data item) matrix \(A \in \mathbb{R}^{m \times n}\), \(A^{T} A\) represents a (data item-by-data item) similarity matrix between all pairs of data items, where the similarity is computed as an inner product.
Likewise, \(A A^{T}\) represents a (feature-by-feature) similarity matrix between all pairs of features, indicating a kind of correlation between features.
- Covariance matrix in principal component analysis
- Gram matrix in style transfer
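As a minimal sketch of the point above (the matrix sizes and random data are illustrative assumptions, not from the source), the following shows that both \(A^{T} A\) and \(A A^{T}\) are symmetric positive semi-definite similarity matrices:

```python
import numpy as np

# Hypothetical small example: m = 4 features, n = 6 data items.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))           # feature-by-data-item matrix

item_sim = A.T @ A                        # (data item)-by-(data item) inner-product similarities
feat_sim = A @ A.T                        # feature-by-feature similarities (correlation-like)

# Both products are symmetric and have non-negative eigenvalues (PSD):
print(np.allclose(item_sim, item_sim.T))               # True
print(np.all(np.linalg.eigvalsh(item_sim) >= -1e-10))  # True
print(np.all(np.linalg.eigvalsh(feat_sim) >= -1e-10))  # True
```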
3. Low-Rank Approximation of a Matrix
Recall that for a rectangular matrix \(A \in \mathbb{R}^{m \times n}\), its SVD can be represented as the sum of outer products \(A=U \Sigma V^{T}=\sum_{i=1}^{n} \sigma_{i} \mathbf{u}_{i} \mathbf{v}_{i}^{T}\), where \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{n}\).
Consider the problem of finding the best rank-\(r\) approximation of \(A\): \(\hat{A}_{r}=\arg \min _{A_{r}: \operatorname{rank}\left(A_{r}\right) \leq r}\left\|A-A_{r}\right\|_{F}\).
The optimal solution is given as \({\hat{A}_{r}}=\sum_{i=1}^{r} \sigma_{i} \mathbf{u}_{i} \mathbf{v}_{i}^{T}\), where \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{{r}}\).
That is, we approximate \(A\) as \(\hat{A}_{r}\) by setting \(\sigma_{i}=0\) for all \(i \geq r+1\).
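A minimal sketch of this truncation (sizes and the choice of \(r\) are illustrative assumptions): keep only the \(r\) largest singular values and check that the Frobenius error equals the square root of the sum of the discarded \(\sigma_{i}^{2}\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))                   # A in R^{m x n}, m = 8, n = 5
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T, s sorted descending

r = 2
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]       # rank-r approximation: keep top-r terms

# The approximation error matches sqrt(sum of the discarded sigma_i^2).
print(np.linalg.norm(A - A_r, 'fro'))
print(np.sqrt(np.sum(s[r:] ** 2)))                # the two numbers agree
```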
4. Dimension-Reducing Transformation
Given a (feature-by-data item) matrix \(X \in \mathbb{R}^{m \times n}\), consider the linear transformation, \(G^{T}: \mathbf{x} \in \mathbb{R}^{m} \mapsto \mathbf{y} \in \mathbb{R}^{r}\), where \(G \in \mathbb{R}^{m \times r}\) and \(r<m\).
Can we find the linear transformation, \(\mathbf{y}_{i}=G^{T} \mathbf{x}_{i}\), where the columns of \(G \in \mathbb{R}^{m \times r}\) are orthonormal, that best preserves the pairwise similarity between data items, \(S=X^{T} X\)?
Writing the transformed data items as \(Y=G^{T} X\), their pairwise similarity is written as \(Y^{T} Y=\left(G^{T} X\right)^{T}\left(G^{T} X\right)=X^{T} G G^{T} X\).
Then, the above problem is written as \({\widehat{G}}=\arg \min _{G}\left\|S-X^{T} G G^{T} X\right\|_{F}\) subject to \(G^{T} G=I_{r}\).
Given \(X=U \Sigma V^{T}=\sum_{i=1}^{n} \sigma_{i} \mathbf{u}_{i} \mathbf{v}_{i}^{T}\), where \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{n}\), the optimal solution is given as \({\widehat{G}}=U_{r}=\left[\begin{array}{llll}\mathbf{u}_{1} & \mathbf{u}_{2} & \cdots & \mathbf{u}_{r}\end{array}\right]\).
In this case, \(Y={\widehat{G}^{T}} X=U_{r}^{T} U \Sigma V^{T}=\Sigma_{r} V_{r}^{T}\).
We can show that this yields the best rank-\(r\) approximation of \(S\): \(Y^{T} Y=V_{r} \Sigma_{r}^{2} V_{r}^{T}=\sum_{i=1}^{r} \sigma_{i}^{2} \mathbf{v}_{i} \mathbf{v}_{i}^{T}\), which is exactly the rank-\(r\) truncation of \(S=X^{T} X=V \Sigma^{2} V^{T}\).
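A minimal sketch of the whole construction (the data and the choice of \(r\) are illustrative assumptions): take \(G=U_{r}\), form \(Y=G^{T} X\), and verify that \(Y^{T} Y\) equals the rank-\(r\) truncation of \(S=X^{T} X\).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 7))                  # feature-by-data-item matrix, m = 10, n = 7
U, s, Vt = np.linalg.svd(X, full_matrices=False)

r = 3
G = U[:, :r]                                      # orthonormal columns: G^T G = I_r
Y = G.T @ X                                       # reduced representation, Y = Sigma_r V_r^T

S = X.T @ X                                       # original pairwise similarities
S_r = Vt[:r, :].T @ np.diag(s[:r] ** 2) @ Vt[:r, :]  # rank-r truncation of S = V Sigma^2 V^T

print(np.allclose(G.T @ G, np.eye(r)))            # True: columns of G are orthonormal
print(np.allclose(Y.T @ Y, S_r))                  # True: similarity of Y is the rank-r truncation of S
print(np.linalg.norm(S - Y.T @ Y, 'fro'))         # residual from the discarded singular values
```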