The SVD is the most important decomposition in data science.
Metaphor: the SVD is a sort of extension of eigenvalues and eigenvectors to non-square matrices. There is a very strong connection with eigenvectors and eigenvalues anyway, even if the analogy is not entirely exact.
Theorem:
Let A ∈ ℝ^(m×n), with m ≥ n, be a non-square matrix. Then there exist an orthogonal (square) matrix U ∈ ℝ^(m×m), an orthogonal matrix V ∈ ℝ^(n×n) and a diagonal matrix Σ ∈ ℝ^(m×n) such that A = UΣVᵀ. This gives the Singular Value Decomposition of the matrix A.
What is a first difference between SVD and QR factorization?
In SVD we have 3 matrices: 2 orthogonal and 1 diagonal matrix.
In QR we have two matrices: 1 orthogonal (Q) and 1 upper triangular (R).
This emphasizes the fact that A can always be diagonalized by means of an orthogonal transformation of its columns (by means of U) and of its rows (by means of V): UᵀAV = Σ.
What's the consequence of this? If you have any matrix, you can rotate its columns and rows in a suitable way so that you obtain a diagonal matrix: you can transform any matrix into a diagonal one.
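A minimal sketch of this fact (in Python/NumPy rather than the MATLAB used later in these notes; the random 4×3 matrix is an arbitrary example):

```python
import numpy as np

# Verify that rotating the columns (U^T from the left) and the rows
# (V from the right) of an arbitrary non-square matrix A produces
# the diagonal matrix Sigma.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# full_matrices=True gives the full SVD: U is 4x4, Vt is 3x3,
# s holds the min(4, 3) = 3 singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

D = U.T @ A @ Vt.T            # should be diagonal (4x3)
Sigma = np.zeros((4, 3))
np.fill_diagonal(Sigma, s)
print(np.allclose(D, Sigma))  # True
```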
Let's consider the unit sphere in the 2-norm: the infinite set of vectors whose 2-norm equals one. In two dimensions this set is of course a circle, since it is the set of points whose distance from the origin is exactly 1.
To these infinite vectors we apply a matrix A, which produces the effect on the right: we obtain an ellipse (an ellipsoid in higher dimensions). Important note: its shape relates to the singular values and singular vectors of the matrix A.
Take any vector on this circle (for example the yellow and the red ones), of length exactly 1. Then I multiply the yellow one by A and I get the new yellow vector on the right; the same with the red one. Taking the infinitely many matrix-vector products, we transform the unit sphere into the ellipse on the right.
We can apply the SVD of A, A = UΣVᵀ, and we obtain three matrices; each has an effect on the vectors, and applying all three has the same effect as applying A.
If we apply Vᵀ we obtain a rotation, Σ applies a scaling, and U is another rotation. (Remember that V and U are rotation matrices because they are orthogonal, and one important property of orthogonal matrices is that they do not change the length of the vectors they multiply.)
What happens if you multiply a vector by a diagonal matrix? IT GETS SCALED.
You can easily verify this in MATLAB, for example:
x = [10 20]';
F = [2 0; 0 4];
F*x
What does the singular value decomposition tell us, then, given the previous facts? It's a sequence of three simple transformations: a rotation, a scaling and another rotation.
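As a NumPy sketch of this rotation–scaling–rotation view (the 2×2 matrix A below is an arbitrary example):

```python
import numpy as np

# Apply A = U @ Sigma @ V^T to a unit vector in three separate steps
# and check that the result equals A @ x applied directly.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)
Sigma = np.diag(s)

x = np.array([1.0, 0.0])    # a vector on the unit circle
step1 = Vt @ x              # rotation: the length is preserved
step2 = Sigma @ step1       # scaling by the singular values
step3 = U @ step2           # another rotation

print(np.allclose(step3, A @ x))               # True
print(np.isclose(np.linalg.norm(step1), 1.0))  # rotations keep norm 1: True
```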
The semi-axes σ₁u₁ and σ₂u₂ of the ellipse are oriented in the same directions as the columns of U; how long these semi-axes are depends on the singular values. The versors (unit directions) of the two semi-axes are the first and second columns of U (which are also orthonormal to each other).
When we introduced the QR factorization we saw that we can have a simpler factorization. The same can be done here, with the "reduced-form" or "economy form" of the SVD.
It follows the structure of the matrices, such that: A = ÛΣ̂Vᵀ, with Û ∈ ℝ^(m×n) and Σ̂ ∈ ℝ^(n×n).
What kind of matrix is Û? Is it orthogonal? No, because it becomes a matrix that is not square, so it is not an orthogonal matrix. It becomes a matrix with orthonormal columns.
V stays orthogonal. The reason for this is in the number of rows and columns (see the definition).
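A NumPy sketch of the difference between the full and the economy form (the random 8×5 matrix is an arbitrary example):

```python
import numpy as np

# Compare the full and economy ("reduced") SVD for an m x n matrix
# with m > n. In the economy form U_hat is m x n: it has orthonormal
# columns but, being non-square, it is not an orthogonal matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))

U_full = np.linalg.svd(A, full_matrices=True)[0]   # 8x8
U_hat = np.linalg.svd(A, full_matrices=False)[0]   # 8x5
print(U_full.shape, U_hat.shape)                   # (8, 8) (8, 5)

# Orthonormal columns: U_hat^T U_hat = I (5x5) ...
print(np.allclose(U_hat.T @ U_hat, np.eye(5)))     # True
# ... but U_hat U_hat^T is a projector, not the 8x8 identity.
print(np.allclose(U_hat @ U_hat.T, np.eye(8)))     # False
```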
The singular values are non-negative numbers (they can be zero) and are ordered in descending order: σ₁ ≥ σ₂ ≥ … ≥ σₙ ≥ 0.
The rank of the matrix A is equal to the number of its non-zero singular values. As a consequence, the rank of A is equal to the rank of Σ.
The 2-norm of A is the largest singular value: ‖A‖₂ = σ₁.
The condition number is the ratio of the largest to the smallest singular value: κ(A) = σ₁/σₙ.
The Frobenius norm of A, squared, is computed as the sum of the squares of its singular values; it is also equal to the sum of the elements on the diagonal (the trace) of AᵀA (and of AAᵀ): ‖A‖F² = σ₁² + … + σₙ² = tr(AᵀA) = tr(AAᵀ).
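These properties can be checked directly in NumPy (a sketch on an arbitrary random 6×4 matrix):

```python
import numpy as np

# Check: rank = number of nonzero singular values, ||A||_2 = sigma_1,
# cond(A) = sigma_1 / sigma_n, ||A||_F^2 = sum of sigma_i^2.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

assert np.linalg.matrix_rank(A) == np.sum(s > 1e-12)
assert np.isclose(np.linalg.norm(A, 2), s[0])
assert np.isclose(np.linalg.cond(A), s[0] / s[-1])
assert np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(s**2))
print("all four properties verified")
```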
Orthogonal matrices: as for the eigenvalues (which all have modulus 1), the singular values of an orthogonal matrix are all equal to 1.
Rectangular matrices with orthonormal columns
This property also holds for a rectangular matrix Q with orthonormal columns (QᵀQ = I): its singular values are all equal to 1, i.e. the diagonal of Σ contains only ones.
Left and right singular vectors: the singular values of Aᵀ are the same as those of A; the left singular vectors of Aᵀ are the right singular vectors of A; the right singular vectors of Aᵀ are the left singular vectors of A.
Define a random matrix A, 8-by-5, and compute its SVD. Compute the trace (sum of the diagonal elements) of AᵀA; now take Σ and compute the sum of the squares of its diagonal: the two numbers coincide.
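A NumPy sketch of this exercise (the original suggests MATLAB; the random seed is arbitrary):

```python
import numpy as np

# For a random 8x5 matrix, the trace of A^T A equals the sum of the
# squared singular values (the squared Frobenius norm).
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
s = np.linalg.svd(A, compute_uv=False)

trace_AtA = np.trace(A.T @ A)
sum_sq_sigma = np.sum(s**2)
print(np.isclose(trace_AtA, sum_sq_sigma))  # True
```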
What is the SVD of Aᵀ? From A = UΣVᵀ we get Aᵀ = VΣᵀUᵀ.
Most of data science is based on the fact that, in general, the singular values of matrices arising in real applications decay very fast (they go to zero). If you take a matrix that comes from a real-world application, the decay is usually even faster.
Relationship between SVD factorization and spectral decomposition §
We want to see the spectral decomposition of AᵀA, which is an n×n symmetric matrix.
I write the spectral decomposition as: AᵀA = QΛQᵀ.
Using the SVD of A we may write: AᵀA = (UΣVᵀ)ᵀ(UΣVᵀ) = VΣᵀUᵀUΣVᵀ = V(ΣᵀΣ)Vᵀ.
Now, if I multiply both sides from the right by the matrix V, I obtain: AᵀAV = V(ΣᵀΣ), that is: AᵀAvᵢ = σᵢ²vᵢ.
We observe that the last equation is indeed the spectral decomposition of the matrix AᵀA, where Q = V and Λ = ΣᵀΣ (so λᵢ = σᵢ²).
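This relationship can be verified numerically (a NumPy sketch on an arbitrary random 8×5 matrix):

```python
import numpy as np

# The eigenvalues of A^T A are the squared singular values of A.
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
s = np.linalg.svd(A, compute_uv=False)   # descending

# eigh (for symmetric matrices) returns eigenvalues in ascending
# order; reverse them to match the descending singular values.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
print(np.allclose(eigvals, s**2))        # True
```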
This has a very important application: in practice it is principal component analysis (and we will show that PCA is a type of SVD).
Now let's do the same considering AAᵀ, which is an m×m matrix (8×8 in our example).
In an 8-by-5 matrix, how many singular values do you have? Σ is a diagonal matrix, so just 5 elements.
I obtain a different result: AAᵀ = U(ΣΣᵀ)Uᵀ, that is the same as: AAᵀU = U(ΣΣᵀ).
We notice that:
only the first n eigenvectors of AAᵀ coincide with the first n columns of U;
the first n eigenvalues of AAᵀ coincide with the squares of the first n diagonal entries of Σ, i.e. the squares of the singular values, while the remaining (m−n) eigenvalues are zero.
This happens when m > n.
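A NumPy sketch of this on the 8×5 example (random matrix, so full rank almost surely):

```python
import numpy as np

# A A^T is 8x8 but has only n = 5 nonzero eigenvalues: the squared
# singular values. The remaining m - n = 3 eigenvalues are zero.
rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
s = np.linalg.svd(A, compute_uv=False)

eigvals = np.linalg.eigvalsh(A @ A.T)[::-1]   # descending order
print(np.allclose(eigvals[:5], s**2))         # first 5 match sigma_i^2: True
print(np.allclose(eigvals[5:], 0))            # last 3 are (numerically) zero: True
```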
We define Z = Σ̂Vᵀ, the product of two matrices, resulting in an n×n matrix.
The reduced SVD form becomes: A = ÛZ.
Û is an orthonormal basis for the column space of A, while Z contains the remaining information and can be seen as how the matrix acts in that reduced space.
Note that this factorization identifies a subspace in which all the information of the matrix really stays. This dimensionality reduction of course depends on the rank of the matrix.
Try to interpret A = ÛZ as matrix-vector multiplications: each column of A is a linear combination of the columns of Û, and the combination factors are the elements of Z. In terms of reference systems, what does it mean? The columns of the matrix are expressed in the new reference system given by the columns of Û.
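A NumPy sketch of the A = ÛZ splitting (Û, Z are written as `U_hat`, `Z`; the random 8×5 matrix is an arbitrary example):

```python
import numpy as np

# Writing the economy SVD as A = U_hat @ Z with Z = Sigma_hat @ V^T.
# U_hat is an orthonormal basis for the column space of A; Z holds
# the coordinates of A's columns in that basis.
rng = np.random.default_rng(6)
A = rng.standard_normal((8, 5))
U_hat, s, Vt = np.linalg.svd(A, full_matrices=False)

Z = np.diag(s) @ Vt               # 5x5: the "remaining information"
print(np.allclose(U_hat @ Z, A))  # True: A is recovered exactly
```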
Orthonormal basis for the orthogonal complement of the column space of A §
The last m−n columns of U are a basis for the orthogonal complement of the range of A (assuming A has full rank n), that is, the null space of Aᵀ.
Recall: the range of A is the set of all possible linear combinations of its columns.
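A NumPy sketch of this property on the 8×5 example (random, hence full rank almost surely):

```python
import numpy as np

# For a full-rank 8x5 matrix, the last m - n = 3 columns of the full U
# span the orthogonal complement of range(A), i.e. the null space of
# A^T: A^T u = 0 for each such column u.
rng = np.random.default_rng(7)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=True)

U_perp = U[:, 5:]                    # last 3 columns of U
print(np.allclose(A.T @ U_perp, 0))  # True
```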