SC - Lecture 13
Checklist
Questions, Keywords and Vocabulary
- Randomized SVD
- Dimensionality reduction through randomness
- Approximation
- Johnson-Lindenstrauss Lemma
- SVD for recommendation systems
SC Notes - Lecture 13
Randomized SVD
Randomized SVD Introduction
What it is used for: computing an approximation of the largest singular values and the corresponding singular vectors of a large matrix.
Randomized SVD is useful when we have matrices so large that they cannot fit into memory.
The idea is to reduce the dimensionality of the problem through randomness before applying the traditional SVD.
Randomized SVD provides only an approximation, not an exact reconstruction, of the singular values and singular vectors.
Main Steps
The main steps are (let $A \in \mathbb{R}^{m \times n}$):
- Create a random matrix $\Omega \in \mathbb{R}^{n \times k}$, whose entries are drawn from a Gaussian distribution, where $k$ is the target dimension to which we want to reduce our matrix.
- Compute the matrix $Y = A\Omega \in \mathbb{R}^{m \times k}$.
- Orthonormalize $Y$, using for example the $k$-reduced QR factorization $Y = QR$, to obtain an orthonormal basis $Q \in \mathbb{R}^{m \times k}$.
- Projection: compute the matrix $B = Q^T A \in \mathbb{R}^{k \times n}$, which is smaller than $A$. $B$ is the projection of (the columns of) $A$ onto the subspace spanned by $Q$.
- Compute the SVD of $B$: $B = \tilde{U} \Sigma V^T$.
- Approximate the left singular vectors and singular values of $A$: compute $U \approx Q\tilde{U}$ to approximate the first $k$ left singular vectors of $A$. Then pick the largest singular values of $B$ as approximations of the largest singular values of $A$.
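The steps above can be sketched in Python/NumPy (a minimal illustration on a synthetic low-rank matrix; the sizes `m`, `n`, `k` are arbitrary choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 10

# Synthetic test matrix of exact rank k, so the approximation is very accurate.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

Omega = rng.standard_normal((n, k))   # random Gaussian matrix, n x k
Y = A @ Omega                         # sample the range of A, m x k
Q, _ = np.linalg.qr(Y)                # orthonormal basis of range(Y), m x k
B = Q.T @ A                           # small projected matrix, k x n
U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_tilde                       # approximate left singular vectors of A

s_exact = np.linalg.svd(A, compute_uv=False)
print(s[0], s_exact[0])               # the largest singular values agree
```

Because the test matrix has exact rank $k$, the $k$ random samples capture its whole range and the recovered singular values match the exact ones up to rounding.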
Observation 1: The matrix $Y = A\Omega$ has $k$ columns, each of which is a random linear combination of the columns of $A$; with high probability its range captures the dominant part of the range of $A$.
Observation 2: In the projection phase, we compute a matrix $B$ that is much smaller than $A$. This is useful in the next step, because computing the SVD of the small $k \times n$ matrix $B$ is much cheaper than computing the SVD of $A$.
Observation 3: In the last phase, the first $k$ columns of $Q\tilde{U}$ approximate the first $k$ left singular vectors of $A$.
Application: this method can be applied in image compression, by picking a number of singular values $k$ much smaller than the dimensions of the image.
From an application point of view, one thing we are interested in is understanding whether the approximation of the image that we computed with the randomized SVD is good enough.
To do this we saw two tools: the compression ratio and the principal angles.
Compression Ratio
It’s a quantity that compares the size of the compressed image with the original one. A higher compression ratio means more compression, i.e. the image takes less storage space compared to the original.
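As a sketch (assuming we store the truncated factors $U_k$, $\Sigma_k$, $V_k$ of an $m \times n$ image instead of the full matrix), the compression ratio can be written as:

```latex
\mathrm{CR} = \frac{mn}{k(m + n + 1)}
```

For example, a $1000 \times 1000$ image compressed with $k = 50$ gives $\mathrm{CR} = 10^6 / (50 \cdot 2001) \approx 10$.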
Principal Angles
We want to compute the angles between the two subspaces spanned by the columns of two matrices $G$ and $H$.
Recall that:
$G$ and $H$ collect the first $k$ left singular vectors of the respective matrices (the original one and its randomized approximation).
If the principal angles are small, the two subspaces are close, i.e. the approximation is good.
There are $k$ principal angles:
- The first principal angle is the minimum angle between any vector in the column space of $G$ and any vector in the column space of $H$;
- The second principal angle is the second minimum angle, taken over directions orthogonal to the previous ones;
and so on.
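A standard way to compute the principal angles is via the SVD of $G^T H$, whose singular values are the cosines of the angles. A Python/NumPy sketch (the matrices here are random stand-ins, not the lecture's $G$ and $H$):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8))
Y = X + 1e-3 * rng.standard_normal((50, 8))  # slightly perturbed copy of X

G, _ = np.linalg.qr(X)   # orthonormal basis of the first subspace
H, _ = np.linalg.qr(Y)   # orthonormal basis of the second subspace

# The singular values of G^T H are the cosines of the principal angles.
cosines = np.linalg.svd(G.T @ H, compute_uv=False)
angles = np.arccos(np.clip(cosines, -1.0, 1.0))  # in ascending order
print(angles)
```

Since the second subspace is only a tiny perturbation of the first, all the angles come out small.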
Johnson-Lindenstrauss Lemma
It theoretically justifies the Randomized SVD.
Thesis / Idea: the thesis is that distances (and angles) are preserved when you project points from a high-dimensional space into a lower-dimensional one. It states that any set of points in a high-dimensional space can be mapped into a lower-dimensional space in such a way that the distances between the points are approximately preserved.
Definition
More formally:
Let $u_1, \dots, u_n$ be a set of $n$ points in $\mathbb{R}^d$, let $0 < \varepsilon < 1$, and let $k \geq 8\ln(n)/\varepsilon^2$;
$\mathbb{R}^k$ is the lower-dimensional space we wish to project into.
The elements of the matrix $A \in \mathbb{R}^{k \times d}$ are drawn independently from a standard Gaussian distribution $\mathcal{N}(0, 1)$.
The linear map that projects a point (vector) $u \in \mathbb{R}^d$ into $\mathbb{R}^k$ is
$$f(u) = \frac{1}{\sqrt{k}} A u,$$
where the factor $\frac{1}{\sqrt{k}}$ on the left of $Au$ is used to normalize the projection, ensuring that the distances between points are approximately preserved.
Given two vectors $u, v$ among the $n$ points, the Johnson-Lindenstrauss Lemma guarantees that:
$$(1 - \varepsilon)\,\|u - v\|^2 \leq \|f(u) - f(v)\|^2 \leq (1 + \varepsilon)\,\|u - v\|^2.$$
- Observation: the distances are expressed as squared norms of the differences.
```matlab
n = 10000;        % number of points
d = 5000;         % original (high) dimension
k = 100;          % projected (low) dimension
epsilon = 0.9;
8*log(n)/epsilon^2          % check that k exceeds the JL bound

A = randn(k,d)/sqrt(k);     % random projection map f(u) = A*u
npairs = n/2;
oktest = zeros(npairs,1);
for i = 1:npairs
    u = rand(d,1); v = rand(d,1);
    hd_distance = norm(u-v);                       % distance in R^d
    u_projected = A*u; v_projected = A*v;
    ld_distance = norm(u_projected - v_projected); % distance in R^k
    oktest(i) = ((1-epsilon)*hd_distance^2 <= ld_distance^2) && ...
                (ld_distance^2 <= (1+epsilon)*hd_distance^2);
end
fprintf('the test is repeated %d times\nnumber of times the distance of the projected pair satisfies the JL Lemma: %d\n', ...
        numel(oktest), sum(oktest));
```
SVD for recommendation systems
What are they used for: Recommendation systems are used to predict the rating or preference a user would assign to a particular item, such as music, movies, and other types of content.
Formal definition of the problem
Let $r$ be a rating function that maps a (user, item) pair to a rating: $r: \text{Users} \times \text{Items} \to \mathbb{R}$.
Let's introduce an example about a RS used by Netflix in order to recommend to the user the items they may like.
In our example we have 7 people and 5 movies.
Of course the rating function is not explicitly known; rather, it is a lookup function. We can express this lookup function as a matrix $M$.
The rank of the matrix $M$ is 2: the first four rows are multiples of one another, and so are the last three.
SVD
We apply the usual rank-reduced SVD with $r = 2$: $M \approx U_r \Sigma_r V_r^T$.
Note also that in this specific case, we may identify the 2 basis vectors (columns of $V_r$) with the two movie genres.
Since the rank of $M$ is exactly 2, the rank-2 truncation reconstructs $M$ without error.
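We can verify this in Python/NumPy (the matrix is the same user-item matrix used in the Matlab snippet below):

```python
import numpy as np

# user-item rating matrix from the lecture example: 7 users x 5 movies
M = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 0, 0, 2, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = 2
M_r = (U[:, :r] * s[:r]) @ Vt[:r]   # rank-2 reconstruction U_r Sigma_r V_r^T
print(s)                            # only the first two singular values are nonzero
```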
Interpretation of the movies genres by looking at data
Starting from the columns of $V_r$ we can identify two genres:
- Sci-fi (involves Matrix, Inception, Star Wars)
- Romance (the remaining ones: Casablanca, Titanic)
If I want to look at the users, I need to consider the row space.
If we consider the rows of the matrix $MV_r$ (the users mapped into the genre subspace), each row tells how much each user likes each genre. For instance, the first user prefers Sci-Fi, while the last three prefer the other genre (Romance).
Recommending a new movie to a new user, Peter
Suppose now that we insert a new user, Peter, who only watched the movie Matrix and rated it 5. How could we recommend movies to him?
We represent Peter by means of a vector: $q = (5, 0, 0, 0, 0)^T$.
As we expect, Peter prefers sci-fi and doesn't like romance. Now we represent Peter's ratings in the genre subspace by computing $p = V_r^T q$.
Then we map this representation back into the movie space by computing $V_r p = V_r V_r^T q$.
This will be the orthogonal projection of the rating vector $q$ onto the genre subspace (the column space of $V_r$).
In Matlab:
```matlab
q = [5,0,0,0,0]';    % Peter's ratings
peter = Vr'*q        % Peter in the genre subspace (columns of Vr)
interests = Vr*peter % orthogonal projection of the rating q into the movie space
```
This is like computing $V_r V_r^T q$.
This gives us a vector of interests, which emphasizes that Peter would probably like Matrix, Inception and Star Wars but not Casablanca or Titanic. So a movie company can recommend other Sci-fi movies to Peter.
Finding movie buddies for Peter
Another interesting approach is to find users who are similar to Peter. For instance, we map all users into the genre subspace (spanned by the columns of $V_r$):
```matlab
M = [1 1 1 0 0; 3 3 3 0 0; 4 4 4 0 0; 5 5 5 0 0; ...
     0 0 0 4 4; 0 0 0 5 5; 0 0 0 2 2]  % user-item matrix
[U, S, V] = svd(M);
r = 2;
Vr = V(:, 1:r)
Ratings = M*Vr       % all users in the genre subspace
```
We extract the genre rating of Jack (the 5th user):
```matlab
jack = Ratings(5, :); % extract Jack's genre rating
```
Then we use the cosine similarity function to evaluate the similarity between Jack's and Peter's ratings (and we get zero):
```matlab
similarity = @(v,w) v'*w/(norm(v)*norm(w));
similarity(peter, jack')
% ans = 0
```
A similarity of 0 means that there isn't much similarity, because Jack prefers romance: he rated Casablanca and Titanic as 4/5 and 4/5.
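The same pipeline can be checked end-to-end in Python/NumPy (a sketch mirroring the Matlab code above):

```python
import numpy as np

# user-item rating matrix: 7 users x 5 movies
M = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 0, 0, 2, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
Vr = Vt[:2].T                        # genre subspace basis, 5 x 2

q = np.array([5.0, 0, 0, 0, 0])      # Peter only rated Matrix with 5
peter = Vr.T @ q                     # Peter in genre coordinates
interests = Vr @ peter               # projection back into movie space

ratings = M @ Vr                     # all users in genre coordinates
jack = ratings[4]                    # Jack is the 5th user

cos_sim = peter @ jack / (np.linalg.norm(peter) * np.linalg.norm(jack))
print(interests, cos_sim)
```

The interests vector is nonzero only on the three sci-fi movies, and the cosine similarity between Peter and Jack is zero, as in the lecture.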