SC - Lecture 13


Checklist

Questions, Keywords and Vocabulary

  • Randomized SVD
  • Dimensionality reduction through randomness
  • Approximation
  • Johnson-Lindenstrauss Lemma
  • SVD for recommendation systems

SC Notes - Lecture 13

Randomized SVD

Randomized SVD Introduction

What it is used for: computing an approximation of the largest singular values and the corresponding singular vectors of a large matrix.

Randomized SVD is useful when we have matrices so large that they cannot fit into memory, or for which a full SVD would be too expensive.

The idea is to reduce the dimensionality of the problem through randomness before applying the traditional SVD.

Randomized SVD provides only an approximation, not an exact reconstruction, of the singular values and singular vectors.

Main Steps

The main steps are:

  • Create a random matrix Ω of size n×k (drawn from a Gaussian distribution), where k is the target dimension to which we want to reduce our m×n matrix A.
  • Compute the matrix Y = AΩ, which is m×k.
  • Orthonormalize Y, using for example the k-reduced QR factorization Y = QR, to obtain an orthonormal basis Q.
  • Projection: compute the matrix B = QᵀA, which is smaller than A. B is the projection of (the columns of) A onto the subspace spanned by Q.
  • Compute the SVD of B: B = U_B Σ Vᵀ.
  • Approximate the left singular vectors and singular values of A: compute Ũ = Q U_B to approximate the first k left singular vectors of A (the columns of U). Then pick the largest singular values of B as approximations of the largest singular values of A. (A MATLAB sketch of these steps follows the list.)
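Putting the steps together, a minimal MATLAB sketch (the test matrix A and the sizes m, n, k are assumptions made for illustration):

m = 1000; n = 800; k = 20;      % example sizes (assumptions)
A = randn(m,k) * randn(k,n);    % a low-rank test matrix
Omega = randn(n,k);             % random Gaussian matrix (n x k)
Y = A * Omega;                  % Y = A*Omega approximately spans range(A)
[Q, ~] = qr(Y, 0);              % k-reduced QR: orthonormal basis Q (m x k)
B = Q' * A;                     % projection: B is k x n, much smaller than A
[Ub, S, V] = svd(B, 'econ');    % SVD of the small matrix B
Utilde = Q * Ub;                % approximate left singular vectors of A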

Observation 1: The matrix Y = AΩ has columns that are linear combinations of the columns of A. They define a subspace of dimension k (because the matrix Y is m×k) of the range of A.

Observation 2: in the projection phase, we compute a matrix B that is much smaller than A. This is useful in the next step because computing the SVD of B is much more efficient than computing the SVD of A. We have that A is an m×n matrix, Q is m×k, and B is k×n.

Observation 3: In the last phase, the first k left singular vectors of A (the columns of U) can be approximated by computing Q U_B. The largest singular values of B are already a good approximation of the largest singular values of A. Finally, V needs no correction: it directly provides an orthonormal basis for the row space of A, which has been preserved since, in the projection phase, only the columns of A are transformed by computing B = QᵀA.

Application: this method can be applied to image compression, by picking a number k of components smaller than the total number of singular values/vectors of the image A. Then we can reconstruct the image from its compressed SVD representation.
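For example, after running the sketch above with A holding the grayscale pixel values of an image (an assumption for illustration), the compressed image can be rebuilt from the factors:

Ak = Utilde * S * V';   % rank-k approximation of the image A
imshow(uint8(Ak));      % display (assumes pixel values in the 0-255 range)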

From an application point of view, one thing we are interested in is understanding whether the approximation of the image that we computed with the randomized SVD is good enough.

To do this we saw two tools: the compression ratio and the principal angles.

Compression Ratio

It’s a quantity that compares the size of the compressed image with the size of the original one. A higher compression ratio means more compression, i.e. the image takes less storage space compared to the original.
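A common way to quantify it for a rank-k SVD compression of an m×n image, counting the values stored by the factors (the symbols m, n, k follow the notation above), is:

compression_ratio = (m*n) / (k*(m + n + 1));  % k*(m+n+1) values stored: U_k, V_k and the k singular values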

Principal Angles

We want to compute the angles between the two subspaces spanned by the columns of U and of Ũ. The angles provide us with information about how close the two subspaces are.

Recall that:

  • A = UΣVᵀ (exact SVD) and Ã = ŨΣVᵀ (its randomized approximation, with Ũ = Q U_B); U and Ũ represent the left singular vectors of the respective matrices.

If the columns of both U and Ũ span the same or very similar subspaces, the cosines of the principal angles will be close to one.

There are k principal angles. We can consider all the pairs of vectors and select the ones with the smallest angle. In general, between two matrices G and H with orthonormal columns, the principal angles are defined recursively:

  • The first principal angle is the minimum angle between any pair of unit vectors in span(G) and span(H)
  • The second principal angle is the minimum angle over pairs orthogonal to the first pair

and so on.
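A standard way to compute them, not shown explicitly in the lecture (a sketch, assuming G and H have orthonormal columns): the cosines of the principal angles are the singular values of GᵀH.

cosines = svd(G' * H);            % cos(theta_1) >= ... >= cos(theta_k)
angles  = acos(min(cosines, 1));  % clamp to 1 to guard against round-off
% MATLAB's built-in subspace(G, H) returns the largest principal angle.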

Johnson-Lindenstrauss Lemma

It theoretically justifies the randomized SVD.

Thesis / Idea: the thesis is that distances are approximately preserved when you project points from a high-dimensional space into a suitably chosen lower-dimensional one. It states that any set of points in a high-dimensional space can be mapped into a lower-dimensional space in such a way that the distances between the points are approximately preserved.

Definition. More formally: let X be a set of n points in a high-dimensional space ℝ^d and let ε be a small positive number such that 0 < ε < 1. Let A be a random matrix in ℝ^(k×d), where d is the original dimensionality of the data, k is the dimension of the space we wish to project into, n is the number of points, and ε is the allowed distortion. A target dimension k of order log(n)/ε² suffices; the bound k ≥ 8·log(n)/ε² is the one checked in the MATLAB code below.

The elements of the matrix A are typically drawn independently from a normal distribution (or from an even simpler distribution, such as random ±1 entries).

The linear map f that projects a point (vector) u from ℝ^d into the lower-dimensional space ℝ^k can be written as:

f(u) = (1/√k) · A u

where the factor 1/√k on the left of Au is used to normalize the projection, ensuring that the distances between points are approximately preserved.

Given two vectors u and v in ℝ^d, the distance between their projections f(u) and f(v) is approximately equal to the distance between the original points.

Specifically, the Johnson-Lindenstrauss Lemma guarantees that:

(1 − ε)·‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)·‖u − v‖²

  • Observation: the distances are expressed as squared norms.

We can verify this numerically in MATLAB:
n=10000;       % number of points
d=5000;        % original dimensionality
k=100;         % target dimensionality
epsilon=0.9;   % allowed distortion
8*log(n)/epsilon^2   % JL lower bound on k (about 91, so k = 100 is enough)

A=randn(k,d)/sqrt(k);   % random projection, already scaled by 1/sqrt(k)
oktest=zeros(n/2,1);    % one test per random pair of points
for i=1:n/2
	u=rand(d,1); v=rand(d,1);
	hd_distance=norm(u-v);                        % distance in high dimension
	u_projected = A*u; v_projected = A*v;
	ld_distance = norm(u_projected - v_projected);  % distance after projection
	oktest(i)=((1-epsilon)*hd_distance^2 <= ld_distance^2) && ...
	          (ld_distance^2 <= (1+epsilon)*hd_distance^2);
end
fprintf('the test is repeated %d times \n number of times the distance of the projected pair satisfies the JL Lemma: %d\n',numel(oktest),sum(oktest));

SVD for recommendation systems

What are they used for: Recommendation systems are used to predict the rating or preference a user would assign to a particular item, such as music, movies, and other types of content.

Formal definition of the problem. Let U be the set of users and I the set of items. The rating function is r: U × I → R, where R is the set of possible ratings: the function produces a rating for each user and each item.

M is the user-item matrix. We want to select, for each user u, the item that maximizes the rating, i.e. the item i_u such that:

i_u = argmax over i ∈ I of r(u, i)

Let’s introduce an example about an RS used by Netflix in order to recommend to the user the items they may like.

In our example we have 7 people (users) and 5 movies (items).

Of course the rating function is not explicitly known; rather, it is a lookup function. We can express this lookup function as a matrix.
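For reference, this is the user-item matrix used in the MATLAB code later in this section; rows are users and columns are movies (the column order Matrix, Inception, Star Wars, Casablanca, Titanic is an assumption consistent with the examples below):

M = [ 1 1 1 0 0
      3 3 3 0 0
      4 4 4 0 0
      5 5 5 0 0
      0 0 0 4 4
      0 0 0 5 5
      0 0 0 2 2 ]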

The rank of the matrix is 2, so we just need a subspace of dimension two to represent the movies.

SVD. We apply the usual rank-reduced SVD, M = U_r Σ_r V_rᵀ with r = 2.

Note also that in this specific case we may identify the 2 basis vectors (columns of V_r), which suggest that there are two genres of movies. A user is a vector of their ratings, so 5 components, while a movie is a vector of 7 components, which are the ratings given by the users.

Since r = 2 is exactly the rank of M, the SVD factorization is exact (it’s not an approximation, as we know).

Interpretation of the movie genres by looking at the data. Starting from V_r, we can identify two movie genres by looking at its rows (equivalently, the columns of V_rᵀ), which are the components of the movies on the genre basis:

  1. Sci-Fi (Matrix, Inception, Star Wars)
  2. Romance (the remaining ones: Casablanca and Titanic)

If we want to look at the users, we need to consider the row space. The rows of the matrix M V_r, which are the components of the users on the basis V_r, link the users to the movie genres.

Each row tells how much each user likes each genre. For instance, the first user prefers Sci-Fi; the last three prefer the other genre (Romance).

Recommending a new movie to a new user (Peter). Suppose now that we insert a new user, Peter, who has only watched the movie Matrix and rates it 5. How could we recommend movies to him?

We represent Peter by means of a vector q = [5, 0, 0, 0, 0]ᵀ. If we compute V_rᵀ q, we project Peter’s ratings into the r-dimensional subspace of the row space of M, which we are calling the genre subspace.

And as we expect, Peter prefers Sci-Fi and doesn’t like Romance. Now we have represented Peter’s ratings in the genre subspace.

Then we map this representation back into the movie space by computing V_r (V_rᵀ q).

This will be the orthogonal projection of the rating q onto the subspace spanned by the columns of V_r.

In MATLAB:

q = [5,0,0,0,0]'; % Peter's ratings
peter = Vr'*q     % Peter in the genre subspace (coordinates on the columns of Vr)
interests = Vr*peter  % the orthogonal projection of the rating q into the movie space

This is like computing V_r V_rᵀ q, where V_r V_rᵀ is, as we know, the orthogonal projector onto the movie subspace (represented by the rows of M, i.e. by the right singular vectors).

This gives us a vector of interests, which suggests that Peter would probably like Matrix, Inception and Star Wars but not Casablanca or Titanic. So a movie company can recommend other Sci-Fi movies to Peter.
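For instance (a sketch reusing the interests vector computed above; the ranking step is our own illustration):

[~, idx] = sort(interests, 'descend');  % movie indices ranked by Peter's projected interest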

Finding movie buddies for Peter. Another interesting approach is to find users who are similar to Peter. For instance, we map all users into the genre subspace (whose basis vectors are the rows of V_rᵀ, i.e. the columns of V_r). That is, we calculate the matrix M V_r, which gives the components of the rows of M in the basis V_r.

M = [1 1 1 0 0; 3 3 3 0 0; 4 4 4 0 0; 5 5 5 0 0; 0 0 0 4 4; 0 0 0 5 5; 0 0 0 2 2] % user-item matrix
r = 2;             % rank of M
[U, S, V] = svd(M);
Vr = V(:, 1:r)     % genre basis (first r right singular vectors)
Ratings = M*Vr     % components of each user on the genre basis

We extract the genre ratings of Jack (user 5):

jack = Ratings(5, :); % extract Jack's genre ratings

Then we use the cosine similarity function to evaluate the similarity between Jack's and Peter's ratings (and we get zero).

similarity = @(v,w) v'*w/(norm(v)*norm(w));
similarity(peter,jack')

% ans = 0

0 means that there isn't much similarity, because Jack prefers Romance: he rated Casablanca and Titanic 4 out of 5 each.
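By contrast (a quick check reusing the names defined above), comparing Peter with a Sci-Fi fan such as the first user gives cosine similarity 1, since both genre vectors lie along the same genre axis:

user1 = Ratings(1, :);     % the first user's genre ratings (a Sci-Fi fan)
similarity(peter, user1')  % ans = 1: maximal similarity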