SC - Lecture 7
Checklist
Questions, Keywords and Vocabulary
- What is another way to derive the least-squares formula, using the orthogonal projection of b onto the range of A?
- What is an orthogonal basis for the range of A?
- Range of a matrix
- Latent Semantic Analysis
- Documents and keyword example
- Query as vector
- Semantic space to Geometric Space
- Perfect Reconstruction
- Application: Streaming Data
- UpdateQR algorithm
- Farm Example
- Overdetermined System
SC Notes - Lecture 7
The formula we derived in the previous lecture relates QR and least squares. That is, if $A = QR$ is the thin QR factorization of a full-rank matrix $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), the least-squares solution of $Ax \approx b$ is

$$x = R^{-1} Q^T b$$

Question: how can we obtain this result by noting that $Ax$ is the orthogonal projection of $b$ onto the range of $A$?

First we need to answer the question "What is an orthonormal basis for the range of $A$?" It is given by the columns of $Q$.

Let $P = QQ^T$: this is the orthogonal projector onto $\mathrm{range}(A)$, and $x$ is a least-squares solution exactly when $Ax = Pb$.

We know that

$$QRx = QQ^T b$$

But since $Q$ has orthonormal columns ($Q^T Q = I$), multiplying both sides by $Q^T$ gives $Rx = Q^T b$, and hence $x = R^{-1} Q^T b$.

(This is a common exam question.)
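The derivation above can be checked numerically. Below is a minimal Python/NumPy sketch (the notes use MATLAB; the matrix $A$ and vector $b$ here are random data made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # m = 6 samples, n = 3 unknowns (overdetermined)
b = rng.standard_normal(6)

# Thin QR: the columns of Q are an orthonormal basis for range(A)
Q, R = np.linalg.qr(A)            # Q is 6x3, R is 3x3

# Least squares via the projector QQ^T:  QRx = QQ^T b  =>  Rx = Q^T b
x_qr = np.linalg.solve(R, Q.T @ b)

# Reference least-squares solution computed by NumPy
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_qr, x_ref))   # True: both give the same least-squares solution
```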
QR Application: Latent Semantic Analysis
Latent semantic analysis (LSA) is a non-standard application of the QR factorization.
Any document in natural language can be translated into a vector. Imagine that each document is the title of a book:
```matlab
documents = {'Baked bread without recipe';
             'The art of classical Viennese pastry';
             'Numerical recipes: the art of scientific calculation';
             'Bread, pastry, desserts and cakes: precise recipes for baking';
             'Pastry: the book of the best French recipes'};
d = length(documents)
```

Let's also consider the following keywords:

```matlab
terms = {'baked','recipe','bread','cake','pastry','dessert'};
t = length(terms)
```

The idea is to represent the set of documents by means of a matrix, called the "term-document" matrix:

```matlab
B = [1 0 0 1 0;   % documents in which the first term appears
     1 0 1 1 1;   % documents in which the second term appears
     1 0 0 1 0;
     0 0 0 1 0;
     0 1 0 1 1;
     0 0 0 1 0]   % documents in which the last term appears
```

For example, looking at the first column: 'baked' (the first term), 'recipe' (the second term) and 'bread' (the third term) appear in the first document.
We can also read it by rows: the first term ('baked') appears in the first and the fourth documents.

In a completely different situation, with many more keywords and documents, the entries can instead be frequencies (e.g. the absolute frequency, or the frequency normalized by the number of words in the document).
Normalization of a matrix
A matrix $B$ can be normalized, for example, by dividing each column by its 2-norm. In MATLAB this can be done with normalize(B,'norm',2).
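The same column normalization can be sketched in Python/NumPy (a translation of the MATLAB idea, using the term-document matrix from above):

```python
import numpy as np

B = np.array([[1, 0, 0, 1, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Divide each column by its 2-norm
Bn = B / np.linalg.norm(B, axis=0)

print(np.linalg.norm(Bn, axis=0))  # every column now has unit 2-norm
```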
Query
A query is a vector living in the same space as the documents, i.e. with one component per term (six components in our example). For instance, a query based on the terms 'baked' and 'bread' is the vector q = [1 0 1 0 0 0]': each component corresponds to a term, with value 1 if the term appears in the query and 0 otherwise.
The problem is: which documents are closest to this query? This is no longer a problem in semantic space but in geometric space, where distances are defined.
What we can do is select the documents whose similarity with the query is greater than a certain threshold value. Actually, what we are interested in is not the distance but another quantity that measures how similar two vectors are: the similarity, in particular the cosine similarity:
```matlab
similarity = @(v,w) v'*w/(norm(v)*norm(w));
```

In math:

$$\mathrm{similarity}(v,w) = \cos\theta = \frac{v^T w}{\|v\|\,\|w\|}$$
We calculate the similarity between the query and all documents and display the relevant documents.
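Putting the pieces together, here is a Python/NumPy sketch of the whole query step (indices are 0-based; the threshold value 0.5 is an arbitrary illustrative choice):

```python
import numpy as np

# Term-document matrix from the example above (rows = terms, columns = documents)
B = np.array([[1, 0, 0, 1, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Query on the terms 'baked' (index 0) and 'bread' (index 2)
q = np.array([1, 0, 1, 0, 0, 0], dtype=float)

cos_sim = lambda v, w: v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

# Similarity between the query and every document
sims = np.array([cos_sim(q, B[:, j]) for j in range(B.shape[1])])

threshold = 0.5                        # arbitrary cut-off for "relevant"
relevant = np.flatnonzero(sims > threshold)
print(sims.round(3), relevant)         # documents 0 and 3 are the relevant ones
```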
Matrix Rank
Let $r = \mathrm{rank}(B)$. To work in an $r$-dimensional space, we apply the QR factorization, considering the rank-reduced QR factorization (only the first $r$ columns of $Q$ and the first $r$ rows of $R$):

$$B = Q_r R_r$$

The similarity between the query and the $j$-th document can then be computed in this reduced space.
We notice (in MATLAB) that this is a perfect reconstruction, not an approximation: the similarity is exactly the same number. The similarity function produces exactly the same output for the following three snippets of code:

```matlab
similarity(q, B(:,j))            % similarity in the original space
similarity(q, Qr*Rr(:,j))        % similarity with the reconstructed column
similarity(Rr(:,j), Qr'*q)       % similarity in the reduced subspace spanned by the columns of Qr
```

Notice how in the third case we have to multiply the query $q$ by $Q_r'$ to map it into the reduced space.
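The perfect-reconstruction property can be verified with a Python/NumPy sketch mirroring the three MATLAB snippets (0-based indexing, so `j = 3` picks the fourth document):

```python
import numpy as np

B = np.array([[1, 0, 0, 1, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0]], dtype=float)
q = np.array([1, 0, 1, 0, 0, 0], dtype=float)
cos_sim = lambda v, w: v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

Q, R = np.linalg.qr(B)            # economy-size QR factorization
r = np.linalg.matrix_rank(B)      # rank of the term-document matrix (here 4)
Qr, Rr = Q[:, :r], R[:r, :]       # rank-reduced factors: B equals Qr @ Rr

j = 3                             # fourth document
s1 = cos_sim(q, B[:, j])          # similarity in the original space
s2 = cos_sim(q, Qr @ Rr[:, j])    # similarity with the reconstructed column
s3 = cos_sim(Rr[:, j], Qr.T @ q)  # similarity computed in the reduced space
print(s1, s2, s3)                 # the same value three times (up to round-off)
```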
Approximation by deleting the last column and row
Let's now consider the matrices obtained by deleting the last column of $Q_r$ and the last row of $R_r$, i.e. Qr(:,1:end-1) and Rr(1:end-1,:).

By computing the similarity with these truncated factors we obtain different values. Notice also that the last two similarities are equal to each other, but they slightly differ from the similarities computed previously with the full $B$.

Another thing to notice is that the best document (the one with the highest similarity, $j=1$) is still identified correctly, even in the reduced-dimensional space of the approximation.
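The truncated case can be checked the same way in Python/NumPy: the similarity values change, but the best document is still ranked first (a sketch reusing the matrices of the example):

```python
import numpy as np

B = np.array([[1, 0, 0, 1, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0]], dtype=float)
q = np.array([1, 0, 1, 0, 0, 0], dtype=float)
cos_sim = lambda v, w: v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

Q, R = np.linalg.qr(B)
r = np.linalg.matrix_rank(B)

# Delete the last column of Qr and the last row of Rr: a rank r-1 approximation
Qr1, Rr1 = Q[:, :r-1], R[:r-1, :]

s_approx = np.array([cos_sim(Rr1[:, j], Qr1.T @ q) for j in range(B.shape[1])])
print(int(np.argmax(s_approx)))   # the best match is still the first document
```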
Soon we will see a reduction in the row space through the singular value decomposition (SVD).

The SVD is more accurate, but the QR factorization is less expensive: there are applications in which it is more efficient to use QR instead of the SVD.
Note
The fourth document contains the word "baking", while the keyword list contains "baked". For this example we treat them as the same word.
QR Application: Analysis of streaming data - QR Updating
Imagine that you have a set of data, then a new set of data arrives, and so on: there is a real-time flow of data, arriving sequentially over time.
QR Update
One of the key features of QR decomposition is its ability to be updated incrementally.
As new data arrives (new rows or columns are appended to the matrix), the QR decomposition can be updated without recomputing the entire decomposition from scratch.
> [!note] Recall
> In this course rows are samples and columns are features. In this case we will not add new features; we will add new samples.
So we want to update the QR decomposition to include $B$, without explicitly computing the QR factorization of $A'$ from scratch.
The steps are the following:

- Concatenate $B$ to $A$ and get the updated matrix:
$$A' = \begin{bmatrix} A \\ B \end{bmatrix}$$
- Compute the QR decomposition of $B$, that is $B = Q_B R_B$.
- Update $Q$ and $R$, considering the augmented matrix
$$\begin{bmatrix} R \\ R_B \end{bmatrix}$$
and computing its QR decomposition $\begin{bmatrix} R \\ R_B \end{bmatrix} = \tilde{Q} R'$.
- Finally, to obtain the update: $R'$ is the triangular factor just computed, and for $Q'$ we compute:
$$Q' = \begin{bmatrix} Q & 0 \\ 0 & Q_B \end{bmatrix} \tilde{Q}$$

While, putting everything together:

$$A' = \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q & 0 \\ 0 & Q_B \end{bmatrix} \begin{bmatrix} R \\ R_B \end{bmatrix} = Q' R'$$
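The update procedure can be sketched in Python/NumPy as follows (the notes call the algorithm UpdateQR; the function name `update_qr` and the random data are my own). It uses the economy-size QR and assumes $p \ge n$:

```python
import numpy as np

def update_qr(Q, R, B):
    """Update the economy-size QR of A (= Q @ R) after appending
    the new rows B, without refactorizing [A; B] from scratch."""
    p, n = B.shape
    QB, RB = np.linalg.qr(B)          # step 2: QR of the new block
    S = np.vstack([R, RB])            # step 3: small 2n x n stacked matrix
    Qt, R_new = np.linalg.qr(S)       # QR of [R; RB]
    # step 4: Q' = blkdiag(Q, QB) @ Qt
    m = Q.shape[0]
    blk = np.zeros((m + p, 2 * n))
    blk[:m, :n] = Q
    blk[m:, n:] = QB
    Q_new = blk @ Qt
    return Q_new, R_new

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))       # existing data: m = 8, n = 3
B = rng.standard_normal((4, 3))       # new samples arriving: p = 4
Q, R = np.linalg.qr(A)

Q2, R2 = update_qr(Q, R, B)
print(np.allclose(Q2 @ R2, np.vstack([A, B])))   # True: Q'R' reproduces A'
```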
Size of matrices (using the economy-size QR, and assuming $p \ge n$):

- $A$ is in $\mathbb{R}^{m \times n}$
- $Q$ is in $\mathbb{R}^{m \times n}$
- $R$ is in $\mathbb{R}^{n \times n}$
- The new block $B$ is of size $p \times n$
- The resulting updated matrix $A'$ is in $\mathbb{R}^{(m+p) \times n}$; $Q'$ is in $\mathbb{R}^{(m+p) \times n}$ and $R'$ is in $\mathbb{R}^{n \times n}$
Why do we perform step (3) instead of computing the QR factorization of $A'$ from scratch? There is a time-complexity gain, which we can see just by looking at the dimensions.

The matrix $\begin{bmatrix} R \\ R_B \end{bmatrix}$ has only $2n$ rows, so its QR factorization costs $O(n^3)$ operations, whereas refactorizing $A'$ from scratch would cost $O((m+p)n^2)$. When $m \gg n$, the update is much cheaper.
Farm Example
The problem is an overdetermined system (more equations than unknowns). New data arrives, so the structure changes over time, and the matrix is rectangular, not square. This can be solved using the least-squares solution, but we would have to repeat the computation several times: every time new data arrives, everything must be recomputed, which is exactly the situation where the QR update is convenient.