SC - Lezione 7


Checklist

Domande, Keyword e Vocabulary

  • What’s the other way to derive the formula, as the orthogonal projection of b onto the range of A?
  • What is an orthogonal basis for the range of A?
  • Range of a matrix
  • Latent Semantic Analysis
  • Documents and keyword example
  • Query as vector
  • Semantic space to Geometric Space
  • Perfect Reconstruction
  • Application: Streaming Data
  • UpdateQR algorithm
  • Farm Example
  • Overdetermined System

Appunti SC - Lezione 7

The formula we derived in the previous lecture (QR and Least Squares), that is $Rx = Q^Tb$, can be obtained in a different way.

Question: How can we obtain this result by noting that 𝐴𝑥 is the orthogonal projection of 𝑏 onto the range of 𝐴?

First we need to answer the question: “What is an orthogonal basis for the range of $A$?” It is given by the columns of $Q$ (from the QR factorization $A = QR$), which form an orthonormal basis for the range of $A$.

Let $P$ be the matrix that applies the orthogonal projection onto the range of $A$. We have that:

$$Ax = Pb$$

We know that $P$ can be expressed using the orthonormal basis $Q$: the projector is given by $P = QQ^T$. Also expressing $A$ by its QR factorization, $A = QR$, we get:

$$QRx = QQ^Tb$$

Since $Q$ appears on both sides, we can cancel it (multiplying both sides on the left by $Q^T$), leaving:

$$Q^TQRx = Q^TQQ^Tb$$

But since $Q$ has orthonormal columns, $Q^TQ$ is the identity matrix. Since this matrix doesn’t change anything, we can eliminate it and obtain:

$$Rx = Q^Tb$$

This is a common exam question.
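The derivation can be checked numerically. A minimal sketch (in Python/NumPy rather than the MATLAB used in the course):

```python
import numpy as np

# Check that the least-squares solution of Ax ≈ b can be obtained by
# solving R x = Q^T b, where A = QR is the (reduced) QR factorization.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))       # overdetermined: more rows than columns
b = rng.standard_normal(8)

Q, R = np.linalg.qr(A)                # reduced QR: Q is 8x3, R is 3x3
x_qr = np.linalg.solve(R, Q.T @ b)    # the formula derived above

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # reference least-squares solution
print(np.allclose(x_qr, x_ls))        # True
```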

QR Application: Latent Semantic Analysis

Latent semantic analysis is a non-standard application of the QR factorization.

Any document in natural language can be translated into a vector. Imagine that each document is the title of a book:

documents = {'Baked bread without recipe';
    'The art of classical Viennese pastry';
    'Numerical recipes: the art of scientific calculation';
    'Bread, pastry, desserts and cakes: precise recipes for baking';
    'Pastry: the book of the best French recipes'};
d = length(documents)

Let’s also consider the following keywords

terms = {'baked','recipe', 'bread', 'cake','pastry','dessert'};
t = length(terms)

The idea is to represent the set of documents by means of a matrix, called the “term-document” matrix:

B= [1 0 0 1 0;         % documents in which the first term appears
    1 0 1 1 1;         % documents in which the second term appears
    1 0 0 1 0;
    0 0 0 1 0;
    0 1 0 1 1;
    0 0 0 1 0]         % documents in which the last term appears

For example, looking at the first column: ‘baked’ (the first term), ‘recipe’ (the second term) and ‘bread’ appear in the first document.

We can also read by row: the first term (‘baked’) appears in the first and the fourth document.

In a more realistic situation, with many keywords and documents, you can store frequencies instead (e.g., the absolute frequency, or the frequency normalized by the number of words in each document).

Normalization of a matrix

A matrix B can be normalized, for example, by dividing each column by its 2-norm. In MATLAB this can be done with normalize(B,'norm').
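For instance, a sketch in Python/NumPy (the course uses MATLAB) of column-wise 2-norm normalization:

```python
import numpy as np

# Term-document matrix from above
B = np.array([[1., 0., 0., 1., 0.],
              [1., 0., 1., 1., 1.],
              [1., 0., 0., 1., 0.],
              [0., 0., 0., 1., 0.],
              [0., 1., 0., 1., 1.],
              [0., 0., 0., 1., 0.]])

# Divide each column by its 2-norm (like MATLAB's normalize(B,'norm'))
Bn = B / np.linalg.norm(B, axis=0)
print(np.linalg.norm(Bn, axis=0))   # every column now has unit norm
```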

Query

A query is a vector in the same space as the document columns: it has six components, one per term. For example, a query based on the terms “baked” and “bread” is the vector with a 1 in the positions of ‘baked’ and ‘bread’ and 0 elsewhere: q = [1 0 1 0 0 0]'.

The problem is: which documents are closest to this query? It is no longer a problem in semantic space but in geometric space, where distances are defined.

What we are interested in is not the distance but another value: the similarity, which represents how similar two vectors are. In particular, the cosine similarity. We can then select the documents whose similarity with the query is greater than a certain threshold value:

similarity = @(v,w) v'*w/(norm(v)*norm(w));

In math:

$$\mathrm{similarity}(v, w) = \frac{v^T w}{\|v\|\,\|w\|} = \cos\theta$$

We calculate the similarity between the query and all documents and display the relevant documents.
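Concretely, using the term-document matrix defined above, the computation can be sketched as follows (in Python/NumPy instead of the course’s MATLAB):

```python
import numpy as np

# Term-document matrix from above (rows = terms: baked, recipe, bread,
# cake, pastry, dessert; columns = the five documents)
B = np.array([[1., 0., 0., 1., 0.],
              [1., 0., 1., 1., 1.],
              [1., 0., 0., 1., 0.],
              [0., 0., 0., 1., 0.],
              [0., 1., 0., 1., 1.],
              [0., 0., 0., 1., 0.]])
q = np.array([1., 0., 1., 0., 0., 0.])   # query on 'baked' and 'bread'

similarity = lambda v, w: v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

sims = [similarity(q, B[:, j]) for j in range(B.shape[1])]
# Document 1 (it contains both 'baked' and 'bread' and few other terms)
# is the most relevant one.
print(int(np.argmax(sims)))   # 0, i.e. the first document
```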

Matrix Rank

Let $B$ now be the term-document matrix, normalized column-wise. We notice that the matrix has non-maximal rank (rank $= 4 < 5$). We can therefore work with a smaller factorization instead.

To do this, we apply the QR factorization, considering the rank-reduced QR factorization: we keep only the first 4 rows of $R$ (call it $R_r$) and the first 4 columns of $Q$ (call it $Q_r$). The last rows of $R$ are 0, so as a consequence the last columns of $Q$ don’t contribute anything to the product $QR$. (Similar to the reduced QR factorization, but driven by the rank of $B$.)

The similarity between the $j$-th document and the query can be computed either in the original standard basis or in the new reduced-dimensional subspace spanned by the columns of $Q_r$.

We notice (in MATLAB) that this is a perfect reconstruction, not an approximation: the similarity is exactly the same number. The similarity function produces exactly the same output for the following three snippets of code:

similarity(q,B(:,j)) %Similarity 
similarity(q,Qr*Rr(:,j)) % Similarity with full QR
similarity(Rr(:,j),Qr'*q) % Similarity with QR in a reduced-dimensional subspace spanned by the column of Qr

Notice how in the third case we have to multiply by $Q_r^T$ (to project the query into the reduced-dimensional subspace).

$q$ is the vector that represents the query, for which we want to find similar vectors.
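The three equivalent computations can be sketched in Python/NumPy (Qr and Rr are rebuilt with numpy.linalg.qr; for this query, which lies in the range of B, the three values agree up to rounding):

```python
import numpy as np

B = np.array([[1., 0., 0., 1., 0.],
              [1., 0., 1., 1., 1.],
              [1., 0., 0., 1., 0.],
              [0., 0., 0., 1., 0.],
              [0., 1., 0., 1., 1.],
              [0., 0., 0., 1., 0.]])
Bn = B / np.linalg.norm(B, axis=0)       # normalized term-document matrix
q = np.array([1., 0., 1., 0., 0., 0.])   # query on 'baked' and 'bread'

similarity = lambda v, w: v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

Q, R = np.linalg.qr(Bn)                  # Q: 6x5, R: 5x5; rank(Bn) = 4
Qr, Rr = Q[:, :4], R[:4, :]              # drop the zero row of R and matching column of Q

j = 0                                    # first document (j = 1 in MATLAB)
s1 = similarity(q, Bn[:, j])             # in the original basis
s2 = similarity(q, Qr @ Rr[:, j])        # document rebuilt from the reduced factors
s3 = similarity(Rr[:, j], Qr.T @ q)      # in the reduced subspace: project q with Qr'
print(np.allclose([s1, s2], [s3, s3]))   # True: perfect reconstruction
```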

Approximation by deleting the last column and row

Let’s now truncate further: delete the last column of $Q_r$ and the last row of $R_r$. The resulting product no longer reproduces $B$ exactly, but it retains most of the information and provides a good enough approximation.

By computing the similarity we now obtain a different value. Another thing to notice is that the last two similarities are the same, but they slightly differ from the similarities computed previously with the full $Q_r$ and $R_r$ matrices.

Another thing to notice is that the best document (the one with the highest similarity, $j = 1$) is still identified correctly even in the reduced-dimensional space of the approximation.
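This behavior can be checked with a Python/NumPy sketch mirroring the MATLAB session: the truncated factors change some similarity values, yet the best document is unchanged.

```python
import numpy as np

B = np.array([[1., 0., 0., 1., 0.],
              [1., 0., 1., 1., 1.],
              [1., 0., 0., 1., 0.],
              [0., 0., 0., 1., 0.],
              [0., 1., 0., 1., 1.],
              [0., 0., 0., 1., 0.]])
Bn = B / np.linalg.norm(B, axis=0)
q = np.array([1., 0., 1., 0., 0., 0.])
similarity = lambda v, w: v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

Q, R = np.linalg.qr(Bn)
Qr, Rr = Q[:, :4], R[:4, :]          # rank-reduced factors (rank(Bn) = 4)
Qr2, Rr2 = Qr[:, :-1], Rr[:-1, :]    # approximation: drop last column / last row

sims_full = [similarity(q, Bn[:, j]) for j in range(5)]
sims_trunc = [similarity(q, Qr2 @ Rr2[:, j]) for j in range(5)]

# Some similarities change (the product Qr2 @ Rr2 is only an approximation
# of Bn), but the best-matching document is still the first one.
print(int(np.argmax(sims_trunc)) == int(np.argmax(sims_full)))   # True
```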

Soon we will see a reduction in the row space through the singular value decomposition.

SVD is more accurate but QR is less expensive; there are applications in which it is more efficient to use QR instead of SVD.

Note

The fourth document contains the word “baking”, while among the keywords we have “baked”. For this example we treat them as if they were the same word.

QR Application: Analysis of streaming data - QR Updating

Imagine that you have a set of data and then a new set of data arrives and so on. There is a flow of data in real time. Data arrives sequentially over time.

QR Update

One of the key features of QR decomposition is its ability to be updated incrementally.

As new data arrives (new rows or columns are added to the matrix), the QR decomposition can be updated without the need to recompute the entire decomposition from scratch.

> [!note] Recall
> In this course rows are samples and columns are features. In this case we will not add new features but new samples.

So, given a block of new rows $B$, we want to update the QR decomposition of $A$ to include $B$, obtaining a factorization of $A' = \begin{bmatrix} A \\ B \end{bmatrix}$ without explicitly computing the QR factorization of $A'$ from scratch.

The steps are the following:

  1. Concatenate $B$ to $A$ and get the updated matrix: $A' = \begin{bmatrix} A \\ B \end{bmatrix}$
  2. Compute the QR decomposition of $B$, that is $B = Q_B R_B$
  3. Update $Q$ and $R$: consider the augmented matrix $A' = \begin{bmatrix} Q & 0 \\ 0 & Q_B \end{bmatrix} \begin{bmatrix} R \\ R_B \end{bmatrix}$ and compute the QR decomposition of the stacked triangular factors, $\begin{bmatrix} R \\ R_B \end{bmatrix} = \tilde{Q}\tilde{R}$
  4. Finally, to obtain the updated $Q'$ and $R'$ for $A'$, we compute: $Q' = \begin{bmatrix} Q & 0 \\ 0 & Q_B \end{bmatrix} \tilde{Q}$

While, $R' = \tilde{R}$.

Size of the matrices (economy-size factorizations, with $p$ new rows):

  • $A$: $m \times n$
  • $Q$: $m \times n$
  • $R$: $n \times n$
  • The new block $B$ is of size $p \times n$
  • The resulting updated matrix $A'$ is $(m+p) \times n$
  • $\begin{bmatrix} R \\ R_B \end{bmatrix}$: $2n \times n$
  • $Q'$ is in $\mathbb{R}^{(m+p) \times n}$
  • $R'$ is in $\mathbb{R}^{n \times n}$

Why do we do step (3) instead of computing the QR factorization of $A'$ from scratch?

There is a gain in time complexity, as one can see just by looking at the dimensions. The stacked matrix $\begin{bmatrix} R \\ R_B \end{bmatrix}$ is made of two triangular blocks, so it is sparse: its QR factorization costs roughly half as much as the QR factorization of the dense $A'$. If the factorization takes this sparsity into account, we gain in terms of complexity.
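The four update steps can be sketched in Python/NumPy (the course works in MATLAB; the names Qb, Rb, Qt, Rt below are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 6, 3, 4
A = rng.standard_normal((m, n))          # old data: m samples, n features
Bnew = rng.standard_normal((p, n))       # p new samples arriving

# Step 1: the updated matrix A' = [A; Bnew]
Aprime = np.vstack([A, Bnew])

Q, R = np.linalg.qr(A)                   # existing factorization of A
Qb, Rb = np.linalg.qr(Bnew)              # Step 2: QR of the new block

# Step 3: QR of the stacked triangular factors (a sparse 2n x n matrix)
Qt, Rt = np.linalg.qr(np.vstack([R, Rb]))

# Step 4: assemble the updated factors Q' and R'
Qblk = np.block([[Q, np.zeros((m, Qb.shape[1]))],
                 [np.zeros((p, Q.shape[1])), Qb]])
Qnew = Qblk @ Qt
Rnew = Rt

print(np.allclose(Qnew @ Rnew, Aprime))  # True: Q'R' reproduces A'
```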

Farm Example

The problem is an overdetermined system (more equations than unknowns). New data arrives, so the system grows over time, and the matrix is rectangular, not square. It can be solved with the least-squares solution, but we would have to repeat the computation several times: you have to recompute it every time new data arrives.