SC - Lecture 19


Checklist


Questions, Keywords and Vocabulary

  • Page Rank
  • Importance of a Page in Page Rank
  • Page Rank’s approximated scaled adjacency matrix
  • Power Method
  • Efficiently computing
  • Markov Chain
  • Random Process
  • Transition
  • Sequence of state probability distribution vectors
  • Conditional probability
  • Conditional probability vs Joint Probability
  • Example: Markov Chain for weather conditions
  • Limit State
  • Power Method
  • Observation on the limit state and demonstration
  • Markov Chain for PageRank
  • Markov Chain for text generation
  • Hidden Markov Model
  • Emission Probabilities Matrix

SC Notes - Lecture 19

Page Rank

Rate of convergence of the Power Method

In order to estimate the rate of convergence of the power method applied to a matrix, we can consider not only the first (dominant) eigenvalue $\lambda_1$ but also the second dominant eigenvalue $\lambda_2$, and then compute the ratio $|\lambda_2| / |\lambda_1|$ between the two in absolute value.

The smaller the ratio, the faster the convergence.
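A minimal sketch in Python (the lecture's exercises use MATLAB) of the power method on a matrix whose eigenvalues are known in advance; the matrix and all names below are illustrative, not from the lecture:

```python
import numpy as np

# Illustrative sketch: the convergence rate of the power method is governed
# by the ratio |lambda_2| / |lambda_1| (the smaller, the faster).
def power_method(A, x0, iters=100):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    return x

# Diagonal test matrix, so the eigenvalues are known: the ratio is 1/2.
A = np.diag([2.0, 1.0, 0.5])
x = power_method(A, np.ones(3))
# x converges to the dominant eigenvector, here the first basis vector.
print(np.round(np.abs(x), 6))
```

With a ratio of 1/2, the non-dominant components shrink by half at every step, so a few dozen iterations already give an accurate eigenvector.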

Efficiently computing

We exploit the fact that $\frac{1-d}{n}\,e\,e^T$ is a constant (rank-one) matrix in the approximated scaled adjacency matrix $M = d\,A + \frac{1-d}{n}\,e\,e^T$, where $A$ is the scaled adjacency matrix and $d$ the damping factor.

Therefore, since $e^T x_k = \sum_i (x_k)_i = 1$ for a probability vector:

$M x_k = d\,A\,x_k + \frac{1-d}{n}\,e,$

so each step of the power method requires only one sparse matrix-vector product with $A$.

Reducing the problem to a linear system

If the problem is of small size, the score vector $x$ can even be computed as the solution of the linear system

$(I - d\,A)\,x = \frac{1-d}{n}\,e.$
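A sketch of this small-size approach in Python, assuming the standard formulation $x = dAx + \frac{1-d}{n}e$; the tiny column-stochastic link matrix is hand-made for illustration:

```python
import numpy as np

# Sketch: solve (I - d*A) x = (1-d)/n * e for the score vector x.
# A is a tiny hand-made column-stochastic link matrix (illustrative values).
d, n = 0.85, 4
A = np.array([[0.0, 0.0, 0.5, 0.0],
              [0.5, 0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0]])
e = np.ones(n)
x = np.linalg.solve(np.eye(n) - d * A, (1 - d) / n * e)
print(x)   # the scores form a probability vector: x.sum() == 1
```

Because $e^T A = e^T$ for a column-stochastic matrix, the solution automatically sums to 1, so no renormalization is needed.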

Exercises

We have a synthetic web of 20 pages, whose connections are generated randomly. We interpret this web as a directed graph.

We apply the PageRank algorithm.

Finally, we execute the script reported in the MATLAB documentation that analyzes the web consisting of the MathWorks site and the first 100 sites linked to it.
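The 20-page exercise can be sketched in Python instead of MATLAB; the random seed, the link density, and the simplified dangling-node handling below are all assumptions:

```python
import numpy as np

# Synthetic web of 20 pages with random links, scored by the power method
# in the efficient form x <- d*A*x + (1-d)/n * e (illustrative parameters).
rng = np.random.default_rng(0)
n, d = 20, 0.85
L = (rng.random((n, n)) < 0.2).astype(float)  # L[i, j] = 1 if j links to i
np.fill_diagonal(L, 0.0)                      # no self-links
out = L.sum(axis=0)                           # out-degree of each page
out[out == 0] = 1.0                           # crude dangling-node fix
A = L / out                                   # scale columns by out-degree
x = np.full(n, 1.0 / n)
for _ in range(100):
    x = d * (A @ x) + (1 - d) / n
    x = x / x.sum()                           # keep x a probability vector
print(np.argsort(x)[::-1][:5])                # indices of the top-5 pages
```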

Markov Chains

We will show in the next lecture that the PageRank algorithm can be derived through Markov chains.

Markov Chain and random process - Main Idea

A Markov Chain is a random process that undergoes transitions from one state to another within a finite or countable set of states (called state space).

Say we want to model the behavior of a system that evolves in time, following a random process. The system can be in one state, which belongs to a known set of states the system can occupy. The system passes from one state to another, and this is called a transition.

The probability of transitioning to a future state depends solely on the current state, without any influence from the sequence of events that preceded it. This characteristic is known as the Markov property, or memorylessness.

The state can be anything: weather conditions, the health status of a patient, or web pages (which web page a user is on at a certain step, and so on). Even a text can be generated using a Markov chain.

Components of Markov Chain

  • State space: the set of all possible states.
  • Transition probability matrix $P$: a square matrix whose size equals the number of possible states. The element $P_{ij}$ represents the probability of moving from state $j$ to state $i$ in one step.
  • Initial state distribution vector $x_0$: a discrete probability distribution over the state space that describes the probability of the system being in any particular state at the start of the process (it is used for the initialization, or initial step, of the Markov chain). The component $(x_0)_i$ is the probability that the initial state is $i$.

Observation 1:

Definition

A Markov chain can be defined as a sequence of (state) probability distribution vectors $x_0, x_1, x_2, \ldots$ where:

$x_{k+1} = P\,x_k$

with $\|x_k\|_1 = \sum_i (x_k)_i = 1$ for every $k$.

Am I wrong, or does this closely resemble the power method?

Alternative definition

Note that the definition of the transition probability matrix also implies an alternative definition as a sequence of random variables $X_0, X_1, X_2, \ldots$ with the Markov property, where $X_k$ represents the state at step $k$, and for all $k$ and all states $i, j$ it states that:

$\Pr(X_{k+1} = j \mid X_k = i, X_{k-1}, \ldots, X_0) = \Pr(X_{k+1} = j \mid X_k = i) = P_{ji}$

In stochastic settings like this, we may use the probability notation.

The equation says that the conditional probability of moving to state $j$ at time $k+1$ depends only on the fact that the process is in state $i$ at time $k$.

Conditional Probability vs Joint Probability:

Recall from basic probability theory that the joint probability of being in state $i$ at step $k$ and in state $j$ at step $k+1$, that is

$\Pr(X_{k+1} = j,\ X_k = i),$

is equal to the product of the probability of being in state $i$ at step $k$ and the conditional probability of passing to state $j$ at step $k+1$, given that we were in state $i$ at the previous step $k$:

$\Pr(X_{k+1} = j,\ X_k = i) = \Pr(X_{k+1} = j \mid X_k = i)\,\Pr(X_k = i).$

Therefore, summing over all the possible states $i$ at step $k$, we have:

$\Pr(X_{k+1} = j) = \sum_i \Pr(X_{k+1} = j \mid X_k = i)\,\Pr(X_k = i).$

The plus is like an OR: this expression is the sum of the probabilities of joint events, since we can get to state $j$ at step $k+1$ from any state $i$ at step $k$.

Example: Markov Chain for weather conditions

Let's consider a system that represents the weather conditions in a certain place. Suppose the system can only stay in one of the following states on a given day: 1-clear, 2-cloudy, 3-rainy. Finally, suppose that the change of state from one day to the next is determined only by the following probabilities, which do not depend on the day and are expressed through the transition probability matrix $P$.

  • The columns represent today's state and the rows represent tomorrow's state.
  • The states are 1-clear, 2-cloudy, 3-rainy.

For example, looking at $P(1,1)$: if today is clear, the probability that tomorrow is clear will be 0.8, that is 80%, and so on.

Suppose I want to know the situation tomorrow, and suppose the initial state vector is $x_0 = (1, 0, 0)^T$, meaning that today is clear.

If you multiply the matrix by this vector, you get another vector $x_1 = P\,x_0$, which is simply the first column of $P$.

The system can also be represented with a directed graph.

The situation on the day after tomorrow will again be:

$x_2 = P\,x_1 = P^2\,x_0.$

Note that $x_k = P^k x_0$, so I could also compute $P$ raised to the power $k$ and multiply it by the initial vector.
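This two-step evolution can be checked numerically. Only the entry $P(1,1) = 0.8$ appears in the notes; the other transition probabilities below are assumed for illustration:

```python
import numpy as np

# Columns = today's state, rows = tomorrow's state; 0-clear, 1-cloudy, 2-rainy.
# Only P[0, 0] = 0.8 comes from the notes; the rest is assumed (columns sum to 1).
P = np.array([[0.80, 0.30, 0.20],
              [0.15, 0.50, 0.30],
              [0.05, 0.20, 0.50]])
x0 = np.array([1.0, 0.0, 0.0])   # today is clear
x1 = P @ x0                      # tomorrow: exactly the first column of P
x2 = P @ x1                      # day after tomorrow: same as P^2 @ x0
print(x1, x2)
```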

Limit State

Will this chain eventually converge or not?

Since this is similar to the power method, we know that the convergence conditions depend on the dominant eigenvalue of the matrix $P$.

$P$ is strictly positive, so all the results seen in the last chapter apply here, and we know that the sequence converges.

The vector the sequence converges to is called the limit vector, or limit state, and it is the eigenvector corresponding to the dominant eigenvalue of $P$.

Observation: whatever the initial state $x_0$, after a suitably large number of steps $k$, the vector $x_k$ is close to the limit state, and the columns of $P^k$ tend to become equal to each other, and equal to the limit state.

If we raise $P$ to a large enough power $k$, then we have the limit state: it doesn't evolve anymore if we add more steps.

Demonstration: take any matrix $Q$ that has all its columns equal to a vector $v$, i.e. $Q = v\,e^T$. If you multiply this matrix by any probability vector $x$, where $\sum_i x_i = 1$, then it holds that:

$Q\,x = v\,e^T x = v \sum_i x_i = v.$

We can take $v$ outside the product and just sum the components of the probability vector $x$. This is why, starting from any distribution, the chain converges to the same state.
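A quick numerical check of this demonstration; the vectors below are arbitrary illustrative values:

```python
import numpy as np

# If Q = v e^T (every column equal to v), then Q x = v * sum(x) = v
# for any probability vector x.
v = np.array([0.5, 0.3, 0.2])
Q = np.outer(v, np.ones(3))      # all three columns equal to v
x = np.array([0.1, 0.6, 0.3])    # any probability vector (sums to 1)
print(Q @ x)                     # equals v regardless of x
```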

Realization of a Markov Chain

In the applications of Markov chains, it is often necessary to realize a specific path starting from a given initial condition $x_0$.

At each step, we have to pick a state by sampling from the current distribution.

Suppose we have a distribution over the states $p = (p_1, p_2, p_3)$, and I pick a random number $u$ from a uniform probability distribution on $[0, 1]$. Then, if $u \le p_1$, I am extracting the first state; if $p_1 < u \le p_1 + p_2$, I will be in the second state; otherwise (if $u > p_1 + p_2$) we conclude that state 3 has been generated.

We carry out an experiment involving the weather system analyzed so far.
We construct 1000 random paths; each path consists of one hundred steps of the chain, so $k = 100$. Each path starts from a random initial state. At the end of the path, that is at step $k = 100$, I will record the final state.

Then I also record the relative frequencies of the three states that have been obtained during that path, i.e. I count how many times I have been in state 1, in state 2 and in state 3.

If I take the average of the frequencies over all the 1000 experiments, I see that it is very close to the frequencies I have recorded along the states. So all these vectors are an approximation of the limit state.
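The experiment can be sketched as follows; the transition values are assumed (only $P(1,1) = 0.8$ appears in the notes) and the seed is arbitrary:

```python
import numpy as np

# 1000 random paths of 100 steps each; the average state frequencies
# approximate the limit state. Transition values are assumed except P[0, 0].
rng = np.random.default_rng(1)
P = np.array([[0.80, 0.30, 0.20],
              [0.15, 0.50, 0.30],
              [0.05, 0.20, 0.50]])
counts = np.zeros(3)
for _ in range(1000):
    s = rng.integers(3)                  # random initial state
    for _ in range(100):
        s = rng.choice(3, p=P[:, s])     # sample the next state
        counts[s] += 1
freq = counts / counts.sum()

# Limit state: eigenvector of P for the dominant eigenvalue, scaled to sum 1.
w, V = np.linalg.eig(P)
limit = np.real(V[:, np.argmax(np.real(w))])
limit = limit / limit.sum()
print(freq, limit)                       # the two vectors are close
```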

Markov Chain for PageRank

Imagine that I'm a web surfer: I start from any web page on the internet, I decide to walk around the web, and I decide to do this randomly.

Component of the Markov Chain for Page Rank

The web is a discrete state space, and moving from one web page to another is a transition. Each web page is a state in the Markov chain, and together the pages form the state space.

The transition probability matrix represents the probability of moving from page $j$ to page $i$. The probability depends on the number of outgoing links from $j$: for instance, if page $j$ has $m$ outgoing links, then the probability of transitioning to any page linked from $j$ is $1/m$. How to manage dangling nodes? If a page has no outgoing links, it is treated as if it had outgoing links to all pages in the graph, ensuring that the column-stochasticity property is maintained.

Teleportation: if I'm on a web page, in 85% of the cases I will pick randomly one of the outgoing links; in the remaining 15% of the cases I will just jump to another web page, even if it is not connected to the current one. The damping factor $d = 0.85$ represents the probability that the surfer follows an outgoing link from the page.

So we realize that, under the previous assumptions, the transition probability matrix coincides with the approximated scaled adjacency matrix of the web.

Markov Chain for Text Generation

Main idea: take a list of words that will represent the set of states. Assign a probability of passing from one word to another, represented in each entry of the transition probability matrix $P$. Generate random text of the desired length by passing from state to state using the Markov chain and saving the state each time in a string.

More formally:

  • State space: the set of different words; each unique word in the corpus represents a state in the Markov chain. For a more sophisticated model, states could also represent sequences of $n$ words ($n$-grams) with $n > 1$.
  • Transition probability matrix $P$: for a given word, the transition probability to a subsequent word is calculated based on the frequency of that word sequence in the corpus. $P$ must be normalized (in 1-norm, column by column) to become column-stochastic.

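A minimal word-level sketch in Python; the corpus and all names (`follows`, `generate`) are illustrative, and frequency-weighted successor lists stand in for an explicitly normalized matrix $P$:

```python
import random
from collections import defaultdict

# Word-level Markov chain: states are unique words, transitions are sampled
# proportionally to bigram frequencies in a tiny illustrative corpus.
corpus = "the cat sat on the mat the cat ran on the mat".split()
follows = defaultdict(list)
for word, nxt in zip(corpus, corpus[1:]):
    follows[word].append(nxt)     # repeated successors encode frequencies

def generate(start, length, rng=random.Random(0)):
    word, out = start, [start]
    for _ in range(length - 1):
        word = rng.choice(follows[word])  # frequency-weighted next word
        out.append(word)
    return " ".join(out)

print(generate("the", 8))
```

Sampling uniformly from the successor list is equivalent to sampling from the normalized column of $P$ for the current word.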

Hidden Markov Model

The HMM extends the Markov chain by adding the concept of hidden states. It's a sort of generalization of the Markov chain.

In a Hidden Markov Model, a transition leads to a state that is not directly observable, and that state will eventually produce an output that is observable.

The components of a HMM are:

  • Hidden state space: similar to a Markov chain's state space, but the states are not directly observable.
  • Observation space: a set of possible observable outputs.
  • Transition probabilities matrix $P$: like in a Markov chain, but for transitions between hidden states.
  • Emission probabilities matrix $B$: for each hidden state, the probability of each possible observable output being produced; that is, the entry $B_{ij}$ is the probability of having the observation $j$ while being in the hidden state $i$.

$B$ has as many rows as the hidden states and as many columns as the observables, so it's a sort of connection between the two: what is the probability that, staying in a certain state, you make this observation?

$x_k$ is the probability distribution over hidden states at time $k$. The prediction of the distribution at step $k+1$ is given by $P\,x_k$.

Each hidden state can generate observations at step $k$ with certain probabilities, described by the emission probability matrix $B$: the entry $B_{ij}$ represents the probability of generating observation $j$ from hidden state $i$. The vector $b_{o_k}$ is the column of $B$ corresponding to the observation $o_k$ at time $k$, representing the emission probabilities for each hidden state given this observation.

At each step $k$, the probability distribution over hidden states evolves according to the transition probabilities and the new observation's emission probabilities, and the dynamics of a HMM can be represented as

$x_{k+1} \propto b_{o_{k+1}} \odot (P\,x_k),$

normalized so that the components sum to 1; $\odot$ denotes the componentwise multiplication between two vectors.
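A numerical sketch of this belief update with a two-state rain/no-rain model; every numeric value below is assumed for illustration:

```python
import numpy as np

# HMM belief update: x_{k+1} is proportional to b_o * (P x_k), renormalized.
# Hidden states: 0-Rain, 1-NoRain; observations: 0-HighCloudCover, 1-HighHumidity.
P = np.array([[0.7, 0.3],        # column-stochastic transition matrix (assumed)
              [0.3, 0.7]])
B = np.array([[0.8, 0.2],        # rows: hidden states, columns: observations
              [0.4, 0.6]])       # (assumed emission probabilities)
x = np.array([0.5, 0.5])         # initial belief over the hidden states
for obs in [0, 1, 0]:            # an assumed observation sequence
    x = B[:, obs] * (P @ x)      # componentwise product with the emission column
    x = x / x.sum()              # renormalize to a probability distribution
print(x)
```

Each loop iteration is one step of the dynamics: propagate the belief with $P$, reweight by the emission column of the current observation, and renormalize.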

What are the inputs of a HMM?:

  • Initial state distribution $x_0$
  • Transition probability matrix $P$
  • Emission probability matrix $B$
  • Sequence of observations $o_1, o_2, \ldots$

Practical Example of a Hidden Markov Model

$P_{11}$ is the probability of transitioning from Rain to Rain, $P_{12}$ is the probability of transitioning from NoRain to Rain, and so on.

The matrix $B$ is:

$B_{11}$ is the probability of observing High Cloud Cover given Rain, $B_{12}$ is the probability of observing High Humidity given Rain, $B_{21}$ is the probability of observing High Cloud Cover given NoRain,

and so on.

The HMM captures how the current belief about the weather (in terms of the probability distribution over hidden states) is updated based on both the transition probabilities between states and the likelihood of observing the current conditions given these states.


Summary