### Blog: Markov Chains and HMMs

In this article, we’ll focus on Markov Models, where and when they should be used, and on Hidden Markov Models. This article will focus on the theoretical part. In a second article, I’ll present Python implementations of these subjects.

Markov Models, and especially Hidden Markov Models (HMM), are used for:

- Speech recognition
- Writing recognition
- Object or face detection
- Economic Scenario Generation and specific Finance tasks
- And several NLP tasks …

This article was originally published on my personal blog: https://maelfabien.github.io/machinelearning/HMM_1/#

I publish all my articles and the corresponding code on this repository:

### I. Stochastic model

In order to fully characterize the model, we would need the joint probability distribution of all the states over time.

As you might guess, this is complex to achieve, since we would need to know a huge number of parameters.

### II. Discrete Time Markov Chain Models (DTMC)

#### 1. What is a Markov Chain?

The states here could represent many things, including in NLP. For example, the states could be parts of speech (noun, verb, adjective, …).

And we would be interested, for example, in the probability of having a verb after a noun.

#### 2. Transition probabilities

In our example, the transition matrix would be :

Note that if A is stochastic, then A^n is stochastic.
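This property is easy to check numerically. A minimal sketch with a hypothetical 2-state matrix (the values below are made up for illustration):

```python
import numpy as np

# Hypothetical 2-state transition matrix (each row sums to 1).
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])

A5 = np.linalg.matrix_power(A, 5)  # 5-step transition matrix
print(A5.sum(axis=1))              # each row still sums to 1
```

Every power of a stochastic matrix is itself stochastic, which is why A^n can also be read as a transition matrix.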

#### 3. States

A state has **period** k if any return to it can only occur in a number of steps that is a multiple of k, for some k > 1. Otherwise, it is called **aperiodic**. A state with a self-loop (aii > 0) is always aperiodic.

#### 4. Sojourn time

Let Ti be the time spent in state i before jumping to another state.

Then Ti, the **sojourn time**, follows a **geometric distribution**: P(Ti = n) = aii^(n−1) (1 − aii)

The expected time spent in state i is E(Ti) = 1 / (1 − aii)
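To make the sojourn time concrete, here is a small simulation with a hypothetical self-loop probability aii = 0.8. The number of steps spent in the state before leaving is geometric, so its empirical mean should approach 1 / (1 − aii) = 5:

```python
import numpy as np

rng = np.random.default_rng(0)
a_ii = 0.8                          # hypothetical self-loop probability
# At each step we stay with probability a_ii; the number of steps
# spent in the state before leaving is geometric with p = 1 - a_ii.
samples = rng.geometric(1 - a_ii, size=100_000)
print(samples.mean())               # close to 1 / (1 - a_ii) = 5
```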

#### 5. M-step transition

The probability of going from i to j in m steps is denoted by: aij(m) = P(Xn+m = j ∣ Xn = i). It is given by the (i, j) entry of the matrix power A^m.
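Concretely, the m-step probabilities are obtained by raising the transition matrix to the power m. A sketch with a hypothetical 2-state matrix:

```python
import numpy as np

# Hypothetical transition matrix for illustration.
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])

m = 3
Am = np.linalg.matrix_power(A, m)
# Probability of going from state 0 to state 1 in exactly m steps:
print(Am[0, 1])   # 0.196
```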

#### 6. Probability distribution of states

Let πi(n) be the **probability** of being in state i at time n : πi(n)=P(Xn=i)

Then, π(n)=[π1(n),π2(n),…] is the vector of probability distribution which depends on :

Notice that π(n+1) = π(n)A, and recursively: π(n) = π(0)A^n

If π(n) converges to a vector π as n grows, this limit satisfies π = πA; the πi are then called the **stationary probabilities**, and π the **steady state or equilibrium distribution**.
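The equilibrium distribution can be approximated by iterating π(n+1) = π(n)A from any starting distribution. A sketch with hypothetical values:

```python
import numpy as np

# Hypothetical transition matrix for illustration.
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Start from an arbitrary distribution and iterate pi(n+1) = pi(n) @ A.
pi = np.array([1.0, 0.0])
for _ in range(200):
    pi = pi @ A
print(pi)                        # approximate stationary distribution
print(np.allclose(pi, pi @ A))   # unchanged by one more step
```

For this matrix the iteration converges to π = [0.75, 0.25], which indeed satisfies π = πA.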

#### 7. Generating sequences

To simulate the path of the mouse in the maze, we might want to **generate sequences**.

- We pick a random number to know which state we should start from
- Then, pick a random number to know which state we move to

Suppose we are given the following simple model :

This corresponds to the following matrix A and initial vector of probabilities π:
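The two sampling steps above can be sketched as follows, using a hypothetical 2-state model (real examples, like the mouse in the maze, work the same way):

```python
import numpy as np

rng = np.random.default_rng(1)

pi = np.array([0.5, 0.5])          # hypothetical initial probabilities
A = np.array([[0.9, 0.1],          # hypothetical transition matrix
              [0.3, 0.7]])

def generate(n_steps):
    states = [rng.choice(2, p=pi)]                     # pick the starting state
    for _ in range(n_steps - 1):
        states.append(rng.choice(2, p=A[states[-1]]))  # next state from current row
    return states

print(generate(10))
```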

#### 8. Decoding sequences
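For a plain Markov chain, decoding amounts to computing how likely an observed sequence is under a given model: P(s1, …, sT) = π(s1) ∏ a(st−1, st). A sketch comparing two hypothetical "language" models over the same alphabet (all numbers are made up for illustration):

```python
import numpy as np

def sequence_log_likelihood(seq, pi, A):
    """Log-probability of an observed state sequence under a Markov chain."""
    logp = np.log(pi[seq[0]])
    for prev, cur in zip(seq[:-1], seq[1:]):
        logp += np.log(A[prev, cur])
    return logp

# Two hypothetical "language" models over the same 2 symbols:
pi = np.array([0.5, 0.5])
A_en = np.array([[0.8, 0.2], [0.4, 0.6]])
A_fr = np.array([[0.3, 0.7], [0.6, 0.4]])

seq = [0, 0, 1, 0, 0]
# The model giving the higher likelihood is the more plausible source.
print(sequence_log_likelihood(seq, pi, A_en) >
      sequence_log_likelihood(seq, pi, A_fr))
```

This is exactly the idea behind the language-identification use case below: score the character sequence under each language's chain and keep the best one.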

#### 9. Use cases

- To identify the language of a sentence by decoding the sequences of characters and identifying the most likely language.
- To predict macroeconomic situations like market crashes and cycles between recession and expansion.
- To predict asset and option prices, and to calculate credit risks.
- …

### III. Hidden Markov Models

Hidden Markov Models (HMM) are widely used for:

- speech recognition
- writing recognition
- object or face detection
- part-of-speech tagging and other NLP tasks…

I recommend checking out the introduction to HMMs made by Luis Serrano.

PoS tagging can, for example, be used for Text-to-Speech conversion or Word Sense Disambiguation.

#### A. Emission probabilities

The probabilities of hearing a given word when the speakers are on a given topic are called the **Emission Probabilities**.

#### B. Transition probabilities

We can define what we call the **Hidden Markov Model** for this situation:

- The words you understand are called the **observations**, since you observe them.
- The subject they talk about is called the **hidden state**, since you can’t observe it.

#### C. Discrete-Time Hidden Markov Models

An HMM λ is a sequence made of a combination of 2 stochastic processes:

- An **observed** one: O = o1, o2, …, oT, here the words.
- A **hidden** one: q = q1, q2, …, qT, here the topic of the conversation. This is called the state of the process.

The model is characterized by:

- The vector of **initial probabilities** π = [π1, …, πq], where πi = P(q1 = i).
- A **transition matrix** for the **unobserved** sequence: A = [aij] = P(qt = j ∣ qt−1 = i).
- A matrix of the **probabilities** of the **observations**: B = [bki] = P(ot = sk ∣ qt = i).

What are the **main hypotheses** behind HMMs?

- **Independence** of the observations conditionally on the hidden states: P(o1, …, ot, …, oT ∣ q1, …, qt, …, qT, λ) = ∏t P(ot ∣ qt, λ)
- The **stationary Markov Chain**: P(q1, q2, …, qT) = P(q1) P(q2 ∣ q1) P(q3 ∣ q2) … P(qT ∣ qT−1)
- The **joint probability** for a sequence of observations and states: P(o1, o2, …, oT, q1, …, qT ∣ λ) = P(o1, …, oT ∣ q1, …, qT, λ) P(q1, …, qT)

An HMM is a special case of *Bayesian Networks*.

#### D. Find the transition probabilities

You have 15 observations, taken over the last 15 minutes, where **W** denotes Work and **H** Holidays.
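The transition probabilities can be estimated by counting the observed transitions and normalizing each row. A sketch with a hypothetical sequence of 15 W/H observations (the actual observations from the example are not reproduced here):

```python
import numpy as np

# Hypothetical sequence of 15 observed topics (W = Work, H = Holidays).
seq = list("WWWHHWWWWHHHWWW")

states = ["W", "H"]
counts = np.zeros((2, 2))
for prev, cur in zip(seq[:-1], seq[1:]):
    counts[states.index(prev), states.index(cur)] += 1

A_hat = counts / counts.sum(axis=1, keepdims=True)  # normalize each row
print(A_hat)   # row 0 = P(next topic | Work), row 1 = P(next topic | Holidays)
```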

#### E. Find the emission probabilities
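Similarly, the emission probabilities can be estimated by counting, for each topic, how often each word is heard. A sketch with hypothetical (topic, word) pairs:

```python
import numpy as np

# Hypothetical paired observations: (topic, word heard).
pairs = [("W", "Python"), ("W", "Python"), ("W", "deadline"),
         ("H", "Bear"), ("H", "Python"), ("H", "Bear")]

topics = ["W", "H"]
words = ["Python", "Bear", "deadline"]
counts = np.zeros((2, 3))
for topic, word in pairs:
    counts[topics.index(topic), words.index(word)] += 1

B_hat = counts / counts.sum(axis=1, keepdims=True)
print(B_hat)   # row k = P(word | topic k)
```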

#### F. Probability for a topic at a random time

#### G. If you hear the word “Python”, what is the probability of each topic?
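This is a direct application of Bayes’ rule: P(topic ∣ “Python”) ∝ P(“Python” ∣ topic) · P(topic). A sketch with hypothetical prior and emission values:

```python
import numpy as np

# Hypothetical quantities for the Work/Holidays example:
p_topic = np.array([0.6, 0.4])          # P(Work), P(Holidays)
p_python_given = np.array([0.5, 0.2])   # P("Python" | Work), P("Python" | Holidays)

# Bayes' rule: P(topic | "Python") is proportional to
# P("Python" | topic) * P(topic), renormalized to sum to 1.
joint = p_python_given * p_topic
posterior = joint / joint.sum()
print(posterior)
```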

#### H. If you hear a sequence of words, what is the probability of each topic?

- Python was linked to Work, Bear was linked to work
- Python was linked to Holidays, Bear was linked to work
- Python was linked to Holidays, Bear was linked to Holidays
- Python was linked to Work, Bear was linked to Holidays

These scenarios can be summarized this way :
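The four scenarios can also be enumerated by brute force: for each hidden-state combination, the joint probability is πq1 · bq1(o1) · aq1q2 · bq2(o2). A sketch with hypothetical parameters (the article’s actual values are not reproduced here):

```python
import itertools
import numpy as np

topics = ["W", "H"]
pi = np.array([0.6, 0.4])                 # hypothetical initial probabilities
A = np.array([[0.7, 0.3], [0.4, 0.6]])    # hypothetical transition matrix
B = np.array([[0.5, 0.1], [0.2, 0.4]])    # hypothetical emissions, cols: Python, Bear

obs = [0, 1]   # "Python" then "Bear"
probs = {}
for q1, q2 in itertools.product(range(2), repeat=2):
    # joint probability of (hidden states q1, q2) and the two words
    probs[(topics[q1], topics[q2])] = pi[q1] * B[q1, obs[0]] * A[q1, q2] * B[q2, obs[1]]

for (t1, t2), p in probs.items():
    print(t1, t2, round(p, 4))
```

The scenario with the highest joint probability is the most plausible explanation of the two words.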

#### I. Decoding with Viterbi Algorithm

The most likely sequence of states simply corresponds to: q* = argmaxq P(q1, …, qT ∣ o1, …, oT, λ)

We can then move on to the next observation. Here’s what will happen :

By recursion, it can be shown that: δt(j) = maxi [δt−1(i) aij] bj(ot), where δt(j) is the probability of the most likely path ending in state j at time t.
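The Viterbi recursion translates directly into code. A minimal sketch, with hypothetical Work/Holidays parameters (state 0 = Work, state 1 = Holidays; observation columns = “Python”, “Bear”):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence for observations `obs`."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))           # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Hypothetical parameters for illustration:
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.1], [0.2, 0.4]])

print(viterbi([0, 1, 1], pi, A, B))   # e.g. Work, Holidays, Holidays
```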

When we only partially observe the sequence and face incomplete data, the EM algorithm (known as Baum–Welch for HMMs) is used to estimate the parameters.

#### J. Generating a sequence

- Generate first the hidden state q1
- From q1, generate o1 (e.g. the state Work emits the word Python)
- Then generate the transition q1 to q2
- From q2, generate o2
- …
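These generation steps can be sketched as follows, with hypothetical parameters and a vocabulary of 3 words:

```python
import numpy as np

rng = np.random.default_rng(2)

pi = np.array([0.6, 0.4])                         # hypothetical initial probabilities
A = np.array([[0.7, 0.3], [0.4, 0.6]])            # hypothetical transitions
B = np.array([[0.5, 0.3, 0.2], [0.2, 0.4, 0.4]])  # hypothetical emissions, 3 words

def generate(T):
    q = rng.choice(2, p=pi)                        # hidden state q1
    states, obs = [q], [rng.choice(3, p=B[q])]     # o1 emitted from q1
    for _ in range(T - 1):
        q = rng.choice(2, p=A[q])                  # transition q_t -> q_{t+1}
        states.append(q)
        obs.append(rng.choice(3, p=B[q]))          # emit o_{t+1}
    return states, obs

states, obs = generate(5)
print(states, obs)
```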

### Conclusion

In the next article, I’ll try to illustrate these concepts in Python.

### Sources

- École Nationale Supérieure des Mines: https://www.emse.fr/~xie/SJTU/Ch4DMC.ppt
- MVE220 Financial Risk: http://www.math.chalmers.se/Stat/Grundutb/CTH/mve220/1617/redingprojects16-17/IntroMarkovChainsandApplications.pdf
- HMMs by Udacity : https://www.youtube.com/watch?v=kqSzLo9fenk
- Personal blog MC : https://maelfabien.github.io/machinelearning/HMM_1/#
- Personal blog HMM : https://maelfabien.github.io/machinelearning/HMM_2/#
