Assignment 2
Sanket Shah, ss4228
A Markov Model describes a graph of possibilities, with each node representing a state. Each node-to-node transition has some probability, and at each time step the system moves to a new current state (node). If the model ran indefinitely, the per-state probabilities would converge to a stationary distribution, answering the question: what is the probability of being in any given state at this time?
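For intuition, here is a minimal sketch of that convergence. The 3-state chain and its transition probabilities are made up for illustration:

```python
import numpy as np

# Hypothetical 3-state transition matrix; row i holds P(next state | current state i).
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

p = np.array([1.0, 0.0, 0.0])  # start in state 0 with certainty
for _ in range(100):           # run the chain forward many steps
    p = p @ A                  # propagate the state distribution one step

print(p)  # approaches the stationary distribution, regardless of the starting state
```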
Suppose first that the system is totally visible, and we can observe which state it is in. If we were to decouple the observation from the state, we can say that at time $t$ the system is in some hidden state $q_t$ and emits an observation $O_t$ drawn from a state-dependent distribution $b_{q_t}(O_t)$.
This is a Hidden Markov Model: the states themselves are hidden. It is a parametric model with parameters $\lambda = (A, B, \pi)$, the transition matrix, the emission matrix, and the initial state distribution (see the array sketch after this list). Given an observation sequence, we may want to:
- Predict the hidden state sequence of the Markov process
- Figure out which Markov process we’re looking at (classification)
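For concreteness, the Rabiner-style parameterization can be written as plain NumPy arrays. The sizes below are illustrative, not the assignment's actual hyperparameters:

```python
import numpy as np

N, M = 4, 8  # illustrative: N hidden states, M discrete observation symbols
rng = np.random.default_rng(0)

# A[i, j] = P(q_{t+1} = j | q_t = i); each row sums to 1
A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
# B[j, k] = P(O_t = k | q_t = j); each row sums to 1
B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)
# pi[i] = P(q_1 = i)
pi = rng.random(N); pi /= pi.sum()

lam = (A, B, pi)  # lambda = (A, B, pi)
```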
We’re trying to calculate $P(O \mid \lambda)$: the probability of the observation sequence $O$ under a given model $\lambda$. In other words, we’d like to know which candidate model best explains the observations:

$$\lambda^* = \arg\max_{\lambda} P(O \mid \lambda)$$

But we cannot compute $P(O \mid \lambda)$ directly. Therefore we must use this relationship, marginalizing over every possible hidden state sequence $Q$:

$$P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$
However, the number of possible state sequences is extremely large, at $N^T$ for $N$ states and $T$ time steps, so the sum is instead evaluated efficiently with the forward recursion.
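The forward recursion evaluates that sum in $O(N^2 T)$ by sharing prefix computations. A minimal sketch, assuming the discrete-symbol arrays above:

```python
import numpy as np

def forward(obs, A, B, pi):
    """Compute P(O | lambda) in O(N^2 T) via the forward recursion.

    obs: sequence of integer observation symbols
    A, B, pi: transition matrix, emission matrix, initial distribution
    """
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]            # alpha_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(O_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                  # P(O | lambda) = sum_i alpha_T(i)
```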
The training algorithm (Baum-Welch re-estimation) works generally as follows; a sketch of one iteration follows the list:
1. Initialize a random $\lambda = (A, B, \pi)$.
2. Compute the forward/backward variables and related probabilities.
3. Compute new $A$, $B$, and $\pi$ from those quantities.
4. Repeat steps 2 and 3 until a stopping criterion is met.
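One unscaled iteration might look like the sketch below. This is illustrative only; with no scaling it underflows on long sequences, which is exactly the problem discussed next:

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One EM re-estimation step (unscaled; for illustration only)."""
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]

    # Forward variables: alpha_t(i) = P(O_1..O_t, q_t = i | lambda)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward variables: beta_t(i) = P(O_{t+1}..O_T | q_t = i, lambda)
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # gamma_t(i) = P(q_t = i | O, lambda)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # xi_t(i, j) = P(q_t = i, q_{t+1} = j | O, lambda)
    xi = alpha[:-1, :, None] * A[None] * (B[:, obs[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)

    # Re-estimate pi, A, B from the expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.array([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])]).T
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new
```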
In my implementation, the algorithm handles only a single observation sequence per model; I did not implement support for multiple observation sequences. The main challenge here is dealing with underflow, in which the joint probabilities become so small that they can no longer be represented in 64-bit floating point. There are two ways to deal with this:
- Compute everything in log-space. This is acceptable since we only care about the relative probabilities, and it is the method I used: the results of calculations were converted back to standard probability space for storage, and were normalized where required (not in the forward/backward passes) within log-space using Scipy’s logsumexp function. See the sketch after this list.
- Normalize (scale) the forward/backward calculations at each time step. Rabiner argues that this is possible, although I was not able to get it to produce reasonable results.
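As an illustration of the first option, here is a sketch (not my submitted code) of the forward recursion carried out entirely in log-space with logsumexp:

```python
import numpy as np
from scipy.special import logsumexp

def log_forward(obs, A, B, pi):
    """Forward recursion entirely in log-space to avoid underflow.

    Returns log P(O | lambda). A, B, pi are in ordinary probability space.
    """
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T = len(obs)
    log_alpha = logpi + logB[:, obs[0]]
    for t in range(1, T):
        # log alpha_t(j) = logsumexp_i(log alpha_{t-1}(i) + log a_ij) + log b_j(O_t)
        log_alpha = logsumexp(log_alpha[:, None] + logA, axis=0) + logB[:, obs[t]]
    return logsumexp(log_alpha)
```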
Overall, results were poor. Despite what appears to be a correct implementation, the predicted classes (gestures) performed about as well as random chance using the recommended hyperparameters (number of states, number of observation symbols). Below are the results for the testing data.
For each test file, the predicted classes are listed in order of decreasing probability:

{
  # 'observation file': [predicted classes]
  './test/test1.txt': ['circle14', 'inf18', 'eight08'],
  './test/test2.txt': ['circle14', 'eight08', 'beat4_08'],
  './test/test3.txt': ['beat3_02', 'wave07', 'inf18'],
  './test/test4.txt': ['beat4_08', 'beat3_02', 'inf18'],
  './test/test5.txt': ['wave07', 'beat3_02', 'inf18'],
  './test/test6.txt': ['beat4_08', 'inf18', 'wave07'],
  './test/test7.txt': ['wave07', 'eight08', 'circle14'],
  './test/test8.txt': ['beat4_08', 'eight08', 'wave07']
}
And here are the results for the training data:
{
  # 'observation file': predicted class, probability
  './train/wave07.txt':   wave07,   1.521971e-299
  './train/circle14.txt': beat3_02, 1.555590e-299
  './train/eight08.txt':  beat4_08, 2.080732e-299
  './train/inf18.txt':    circle14, 1.519275e-299
  './train/beat4_08.txt': wave07,   1.532866e-299
  './train/beat3_02.txt': beat4_08, 1.524305e-299
}
The log-probability training plots are below; they show the marginalized probability of the observations, $P(O \mid \lambda)$, over training iterations. As you can see, they fail to improve consistently.