According to Mullachery, Khera, and Husain, there are three steps to consider for probabilistic modeling: model specification, model inference, and model checking. Metropolis-Hastings (MH) is helpful in the model inference phase, where we want to compute the posterior distribution of the parameters.
Recall Bayes' theorem: p(θ | x) = p(x | θ) p(θ) / p(x). Typically, when we want to sample from a distribution dependently (when the next sample depends on the current sample), we use the posterior. MH allows us to use Bayes' theorem without computing the normalizing denominator p(x), which is often intractable.
MH is a Markov chain Monte Carlo (MCMC) method, since the decision on the next state/sample depends solely on the current state/sample (Markov chain), and it uses pseudo-random numbers in constructing its walk (Monte Carlo).
TIP
Monte Carlo is a famous casino in the principality of Monaco (hence the randomness metaphor), and Monégasques (citizens of Monaco) are not allowed to work or gamble in the casino.
The whole idea of MCMC algorithms is to draw samples from some probability distribution without having to know the exact value ("height") of the distribution. The algorithm essentially wanders around, probabilistically visiting parts of the distribution with higher probabilities more often. In other words, given an unnormalized distribution f(θ) ∝ p(θ), the samples it draws will approximate p(θ).
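To make this concrete, here is a minimal sketch of such an unnormalized distribution in Python; the two-bump function and its name are purely illustrative, not from the original post:

```python
import numpy as np

def unnormalized_target(x):
    """An unnormalized density f(x): a tall bump at x = 2 and a
    shorter bump at x = -2. We can evaluate its height anywhere,
    but we pretend not to know its normalizing constant."""
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)
```

MH only ever compares heights of this function at two points, which is exactly why the unknown normalizing constant never matters.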
*Image: a mountain (from Pexels)*
Say you have a mountain but you don't know the shape of it. You have a friend who enjoys mountaineering, particularly when the altitude is high. Your friend also has the superpower to teleport; your friend wants to go to where the mountain is high, but since they only recently discovered their teleporting abilities, they cannot fully control it yet. As a result, your friend teleports everywhere over the mountain, but more so for higher altitudes, since they are aiming to reach the peaks of the mountain. If you strap a GPS to your friend, and let them teleport for enough time, you should in theory be able to map the shape of the mountain. This is because while your friend cannot fully control where they teleport on the mountain, they prefer places with higher altitude. They will thus try to teleport to said places with higher altitude (mountain peaks); they may not get there each time, but the frequency of them teleporting to the peaks is higher than places with lower altitude.
MCMC is basically your teleporting friend; it will try to get to the peak(s) of the distributions (places where the probability is high). While it may not get there each time, it will be there more frequently than other places.
Say we want to determine the shape of a posterior distribution p(θ | x) that we can only evaluate up to its normalizing constant.
The actual algorithm is as follows:
First we initialize our chain with an arbitrary starting state θ₀.
Then for each iteration t:
We first draw our next potential state, the candidate θ*, from a proposal distribution q(θ* | θₜ).
In the Random Walk version of the algorithm, we draw the next value in our chain from a random distribution centered on the previous value. For example, we can use the normal distribution: θ* ~ N(θₜ, σ²).
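A random-walk proposal is one line of NumPy; `step_size` here plays the role of σ and is an illustrative name of my own, not from the post:

```python
import numpy as np

rng = np.random.default_rng(42)

def propose(current, step_size=1.0):
    # Draw the candidate from a normal centered on the current state.
    return rng.normal(loc=current, scale=step_size)
```

Smaller `step_size` values keep candidates close to the current state; larger ones let the chain jump further per step.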
We can also enforce the property that the proposal is symmetric, q(θ* | θₜ) = q(θₜ | θ*), which the normal proposal satisfies automatically; this lets the proposal terms cancel out of the acceptance ratio.
We now calculate our acceptance ratio α = p(θ*) / p(θₜ); for a symmetric proposal, the normalizing constants cancel, so the unnormalized density works just as well.
Once we have α, we draw a uniform random number u ~ Uniform(0, 1).
If u ≤ α, we accept the candidate and set θₜ₊₁ = θ*.
If u > α, we reject the candidate and stay put: θₜ₊₁ = θₜ.
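The accept/reject decision can be sketched as follows, assuming a symmetric proposal so that the ratio only involves the target density (`target`, `current`, and `candidate` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(7)

def accept(target, current, candidate):
    # Acceptance ratio: how much "higher" the candidate sits than the
    # current state under the (possibly unnormalized) target density.
    alpha = target(candidate) / target(current)
    # Uphill moves (alpha >= 1) are always accepted; downhill moves
    # are accepted with probability alpha.
    return rng.uniform() < alpha
```

This is the teleporting friend in code: moves toward higher altitude always happen, while moves downhill happen only occasionally.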
How often we accept candidates depends on what type of algorithm we're using. For instance, if we're using Random Walk Metropolis, the step size σ controls the acceptance rate: tiny steps are almost always accepted but explore the distribution slowly, while very large steps are frequently rejected.
We iterate again with the new state θₜ₊₁ until we have collected enough samples.
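Putting the steps above together, here is one possible end-to-end sketch of Random Walk Metropolis; the function name, defaults, and burn-in choice are my own, not from the original post:

```python
import numpy as np

def random_walk_metropolis(target, theta0, n_samples, step_size=1.0, seed=0):
    """Sample from an unnormalized density `target` using Random Walk
    Metropolis with a normal proposal of scale `step_size`."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    theta = theta0
    for t in range(n_samples):
        candidate = rng.normal(theta, step_size)   # propose a move
        alpha = target(candidate) / target(theta)  # acceptance ratio
        if rng.uniform() < alpha:                  # accept the candidate...
            theta = candidate
        samples[t] = theta                         # ...or repeat the current state
    return samples

# Sanity check against an unnormalized standard normal target:
samples = random_walk_metropolis(lambda x: np.exp(-0.5 * x * x),
                                 theta0=0.0, n_samples=20000)
kept = samples[2000:]  # discard a short burn-in
```

After burn-in, the sample mean and standard deviation should land close to 0 and 1, the moments of the standard normal that the unnormalized target is proportional to.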
I've always liked probabilistic algorithms, especially how, given enough time, they can eventually uncover patterns. Ben Lambert released an excellent video that visually illustrates how Random Walk Metropolis maps the distribution over time.