Markov Chains, Beyond the Mean

Suppose we are interested in the number of moves it takes to move from an initial state to another given state in a Markov chain. Finding the expectation of the number of moves is simple, but how about its variance?

October 2, 2018

Here's an example of a typical Markov chain problem: say there's a four-sided die. You keep rolling this die until your current roll comes up with a 2 and your previous roll was a 1. Let R equal your number of rolls. What is E[R] ?

This problem can be modeled as a Markov chain with 3 states:

[Figure: Markov chain diagram with states s_0, s_1, s_2]

In this Markov chain, s_2 corresponds to being finished (your past two rolls were a 1 then a 2, in that order); s_1 corresponds to being partly done (you just rolled a 1 and are now hoping for a 2), and s_0 corresponds to no progress (still hoping to roll a 1). There's a 1/4 chance of transitioning from s_0 to s_1 because there's a 1/4 chance of rolling a 1; there's a 1/2 chance of transitioning from s_1 back to s_0 because rolling a 3 or 4 will get you back to nothing; and so on.
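The diagram's probabilities can be collected into a transition matrix. Filling in the transitions covered by "and so on": from s_1, rolling another 1 (probability 1/4) keeps you at s_1, rolling a 2 (probability 1/4) finishes, and s_2 is absorbing. With rows and columns ordered s_0, s_1, s_2 :

```latex
P = \begin{pmatrix}
  3/4 & 1/4 & 0   \\
  1/2 & 1/4 & 1/4 \\
  0   & 0   & 1
\end{pmatrix}
```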

Calculating E[R] is relatively straightforward. Let a_i be the expected number of rolls needed to reach s_2 given we're at s_i . We already know a_2=0 , and we can calculate the other a_i through some algebra. For example, if we're currently at s_0 :

  • there's a 3/4 probability we roll a 2, 3, or 4, in which case we stay in s_0 . In this case, our expected number of rolls to get to s_2 is a_0+1 : 1 for the roll we just did and another a_0 to get from s_0 to s_2 .
  • there's a 1/4 probability we roll a 1 and move to s_1 . In this case, we'd expect to take a_1+1 rolls to get to s_2 : 1 for the roll we just did and another a_1 once we're at s_1 .

By the definition of expectation, a_0 is the average of the expected number of rolls in each case, weighted by the probability of that case. We can write a similar equation for when we're in s_1 , and get this system:

a_0=\frac{3}{4}(a_0+1) + \frac{1}{4}(a_1+1)

a_1=\frac{1}{2}(a_0+1) + \frac{1}{4}(a_1+1) + \frac{1}{4}(a_2+1)

which we solve and get a_0=16, a_1=12 .

A trickier problem that can actually be solved with the same approach is calculating \operatorname{var} R . Since we already know E[R]=16 and \operatorname{var} R=E[R^2]-E[R]^2 , the challenge becomes calculating E[R^2] . The non-linearity of R^2 makes this difficult. Whenever we roll the die, the number of rolls we've made increases by one (duh). The square of the number of rolls we've made, however, increases by 2x+1 , where x is the number of rolls we'd made before that one. In some sense, R^2 is "stateful" because a given roll's impact on R^2 depends on past rolls.

Through the magic of linearity of expectation, though, things turn out fine. Analogously to what we did to find E[R] , let b_i equal the expectation of the square of the number of rolls to get from s_i to s_2 . We know b_2=0 and we want to find b_0 , which we can do with some algebra. Assuming we're at s_0 :

  • there's a 3/4 chance we roll a 2, 3, or 4 and stay in s_0 . Say we take m rolls after this one to reach s_2 . Then the square of the number of rolls we took is (m+1)^2=m^2+2m+1 . In expectation, this is E[m^2]+2E[m]+1 . By definition, E[m^2]=b_0 , and we computed earlier that E[m]=a_0=16 , so our expression simplifies to b_0+33 . Note what this does not equal:
    • (E[m+1])^2 , because in general E[x^p]\neq E[x]^p . This is pretty well-known but I was still tempted to believe that the expectation would be (16+1)^2=289
    • (\sqrt{E[m^2]}+1)^2 ; same reason.
  • there's a 1/4 chance we roll a 1 and advance to s_1 . Say it takes n rolls from s_1 to reach s_2 . The square of the number of rolls we took is (n+1)^2 . We defined E[n^2]=b_1 and we computed E[n]=a_1=12 from earlier, so the expectation of (n+1)^2 is b_1+25 .

So now we have an equation for b_0 . If we do the same procedure at s_1 , we end up with this system:

b_0=\frac{3}{4}(b_0+33)+\frac{1}{4}(b_1+25)

b_1=\frac{1}{2}(b_0+33)+\frac{1}{4}(b_1+25)+\frac{1}{4}(b_2+1)

which we solve to get b_0=464 . Our variance is therefore E[R^2]-E[R]^2=b_0-a_0^2=464-16^2=208 .

I think what's neat about this approach is that we are building "layers" of equations on the Markov chain, where in order to solve for higher moments of R we first need to write and solve the systems of equations for lower-order moments. This technique can be extended to calculate E[R^3], E[R^4] and so on, although dealing with the terms in higher binomial expansions quickly gets tedious.

Finally, if you trust code more than math, here's some C++ code that simulates the experiment and gets 207.989 for the variance, with 1 billion trials:

#include <iostream>
#include <random>
using namespace std;
typedef long long ll;

// trials and # sides on die, respectively
const int N = 1e9, S = 4;
mt19937 mt(1509365);
std::uniform_int_distribution<int> dist(1, S);

int calc() {
	int prev = -1;
	for (int i = 1; true; i++) {
		int cur = dist(mt);
		if (cur == 2 && prev == 1) return i;
		prev = cur;
	}
}

int main() {
	ll sum = 0, sum2 = 0;
	for (int i = 0; i < N; i++) {
		int res = calc();
		sum += res;
		sum2 += (ll)res * res; // cast so the square is computed in 64 bits
	}
	double mean = 1.0 * sum / N;
	// no need for Bessel's correction, N is huge
	double var = (1.0 * sum2 / N) - mean * mean;
	cout << mean << ' ' << var << '\n';
}