Markov Chains, Beyond the Mean

Suppose we are interested in the number of moves it takes to move from an initial state to another given state in a Markov chain. Finding the expectation of the number of moves is simple, but how about its variance?

October 2, 2018

Here's an example of a typical Markov chain problem: say there's a four-sided die. You keep rolling this die until your current roll comes up with a 2 and your previous roll was a 1. Let R equal your number of rolls. What is E[R] ?

This problem can be modeled as a Markov chain with 3 states:

[Figure: Markov chain diagram with states s_0, s_1, s_2]

In this Markov chain, s_2 corresponds to being finished (your past two rolls were a 1 then a 2, in that order); s_1 corresponds to being partly done (you just rolled a 1 and are now hoping for a 2), and s_0 corresponds to no progress (still hoping to roll a 1). There's a 1/4 chance of transitioning from s_0 to s_1 because there's a 1/4 chance of rolling a 1; there's a 1/2 chance of transitioning from s_1 back to s_0 because rolling a 3 or 4 will get you back to nothing; and so on.
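The diagram's probabilities can be collected into a transition matrix. Filling in the transitions covered by "and so on": from s_1, rolling another 1 (probability 1/4) keeps you at s_1, rolling a 2 (probability 1/4) finishes, and s_2 is absorbing. With rows and columns ordered s_0, s_1, s_2 :

```latex
P = \begin{pmatrix}
  3/4 & 1/4 & 0   \\
  1/2 & 1/4 & 1/4 \\
  0   & 0   & 1
\end{pmatrix}
```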

Calculating E[R] is relatively straightforward. Let a_i be the expected number of rolls needed to reach s_2 given we're at s_i . We already know a_2=0 , and we can calculate the other a_i through some algebra. For example, if we're currently at s_0 :

  • there's a 3/4 probability we roll a 2, 3, or 4, in which case we stay in s_0 . In this case, our expected number of rolls to get to s_2 is a_0+1 : 1 for the roll we just did and another a_0 to get from s_0 to s_2 .
  • there's a 1/4 probability we roll a 1 and move to s_1 . In this case, we'd expect to take a_1+1 rolls to get to s_2 : 1 for the roll we just did and another a_1 once we're at s_1 .

By the definition of expectation, a_0 is the average of the expected number of rolls in each case, weighted by the probability of that case. We can write a similar equation for when we're in s_1 , and get this system:

a_0=\frac{3}{4}(a_0+1) + \frac{1}{4}(a_1+1)

a_1=\frac{1}{2}(a_0+1) + \frac{1}{4}(a_1+1) + \frac{1}{4}(a_2+1)

which we solve and get a_0=16, a_1=12 .

A trickier problem that can actually be solved with the same approach is calculating \operatorname{var} R . Since we already know E[R]=16 and \operatorname{var} R=E[R^2]-E[R]^2 , the challenge becomes calculating E[R^2] . The non-linearity of R^2 makes this difficult. Whenever we roll the die, the number of rolls we've made increases by one (duh). The square of the number of rolls we've made, however, increases by 2x+1 , where x is the number of rolls we'd made before that one. In some sense, R^2 is "stateful" because a given roll's impact on R^2 depends on past rolls.

Through the magic of linearity of expectation, though, things turn out fine. Analogously to what we did to find E[R] , let b_i equal the expectation of the square of the number of rolls to get from s_i to s_2 . We know b_2=0 and we want to find b_0 , which we can do with some algebra. Assuming we're at s_0 :

  • there's a 3/4 chance we roll a 2, 3, or 4 and stay in s_0 . Say we take m rolls after this one to reach s_2 . Then the square of the number of rolls we took is (m+1)^2=m^2+2m+1 . In expectation, this is E[m^2]+2E[m]+1 . By definition, E[m^2]=b_0 , and we computed earlier that E[m]=a_0=16 , so our expression simplifies to b_0+33 . Note what this does not equal:
    • (E[m+1])^2 , because in general E[x^p]\neq E[x]^p . This is pretty well-known but I was still tempted to believe that the expectation would be (16+1)^2=289
    • (\sqrt{E[m^2]}+1)^2 ; same reason.
  • there's a 1/4 chance we roll a 1 and advance to s_1 . Say it takes n rolls from s_1 to reach s_2 . The square of the number of rolls we took is (n+1)^2 . We defined E[n^2]=b_1 and we computed E[n]=a_1=12 from earlier, so the expectation of (n+1)^2 is b_1+25 .

So now we have an equation for b_0 . If we do the same procedure at s_1 , we end up with this system:

b_0=\frac{3}{4}(b_0+33)+\frac{1}{4}(b_1+25)

b_1=\frac{1}{2}(b_0+33)+\frac{1}{4}(b_1+25)+\frac{1}{4}(b_2+1)

which we solve to get b_0=464 . Our variance is therefore E[R^2]-E[R]^2=b_0-a_0^2=464-16^2=208 .

I think what's neat about this approach is that we are building "layers" of equations on the Markov chain, where in order to solve for higher moments of R we first need to write and solve the systems of equations for lower-order moments. This technique can be extended to calculate E[R^3], E[R^4] and so on, although dealing with the terms in higher binomial expansions quickly gets tedious.

Finally, if you trust code more than math, here's some C++ code that simulates the experiment and gets 207.989 for the variance, with 1 billion trials:

#include <iostream>
#include <random>
using namespace std;
typedef long long ll;

// trials and # sides on die, respectively
const int N = 1e9, S = 4;
mt19937 mt(1509365);
std::uniform_int_distribution<int> dist(1, S);

int calc() {
	int prev = -1;
	for (int i = 1; true; i++) {
		int cur = dist(mt);
		if (cur == 2 && prev == 1) return i;
		prev = cur;
	}
}

int main() {
	ll sum = 0, sum2 = 0;
	for (int i = 0; i < N; i++) {
		int res = calc();
		sum += res;
		sum2 += (ll)res * res; // cast so the square is computed in 64 bits
	}
	double mean = 1.0 * sum / N;
	// no need for Bessel's correction, N is huge
	double var = (1.0 * sum2 / N) - mean * mean;
	cout << mean << ' ' << var << '\n';
}