## an extension of a Markov chain identity

I revisited some elementary finite-state Markov chain theory these days using the Aldous and Fill book, and found some treasures even in chapter 2. I was not interested in them before because they did not seem directly related to mixing time problems. Now that I am free from the shackles of graduate school, the time is ripe to figure things out systematically on my own. One of those identities shows a way to compute $P_i(T_j < T_k)$ in terms of rational functions of $E_i T_j$ for various $i, j$; here $T_j$ stands for the first hitting time of the state $j$, and $E_i T_j$ is the expected first hitting time of $j$ if one starts the chain at $i$. So I sought a generalization to things like $\star = P_i(T_j < T_k < T_\ell)$. Note that since the states $j, k, \ell$ are distinct, the chain cannot hit two of them at the same step, so these strict inequalities are clean: equality cannot hold. Investigating such a question conforms to my view of how math should be done, in a way: a systematic pursuit of the obvious possibilities, to ease a layman's way into the subject; indeed most published works, however original and nontrivial, explore things hardly conceivable to the untrained crowd.
So the question remains whether the three-term inequality probability can be expressed as a rational function of things like $E_i T_j$. After trying a bunch of things similar to how the simpler two-term case was done, I decided that it was inevitable to introduce quantities of the form $E_i (T_j \wedge T_k)$ into the equation. This is clearly the same as $E_i T_{\{j,k\}}$, i.e., the mean hitting time of the two-point set $\{j,k\}$. It is unclear to me yet whether the latter can be expressed in terms of the $E_i T_j$'s only. First we apply the strong Markov property at $T_j$ to get $\star = P_i(T_j < T_k \wedge T_\ell) P_j(T_k < T_\ell)$. The second factor is already handled by Aldous and Fill and turns out to rely on the following identities:
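For concreteness in everything that follows, quantities like $E_i T_j$ (and $E_i T_{\{j,k\}}$) are computable for any given finite chain by first-step analysis, i.e., one linear solve per target set. A minimal sketch; the 4-state chain and the helper name `mean_hitting_times` are my own, purely for illustration:

```python
import numpy as np

# An arbitrary irreducible 4-state chain, just for experiments.
P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.3, 0.0, 0.4, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.4, 0.1, 0.5, 0.0]])
n = P.shape[0]

def mean_hitting_times(P, targets):
    """h[x] = E_x T_A for a target set A: solve (I - Q) h = 1 off A, h = 0 on A,
    where Q is P restricted to the complement of A."""
    A = set(targets)
    free = [x for x in range(P.shape[0]) if x not in A]
    Q = P[np.ix_(free, free)]
    h = np.zeros(P.shape[0])
    h[free] = np.linalg.solve(np.eye(len(free)) - Q, np.ones(len(free)))
    return h

# ET[i, j] = E_i T_j for all pairs
ET = np.array([mean_hitting_times(P, [j]) for j in range(n)]).T
```

The same routine with a two-point target set gives $E_i T_{\{j,k\}}$ directly, which is what makes it convenient later.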
$E_\theta(\text{number of visits to } j \text{ before } S) = \pi_j E_\theta S$, where $S$ is a stopping time and $\theta$, the initial distribution of the chain, is equal in law to $X_S$. The convention is that the number of visits before something includes time 0 but excludes $S$.
$E_j(\text{number of visits to } j \text{ before }T_j^+) = 1$ (obvious). Here $T_j^+$ means the first time $t > 0$ such that $X_t = j$.
$E_j T_j^+ = 1/\pi_j$ follows by combining the previous two identities with $\theta = \delta_j$ and $S = T_j^+$.
$E_j(\text{number of visits to } k \text{ before } T_j^+) = \pi_k / \pi_j$.
$E_j(\text{number of visits to } j \text{ before } T_k) = \pi_j(E_j T_k + E_k T_j)$

$E_j( \text{number of visits to } k \text{ before } T_\ell) = \pi_k (E_j T_\ell + E_\ell T_k + E_k T_j - E_k T_j - E_j T_k) = \pi_k (E_j T_\ell + E_\ell T_k - E_j T_k)$
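These identities are easy to confirm numerically, since $E_i(\text{number of visits to } k \text{ before } T_\ell)$ is an entry of the Green's function $(I - Q)^{-1}$ of the chain killed at $\ell$, with $Q$ the transition matrix restricted to the complement of $\ell$. A sketch on an arbitrary 4-state chain of my own; the helper names `green` and `ET` are mine:

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.3, 0.0, 0.4, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.4, 0.1, 0.5, 0.0]])
n = P.shape[0]

# stationary distribution: the unique solution of pi (I - P + 11^T) = 1^T
pi = np.linalg.solve((np.eye(n) - P + np.ones((n, n))).T, np.ones(n))

def green(P, ell):
    """G[i, k] = E_i(number of visits to k before T_ell), time 0 included."""
    keep = [x for x in range(P.shape[0]) if x != ell]
    G = np.zeros_like(P)
    G[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - P[np.ix_(keep, keep)])
    return G

def ET(P, j):
    """h[x] = E_x T_j (zero at x = j)."""
    keep = [x for x in range(P.shape[0]) if x != j]
    h = np.zeros(P.shape[0])
    h[keep] = np.linalg.solve(np.eye(len(keep)) - P[np.ix_(keep, keep)],
                              np.ones(len(keep)))
    return h

j, k, ell = 0, 1, 2
# E_j T_j^+ = 1/pi_j
EjTjplus = 1 + P[j] @ ET(P, j)
# E_j N_j(T_k) = pi_j (E_j T_k + E_k T_j)
check1 = (green(P, k)[j, j], pi[j] * (ET(P, k)[j] + ET(P, j)[k]))
# E_j N_k(T_ell) = pi_k (E_j T_ell + E_ell T_k - E_j T_k)
check2 = (green(P, ell)[j, k],
          pi[k] * (ET(P, ell)[j] + ET(P, k)[ell] - ET(P, k)[j]))
```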

The first identity can of course be found in most textbooks on Markov chains, as it is how one constructs the stationary distribution; the proof exploits an infinite sum and is a bit miraculous. The last two follow by choosing an appropriate stopping time $S$. The accompanying diagrams illustrate the construction of stopping times that fit the pattern of the first identity. Two not so immediate corollaries are

$\displaystyle P_j (T_k < T_j^+) = E_j(\text{number of visits to } k \text{ before } T_j^+) / E_k(\text{number of visits to } k \text{ before } T_j)\\ = \frac{E_j T_j^+}{E_k T_j + E_j T_k}$

$\displaystyle P_j(T_k < T_\ell) = E_j(\text{number of visits to } k \text{ before } T_\ell) / E_k(\text{number of visits to } k \text{ before } T_\ell) \\ = \frac{E_j T_\ell + E_\ell T_k - E_j T_k}{E_k T_\ell + E_\ell T_k}$
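Both corollaries can be sanity-checked against a direct first-step computation of the hitting probabilities. A sketch on an arbitrary 4-state chain; the helper names `ET` and `hit_prob` are mine:

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.3, 0.0, 0.4, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.4, 0.1, 0.5, 0.0]])

def ET(P, j):
    """h[x] = E_x T_j (zero at x = j)."""
    keep = [x for x in range(P.shape[0]) if x != j]
    h = np.zeros(P.shape[0])
    h[keep] = np.linalg.solve(np.eye(len(keep)) - P[np.ix_(keep, keep)],
                              np.ones(len(keep)))
    return h

def hit_prob(P, a, b):
    """p[x] = P_x(T_a < T_b): harmonic off {a, b}, with p[a] = 1, p[b] = 0."""
    free = [x for x in range(P.shape[0]) if x not in (a, b)]
    p = np.zeros(P.shape[0]); p[a] = 1.0
    p[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)],
                              P[free, a])
    return p

j, k, ell = 0, 1, 2
ETj, ETk, ETl = ET(P, j), ET(P, k), ET(P, ell)

# first corollary: P_j(T_k < T_j^+) = E_j T_j^+ / (E_k T_j + E_j T_k)
lhs1 = P[j] @ hit_prob(P, k, j)            # first-step value of P_j(T_k < T_j^+)
rhs1 = (1 + P[j] @ ETj) / (ETj[k] + ETk[j])

# second corollary: P_j(T_k < T_ell)
lhs2 = hit_prob(P, k, ell)[j]
rhs2 = (ETl[j] + ETk[ell] - ETk[j]) / (ETl[k] + ETk[ell])
```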

Indeed take the first of these: let $N_k(T_j) =$ the number of visits to $k$ before $T_j$, and similarly for $N_k(T_j^+)$. Then
$E_j(N_k(T_j^+) \mid T_k > T_j^+) = 0$, while $E_j(N_k(T_j^+) \mid T_k < T_j^+) = E_k(N_k(T_j))$ by the strong Markov property at $T_k$,
hence
$E_j(N_k(T_j^+)) = E_k(N_k(T_j)) \, P_j(T_k < T_j^+)$
from which the first corollary is obvious.
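The multiplicative relation $E_j(N_k(T_j^+)) = E_k(N_k(T_j)) \, P_j(T_k < T_j^+)$ is also easy to confirm by simulation. A rough Monte Carlo sketch on an arbitrary 4-state chain of my own (helper name `step` is mine; the two sides only agree up to sampling error):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.3, 0.0, 0.4, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.4, 0.1, 0.5, 0.0]])
j, k, N = 0, 1, 20_000

def step(x):
    return rng.choice(4, p=P[x])

visits_jk = 0   # visits to k on an excursion j -> j, summed over excursions
hits = 0        # excursions from j that touch k before returning to j
for _ in range(N):
    x = step(j)
    touched = False
    while x != j:
        if x == k:
            visits_jk += 1
            touched = True
        x = step(x)
    hits += touched

visits_kk = 0   # visits to k (time 0 included) before T_j, starting at k
for _ in range(N):
    c, x = 1, step(k)
    while x != j:
        if x == k:
            c += 1
        x = step(x)
    visits_kk += c

lhs = visits_jk / N                  # estimates E_j N_k(T_j^+)
rhs = (visits_kk / N) * (hits / N)   # estimates E_k N_k(T_j) * P_j(T_k < T_j^+)
```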

Now consider the extended probability introduced in the beginning: $P_i(T_j < T_k < T_\ell)$. We first compute $P_i (T_j < T_{\{k,\ell\}})$. By analogy with the above (counting visits to the target state $j$), this is

$E_i N_j(T_{\{k,\ell\}}) / E_j N_j(T_{\{k,\ell\}})$.

The numerator expands, using the stopping time that runs from $i$ to $T_{\{k,\ell\}}$, then to $T_j$, then back to $T_i$, to

$\displaystyle \pi_j (E_i T_{\{k,\ell\}} + P_i(T_k < T_\ell) E_k T_j + P_i(T_\ell < T_k) E_\ell T_j + E_j T_i - E_j T_i - E_i T_j) \\ = \pi_j (E_i T_{\{k,\ell\}} + P_i(T_k < T_\ell) E_k T_j + P_i(T_\ell < T_k) E_\ell T_j - E_i T_j)$.

The denominator similarly becomes

$\pi_j (E_j T_{\{k,\ell\}} + P_j(T_k < T_\ell) E_k T_j + P_j(T_\ell < T_k) E_\ell T_j)$.

Of course the $P_i(T_k < T_\ell)$ and the like can be further expanded in terms of the $E_i T_j$'s by the second corollary, but we omit this. The main obstacle is apparently $E_i T_{\{k, \ell\}}$.
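As a final sanity check, the ratio identity $P_i(T_j < T_{\{k,\ell\}}) = E_i N_j(T_{\{k,\ell\}}) / E_j N_j(T_{\{k,\ell\}})$, with both occupation expectations expanded in mean hitting times, can be compared against a direct linear-algebra computation of the same probability. A sketch on an arbitrary 4-state chain of my own (helper names `ET`, `hit_prob` are mine):

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.3, 0.0, 0.4, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.4, 0.1, 0.5, 0.0]])
n = P.shape[0]
pi = np.linalg.solve((np.eye(n) - P + np.ones((n, n))).T, np.ones(n))

def ET(P, targets):
    """h[x] = E_x T_A for a target set A."""
    free = [x for x in range(P.shape[0]) if x not in set(targets)]
    h = np.zeros(P.shape[0])
    h[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)],
                              np.ones(len(free)))
    return h

def hit_prob(P, a, B):
    """p[x] = P_x(T_a < T_B) for a single state a and a disjoint set B."""
    blocked = set(B) | {a}
    free = [x for x in range(P.shape[0]) if x not in blocked]
    p = np.zeros(P.shape[0]); p[a] = 1.0
    p[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)],
                              P[free, a])
    return p

i, j, k, ell = 3, 0, 1, 2
ETj, ETk, ETkl = ET(P, [j]), ET(P, [k]), ET(P, [k, ell])
pkl = hit_prob(P, k, [ell])   # P_x(T_k < T_ell)
plk = hit_prob(P, ell, [k])   # P_x(T_ell < T_k)

# expanded numerator and denominator of P_i(T_j < T_{k,ell})
num = pi[j] * (ETkl[i] + pkl[i] * ETj[k] + plk[i] * ETj[ell] - ETj[i])
den = pi[j] * (ETkl[j] + pkl[j] * ETj[k] + plk[j] * ETj[ell])

direct = hit_prob(P, j, [k, ell])[i]   # P_i(T_j < T_{k,ell}) computed directly
star = (num / den) * pkl[j]            # P_i(T_j < T_k < T_ell)
```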