## A wrong way of doing coupling in Markov chain mixing

Coupling is a powerful method to bound total variation mixing time of ergodic markov chains. The key equality is
$\|\mu_0 P_t - \pi\|_{TV} = \inf_{\{X_t,Y_t \text{ coupling of } \mu_0 P_t, \pi P_t\}} \mathbb{P}[X_t \neq Y_t]$.

Here the coupling processes $X_t, Y_t$ are non-Markovian in general. Since the above is equality, coupling bounds can be more effective than $\mathcal{L}^2$, separation distance (stationary time), or entropy methods, though they typically require good symmetry property on the problem at hand.

In Yuval Peres and Allan Sly’s recent article, mixing of the upper triangular matrix walk, which reads like a 10 page Putnam problem solution, they used the coupling “definition” to bound the variation distance of an upper triangular matrix walk over $\mathbb{F}_p$. I will assume the readers familiar with that paper. The essential idea is that one can write the state at time T of the chain as
$X_T = Y_T + \sum_{k=1}^{J(T)} a_k W_k$, where $W_k$‘s are all matrices with 0 everywhere except the last column. Since the $a_k$‘s are iid uniform in $\mathbb{F}_p$, if the $W_k$‘s span the last column space, $X_T$ would have mixed up its last column, $Y_T$ on the other hand is a walk on the first $n-1$ columns, hence is like a transposed original walk with one dimension less. So naively one expects to condition on the event $\mathcal{A}(T,n)$ that $W_k$‘s span, and then wait til the smaller chain with dimension $n-1$ to mix. This argument however is not valid, as the following example shows:
Say we only have two columns, populated initially by $e_n$ and $e_n$. Then clearly at any time, the two columns are perfectly correlated, so there is no chance the state consists of iid $\mathbb{F}_p$ entries apart from the bottom row. The issue with the above argument is then that mixing the marginal distribution on column 1 does not decouple it from column 2, which is already mixed at the beginning. This false reasoning also plagued my initial attempt to prove polynomial mixing of the Kac walk for $SO(n)$, wherein I have established polynomial mixing for one column, a copy of $S^{n-1}$. However, conditional on the mixing of column one, I cannot proceed to mix column 2 in the same amount of time, precisely because I cannot decouple the first two columns in $SO(n)$, which amounts to showing the Kac moves mixes a vector in the complementary subsphere $S^{n-2}$ of a random vector, which is also wielded by the same Kac moves at each subsequent time.

The legit coupling argument as in Peres and Sly’s paper, proceeds as following: since the $n$-dimensional walk $X_T$ restricted to the first $n-1$ columns is Markov as well, one can couple the restricted state with the corresponding equilibrium measure $\pi$ except with probability $d_{n-1}(T)$ at time $T$. Procedurally, one matches the iid uniform $\mathbb{F}_p$ entries of the first $n-1$ columns of the equilibrium state $Y_T$ to the time $T$ state of the $n-1$ dimensional chain $X_T|_{[n-1]}$ except for probability $d_{n-1}(T)$, so it doesn’t affect $X_T$ at all. Next under the event $\mathcal{A}(T,n)$, the last column of $X_T$ consists of iid uniform $\mathbb{F}_p$ entries except the last entry, and independent from the first $n-1$ columns, so one can construct the last column of the equilibrated state $Y_T$ to match that of the time $T$ chain, in such as a way that it’s independent of the first $n-1$ columns of both $X_T$ and $Y_T$ already constructed. Here the independence of the last column of $X_T$ from the other columns, as guaranteed by the spanning condition $\mathcal{A}(T,n)$, is extremely important. So is the independence of last column from rest under equilibrium measure (note Kac model on $SO(n)$ is not so).