Coupling is a powerful method to bound total variation mixing time of ergodic markov chains. The key equality is
Here the coupling processes are non-Markovian in general. Since the above is equality, coupling bounds can be more effective than , separation distance (stationary time), or entropy methods, though they typically require good symmetry property on the problem at hand.
In Yuval Peres and Allan Sly’s recent article, mixing of the upper triangular matrix walk, which reads like a 10 page Putnam problem solution, they used the coupling “definition” to bound the variation distance of an upper triangular matrix walk over . I will assume the readers familiar with that paper. The essential idea is that one can write the state at time T of the chain as
, where ‘s are all matrices with 0 everywhere except the last column. Since the ‘s are iid uniform in , if the ‘s span the last column space, would have mixed up its last column, on the other hand is a walk on the first columns, hence is like a transposed original walk with one dimension less. So naively one expects to condition on the event that ‘s span, and then wait til the smaller chain with dimension to mix. This argument however is not valid, as the following example shows:
Say we only have two columns, populated initially by and . Then clearly at any time, the two columns are perfectly correlated, so there is no chance the state consists of iid entries apart from the bottom row. The issue with the above argument is then that mixing the marginal distribution on column 1 does not decouple it from column 2, which is already mixed at the beginning. This false reasoning also plagued my initial attempt to prove polynomial mixing of the Kac walk for , wherein I have established polynomial mixing for one column, a copy of . However, conditional on the mixing of column one, I cannot proceed to mix column 2 in the same amount of time, precisely because I cannot decouple the first two columns in , which amounts to showing the Kac moves mixes a vector in the complementary subsphere of a random vector, which is also wielded by the same Kac moves at each subsequent time.
The legit coupling argument as in Peres and Sly’s paper, proceeds as following: since the -dimensional walk restricted to the first columns is Markov as well, one can couple the restricted state with the corresponding equilibrium measure except with probability at time . Procedurally, one matches the iid uniform entries of the first columns of the equilibrium state to the time state of the dimensional chain except for probability , so it doesn’t affect at all. Next under the event , the last column of consists of iid uniform entries except the last entry, and independent from the first columns, so one can construct the last column of the equilibrated state to match that of the time chain, in such as a way that it’s independent of the first columns of both and already constructed. Here the independence of the last column of from the other columns, as guaranteed by the spanning condition , is extremely important. So is the independence of last column from rest under equilibrium measure (note Kac model on is not so).