## History of limit theorems for rank correlations and some tedious computations

Suppose we sample pairs of values ${(X_i, Y_i)}$ from some bivariate distribution. We would like to understand the correlation between the two random variables represented by the first and second coordinates respectively. One way to do this is of course to compute the empirical correlation (Pearson’s correlation coefficient) given by

$\displaystyle \frac{\sum_i X_i Y_i - \frac{1}{n}[\sum_i X_i][ \sum_i Y_i]}{([\sum_i X_i^2 - \frac{1}{n}(\sum_i X_i)^2][\sum_i Y_i^2 - \frac{1}{n}(\sum_i Y_i)^2])^{1/2}} \ \ \ \ \ (1)$
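In code, the centered form of (1) is a one-liner; the following sketch (the function name is my own) computes it from two equal-length samples.

```python
from math import sqrt

def pearson(xs, ys):
    # empirical (Pearson) correlation coefficient in the centered form of (1)
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))

# perfectly linear data gives correlation 1; reversed order gives -1
assert abs(pearson([1, 2, 3], [2, 4, 6]) - 1.0) < 1e-9
assert abs(pearson([1, 2, 3], [3, 2, 1]) + 1.0) < 1e-9
```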

One disadvantage of this test statistic is that it is parametric, in the sense that for ${X}$ and ${Y}$ to have perfect correlation (or perfect negative correlation), they must be related by a linear function; it is thus tied to the linear regression model. To remove this parametric dependence, one instead converts the values ${X_i}$ into their rank statistics, i.e., assuming there is no repeated value among the ${X_i}$’s, there exists a ${\sigma \in S_n}$ such that

$\displaystyle X_{\sigma(1)} < X_{\sigma(2)} < \ldots < X_{\sigma(n)} \ \ \ \ \ (2)$

and similarly for the ${Y_i}$’s. This procedure essentially removes any parametric dependence. One could then form various statistics based on the rank data ${\sigma_X(i)}$’s and ${\sigma_Y(i)}$’s, instead of the ${X_i}$’s and ${Y_i}$’s. For instance, one could again use the ranked version of Pearson’s coefficient

$\displaystyle \frac{\sum_i \sigma_X(i) \sigma_Y(i) - \frac{1}{n}[\sum_i \sigma_X(i)][\sum_i \sigma_Y(i)]}{\sum_i i^2 - \frac{1}{n}[\sum_i i]^2}. \ \ \ \ \ (3)$

This gives perfect correlation precisely when ${X_i}$ is increasing in ${Y_i}$, i.e., ${\sigma_X = \sigma_Y}$. Alternative rank correlation statistics include

1. Spearman’s footrule: ${\rho(X,Y) = \sum_{i=1}^n |\sigma_X(i) - \sigma_Y(i)|}$,
2. ${\mathcal{L}^p}$ Spearman’s footrule: ${\rho_p(X,Y) = \sum_{i=1}^n |\sigma_X(i) -\sigma_Y(i)|^p}$. These are discrete analogues of ${\mathcal{L}^p}$ distances between two functions. When ${p = 2}$, the resulting statistic is (up to an affine renormalization) Spearman’s rho.
3. Kendall’s tau: ${\tau(X,Y) = \sum_{i < j} \delta_{(\sigma_X(i) - \sigma_X(j))(\sigma_Y(i) - \sigma_Y(j)) < 0}}$, the number of discordant pairs ${i,j}$. The classical version of Kendall’s tau is an affine renormalization of this: the number of concordant pairs minus the number of discordant pairs, divided by ${\binom{n}{2}}$.

Kendall’s tau has the alternative formulation as the minimum number of adjacent transpositions needed to bring ${\sigma_X}$ to ${\sigma_Y}$, under the linear graph structure on ${\{1, \ldots, n\}}$. One could generalize it to other graph structures, such as the cycle graph ${{\mathbb Z} / n{\mathbb Z}}$, although its statistical interpretation is less clear.
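The rank statistics above are easy to compute directly; the sketch below (function names are my own) extracts the ranks from tie-free samples and evaluates the ${\mathcal{L}^p}$ footrule and the discordant-pair count.

```python
def ranks(xs):
    # 1-based rank of each observation; assumes no ties among the xs
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def footrule_p(xs, ys, p=1):
    # l^p Spearman footrule: sum_i |rank_X(i) - rank_Y(i)|^p
    rx, ry = ranks(xs), ranks(ys)
    return sum(abs(a - b) ** p for a, b in zip(rx, ry))

def kendall_discordant(xs, ys):
    # number of discordant pairs (the un-normalized Kendall tau used in this post)
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (xs[i] - xs[j]) * (ys[i] - ys[j]) < 0)

# monotone data: zero footrule distance and no discordant pairs
assert footrule_p([1.5, 2.2, 3.7], [10, 20, 30]) == 0
assert kendall_discordant([1.5, 2.2, 3.7], [30, 20, 10]) == 3
```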
Henry Daniels was perhaps the first to show that both Spearman’s ${\rho}$ and Kendall’s ${\tau}$ satisfy a central limit theorem after suitable centering and scaling. His argument was based on a moment calculation for the following two general statistics

$\displaystyle \sum_i a_{i,\sigma(i)} \ \ \ \ \ (4)$

and

$\displaystyle \sum_{i < j} a_{(i,j),(\sigma(i),\sigma(j))} \ \ \ \ \ (5)$

where ${A = (a_{i,j})}$ resp. ${A = (a_{(i,j),(k,l)})}$ is a singly resp. doubly indexed square matrix. In the case of Spearman’s rho, we have ${a_{i,j} = |j-i|^2}$, whose asymptotic normality can be proved using Hoeffding’s classical combinatorial central limit theorem (see also Persi Diaconis’ book on representation theory in probability and statistics). Daniels represented ${\rho_2}$ slightly differently, using the doubly indexed matrix ${b_{(i,j),(k,l)} = (i-j)(k-l)}$.

One sees that the corresponding statistic is an affine function of ${\rho_2}$ (with negative slope), because

$\displaystyle \sum_{i< j} b_{(i,j),(\sigma(i),\sigma(j))} = \frac{1}{2}\Big[2n\sum_i i \sigma(i) - 2\sum_{i,j \in [n]} ij\Big] = \rm{const}_n - \frac{n}{2}\sum_i (i - \sigma(i))^2. \ \ \ \ \ (6)$
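This identity is easy to check numerically; the sketch below (helper names mine) verifies ${\sum_{i<j}(i-j)(\sigma(i)-\sigma(j)) = \rm{const}_n - \frac{n}{2} S^2(\sigma)}$, with ${S^2(\sigma) = \sum_i (i-\sigma(i))^2}$ and ${\rm{const}_n = n\sum_i i^2 - (\sum_i i)^2}$, on random permutations.

```python
import random

def lhs(sigma):
    # sum_{i<j} (i-j)(sigma(i)-sigma(j)) for a 1-based permutation sigma
    n = len(sigma)
    return sum((i - j) * (sigma[i - 1] - sigma[j - 1])
               for i in range(1, n + 1) for j in range(i + 1, n + 1))

def rhs(sigma):
    # const_n - (n/2) * S^2(sigma); S^2 is always even, so integer division is exact
    n = len(sigma)
    const = n * sum(i * i for i in range(1, n + 1)) - sum(range(1, n + 1)) ** 2
    s2 = sum((i - sigma[i - 1]) ** 2 for i in range(1, n + 1))
    return const - n * s2 // 2

random.seed(1)
for _ in range(100):
    n = random.randint(2, 8)
    sigma = random.sample(range(1, n + 1), n)
    assert lhs(sigma) == rhs(sigma)
```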

The reason Daniels chose the underlying matrix ${B}$ in this form is two-fold:

1. He wanted to relate ${\rho_2}$ to ${\tau}$, so it is natural to use a doubly indexed rather than a singly indexed matrix.
2. ${B}$ has column and row means zero; note that ${A}$ has that property as well. (Warning: ${b'_{(i,j),(k,l)} := (i-k)(j-l)}$ does not have zero column and row means, though it is tempting to define ${\rho_2}$ in terms of it.)

For Kendall’s ${\tau}$, we have ${a_{(i,j),(k,l)} = \delta_{i < j, k > l}}$. Note this matrix is not centered, but it is convenient because it has many zero entries. It can be handled using a generalization of Hoeffding’s theorem to doubly indexed matrices, developed initially by Henry Daniels and later, under weaker conditions and with error bounds, by Zhao, Bai, Chao, and Liang using Stein’s method; the latter article is heavily based on Bolthausen’s analogous result for the singly indexed case. Diaconis and Graham (1979) also derived the above two CLT results independently, using the machinery of the combinatorial central limit theorem. In addition, they proved a deterministic inequality relating ${\tau}$ and ${\rho_1}$ (which are of the same order). To see that ${\tau(\sigma):=\tau(\rm{id}, \sigma)}$ and ${\rho_2(\sigma):=\rho_2(\rm{id},\sigma)}$ are asymptotically perfectly correlated up to sign, one simply has to compute their covariance and individual variances; here and below we write ${\rho_2(\sigma) = \sum_{i<j}(i-j)(\sigma(i)-\sigma(j))}$ for the centered version from (6):

$\displaystyle \frac{1}{n!} \sum_\sigma \Big[\sum_{i < j} 1_{\sigma(i) > \sigma(j)}\Big] \Big[\sum_{k < l} (k-l)(\sigma(k) - \sigma(l))\Big]. \ \ \ \ \ (7)$

Before proceeding, we record that

$\displaystyle \sum_{k > l} (k-l)\approx n^3/6. \ \ \ \ \ (8)$

As a warm-up, we compute the mean of ${\tau}$ first; ${\mathop{\mathbb E} \rho_2 = 0}$ by inspection.

$\displaystyle \mathop{\mathbb E} \tau = \frac{1}{n!} \sum_\sigma \tau(\sigma)\\ =\frac{1}{n!} \sum_\sigma \sum_{i < j} 1_{\sigma(i) > \sigma(j)}\\ = \frac{1}{n!} \sum_{i < j} \sum_{k > l} (n-2)!\\ = \binom{n}{2}^2 \frac{(n-2)!}{n!}\\ = \frac{n(n-1)}{4}. \ \ \ \ \ (9)$

where we used that the number of permutations with two prescribed coordinates is ${(n-2)!}$.
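The mean ${n(n-1)/4}$ can be verified exhaustively for small ${n}$; a brute-force sketch (function name mine):

```python
from itertools import permutations
from fractions import Fraction

def tau(sigma):
    # tau(sigma) = number of inversions = sum_{i<j} 1_{sigma(i) > sigma(j)}
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

for n in range(2, 7):
    perms = list(permutations(range(1, n + 1)))
    mean = Fraction(sum(tau(s) for s in perms), len(perms))
    assert mean == Fraction(n * (n - 1), 4)  # matches (9)
```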
Next we compute ${\mathop{\mathbb E} \tau^2}$ (this can be found in Diaconis and Graham also):

$\displaystyle \mathop{\mathbb E} \tau^2 = \frac{1}{n!} \sum_\sigma \Big[\sum_{i < j} 1_{\sigma(i) > \sigma(j)}\Big]^2 = \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} 1_{\sigma(i) > \sigma(j)} 1_{\sigma(k) > \sigma(l)} \\ = \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} \big[1_{\{i,j\} \cap \{k,l\} = \emptyset} + 1_{i=k,\, j \neq l} + 1_{j=l,\, i \neq k} + 1_{i=l,\, j \neq k} + 1_{j=k,\, i \neq l} + 1_{i=k,\, j=l}\big]\, 1_{\sigma(i) > \sigma(j)} 1_{\sigma(k) > \sigma(l)}. \ \ \ \ \ (10)$

The above subdivision according to the overlap pattern of ${\{i,j\}}$ and ${\{k,l\}}$ is necessary because, for a given number ${m}$ of prescribed coordinates (for example ${m=3}$ in the case ${i=k}$, ${j \neq l}$, where necessarily ${x = z}$), the number of permutations satisfying the constraints ${\sigma(i)=x}$, ${\sigma(j) = y}$, ${\sigma(k) = z}$, and ${\sigma(l) = w}$ is ${(n-m)!}$. Because we need to subtract off ${[\mathop{\mathbb E} \tau]^2}$ later, we need to keep track of not only the highest order term, but also the second highest order. By doing the appropriate counting, one gets for example

$\displaystyle \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} 1_{\{i,j\} \cap \{k,l\} = \emptyset}\, 1_{\sigma(i) > \sigma(j)} 1_{\sigma(k) > \sigma(l)} = \frac{(n-4)!}{n!} \sum_{i < j;\, k < l;\, \{i,j\} \cap \{k,l\} = \emptyset}\ \sum_{x > y;\, z > w} 1_{\{x,y\} \cap \{z,w\} = \emptyset} \\ = \frac{(n-4)!}{n!} \Big[\frac{1}{4} \frac{n!}{(n-4)!}\Big]^2 = \frac{1}{16} \frac{n!}{(n-4)!} = \frac{1}{16} [ n^4 - 6n^3 + o(n^3)]. \ \ \ \ \ (11)$

Similarly, we get for the other terms

$\displaystyle \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} [1_{i=k,\, j \neq l} + 1_{j=l,\, i \neq k}]\, 1_{\sigma(i) > \sigma(j)} 1_{\sigma(k) > \sigma(l)} = 2\, \frac{(n-3)!}{n!} \sum_{i < j;\, i < l;\, j \neq l}\ \sum_{x > y;\, x > w;\, y \neq w} 1 = \frac{2}{9} n^3 + o(n^3), \ \ \ \ \ (12)$

$\displaystyle \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} [1_{i=l,\, j \neq k} + 1_{j=k,\, i \neq l}]\, 1_{\sigma(i) > \sigma(j)} 1_{\sigma(k) > \sigma(l)} = 2\, \frac{(n-3)!}{n!} \sum_{k < i < j}\ \sum_{z > x > y} 1 = \frac{1}{18} n^3 + o(n^3), \ \ \ \ \ (13)$

and

$\displaystyle \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} 1_{i=k,\, j=l}\, 1_{\sigma(i) > \sigma(j)} 1_{\sigma(k) > \sigma(l)} = \frac{1}{n!} \sum_\sigma \sum_{i < j} 1_{\sigma(i) > \sigma(j)} = O(n^2) = o(n^3). \ \ \ \ \ (14)$

Therefore we have

$\displaystyle \rm{var} \tau = \mathop{\mathbb E} \tau^2 - [\mathop{\mathbb E} \tau]^2 = [-3/8 + 2/9 + 1/18 + 1/8] n^3 + o(n^3)= \frac{1}{36} n^3 + o(n^3). \ \ \ \ \ (15)$
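This leading term can be checked against the classical exact value ${\rm{var}\,\tau = \frac{n(n-1)(2n+5)}{72}}$, whose leading term is indeed ${n^3/36}$, by brute force over small symmetric groups:

```python
from itertools import permutations
from fractions import Fraction

def tau(sigma):
    # number of inversions of sigma
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

for n in range(2, 7):
    perms = list(permutations(range(1, n + 1)))
    vals = [tau(s) for s in perms]
    mean = Fraction(sum(vals), len(vals))
    var = Fraction(sum(v * v for v in vals), len(vals)) - mean ** 2
    # classical exact variance of the inversion count
    assert var == Fraction(n * (n - 1) * (2 * n + 5), 72)
```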

For the variance of ${\rho_2}$, we use the result from Diaconis’ book, where for ${S^2(\sigma) = \sum_i (i - \sigma(i))^2}$,

$\displaystyle \rm{var} S^2 = \frac{n^5}{36} + o(n^5). \ \ \ \ \ (16)$

Since

$\displaystyle \rho_2(\sigma) = \frac{1}{2} [ 2 n\sum_i i\sigma(i) - 2 \sum_{i,j} ij] = -\frac{n}{2} S^2(\sigma) + \rm{const} \ \ \ \ \ (17)$

we easily get ${\rm{var}\, \rho_2 = \frac{n^2}{4} \rm{var}\, S^2 = \frac{n^7}{144} + o(n^7)}$. One can double-check this by computing ${\rm{var}\, \rho_2}$ directly: if we write

$\displaystyle \mathop{\mathbb E} \rho_2^2 = \frac{1}{n!} \sum_\sigma \Big(\sum_{i < j} (i-j) (\sigma(i) - \sigma(j))\Big)^2 \\ = \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} [ 1_{\{i,j\} \cap \{k, l\} = \emptyset} + 2 \cdot 1_{i = k,\, j \neq l} + 2 \cdot 1_{i = l,\, j \neq k} + 1_{i = k,\, j = l}] (i-j)(k-l)(\sigma(i) - \sigma(j))(\sigma(k) - \sigma(l)) \\ =: s_1 + 2s_2 + 2s_3 + s_4, \ \ \ \ \ (18)$

where ${s_1, s_2, s_3, s_4}$ denote the (un-doubled) contributions of the disjoint, ${i=k}$, ${i=l}$, and coincident cases, the factors of ${2}$ accounting for the symmetric cases ${j=l}$ and ${j=k}$, which contribute the same amounts. Carrying out the counting, one finds

$\displaystyle s_1 = 0,\quad s_4 = O(n^6),\quad s_2 = \frac{n^7}{240} + o(n^7),\quad s_3 = -\frac{n^7}{1440} + o(n^7). \ \ \ \ \ (19)$

Therefore

$\displaystyle \mathop{\mathbb E} \rho_2^2 = 2(s_2 + s_3) + O(n^6) = 2\Big[\frac{1}{240} - \frac{1}{1440}\Big]n^7 + o(n^7) = \frac{n^7}{144} + o(n^7). \ \ \ \ \ (20)$
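This can be sanity-checked by brute force for small ${n}$. Hoeffding’s exact variance formula for ${\sum_i a_i b_{\sigma(i)}}$ gives the closed form ${\rm{var}\,\rho_2 = \frac{n^4(n+1)^2(n-1)}{144}}$ (my derivation, not a quoted result), whose leading term is ${n^7/144}$:

```python
from itertools import permutations
from fractions import Fraction

def rho2(sigma):
    # rho_2(sigma) = sum_{i<j} (i-j)(sigma(i)-sigma(j)) for 1-based sigma
    n = len(sigma)
    return sum((i - j) * (sigma[i - 1] - sigma[j - 1])
               for i in range(1, n + 1) for j in range(i + 1, n + 1))

for n in range(2, 7):
    perms = list(permutations(range(1, n + 1)))
    second_moment = Fraction(sum(rho2(s) ** 2 for s in perms), len(perms))
    # E rho_2 = 0, so the second moment is the variance; compare with the closed form
    assert second_moment == Fraction(n ** 4 * (n + 1) ** 2 * (n - 1), 144)
```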

Finally, for the reader’s benefit, we compute the covariance of ${\rho_2}$ and ${\tau}$ as well. Again we have the following decomposition according to the number of coinciding indices:

$\displaystyle \mathop{\mathbb E} \tau \rho_2 = \frac{1}{n!} \sum_\sigma \sum_{i < j;\, k < l} [ 1_{\{i,j\} \cap \{k, l\} = \emptyset} + 2 \cdot 1_{i = k,\, j \neq l} + 2 \cdot 1_{i = l,\, j \neq k} + 1_{i = k,\, j = l}]\, 1_{\sigma(i) > \sigma(j)}\, (k-l)[\sigma(k) - \sigma(l)] \\ =: s_1 + 2s_2 + 2s_3 + s_4, \ \ \ \ \ (21)$

where ${s_1, s_2, s_3, s_4}$ denote the (un-doubled) contributions of the disjoint, ${i=k}$, ${i=l}$, and fully coincident cases respectively; the factors of ${2}$ account for the symmetric cases ${j=l}$ and ${j=k}$, which contribute the same amounts. By symmetry,

$\displaystyle s_1 = \frac{(n-4)!}{n!} \sum_{k < l} \binom{n-2}{2} (k -l) \sum_{x > y, z \neq w, z,w \notin \{x,y\}} (z-w) = 0. \ \ \ \ \ (22)$

Similarly,

$\displaystyle s_4 = \frac{1}{n!} \sum_{i < j} \sum_{x > y} (i-j)(x-y) (n-2)!\\ = -\frac{n^4}{36} + o(n^4) = o(n^5). \ \ \ \ \ (23)$

But,

$\displaystyle s_3 = \frac{(n-3)!}{n!} \sum_{i < j;\, k< l} 1_{i = l,\, j \neq k} \sum_{x > y;\ z \notin \{x,y\}} (k-l) (z-x) \\ = \frac{(n-3)!}{n!} \sum_{k < l} (n-l)(k-l) \sum_{x > y} \Big[\frac{n(n+1)}{2} - (n-1)x - y\Big] \\ = \frac{n^5}{288} + o(n^5) \ \ \ \ \ (24)$

and

$\displaystyle s_2 = \frac{(n-3)!}{n!} \sum_{i < j;\, k < l} 1_{i =k,\, j \neq l} \sum_{x > y;\ w \notin \{x,y\}} (k-l)(x-w) \\ = \frac{(n-3)!}{n!} \sum_{k < l} (n-k-1)(k-l) \Big[(n-1) \sum_{y=1}^{n-1} \sum_{x = y+1}^n x + \sum_{y=1}^{n-1} \sum_{x = y+1}^n y - \binom{n}{2}\frac{n(n+1)}{2}\Big] \\ = -\frac{n^5}{96} + o(n^5). \ \ \ \ \ (25)$

Together, we have

$\displaystyle \mathop{\mathbb E} \rho_2 \tau = 2 (s_2 + s_3) + o(n^5) = 2\Big[-\frac{1}{96} + \frac{1}{288}\Big] n^5 + o(n^5) = -\frac{n^5}{72} + o(n^5). \ \ \ \ \ (26)$
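The same conditional-expectation bookkeeping yields the exact closed form ${\mathop{\mathbb E}\tau\rho_2 = -\frac{n^2(n+1)^2(n-1)}{72}}$ (again my derivation, offered as a check rather than a quoted result), with leading term ${-n^5/72}$; brute force confirms it for small ${n}$:

```python
from itertools import permutations
from fractions import Fraction

def tau(sigma):
    # number of inversions of sigma
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

def rho2(sigma):
    # rho_2(sigma) = sum_{i<j} (i-j)(sigma(i)-sigma(j))
    n = len(sigma)
    return sum((i - j) * (sigma[i - 1] - sigma[j - 1])
               for i in range(1, n + 1) for j in range(i + 1, n + 1))

for n in range(2, 7):
    perms = list(permutations(range(1, n + 1)))
    # E rho_2 = 0, so E[tau * rho_2] is the covariance
    cov = Fraction(sum(tau(s) * rho2(s) for s in perms), len(perms))
    assert cov == Fraction(-n ** 2 * (n + 1) ** 2 * (n - 1), 72)
```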

Note that since ${\rho_2}$ has mean zero, ${\mathop{\mathbb E} \rho_2 \mathop{\mathbb E} \tau = 0}$, so ${\mathop{\mathbb E} \tau \rho_2}$ is in fact the covariance of ${\tau}$ and ${\rho_2}$. So we can now compute the correlation:

$\displaystyle \rm{corr}(\tau,\rho_2) = \frac{\mathop{\mathbb E} \tau \rho_2}{\sqrt{\rm{var}\, \tau\, \rm{var}\, \rho_2}} = \frac{-n^5/72}{\sqrt{n^3/36 + o(n^3)} \sqrt{n^7 /144 + o(n^7)}} = -1 + o(1). \ \ \ \ \ (27)$

Hence ${\tau}$ and ${\rho_2}$ are asymptotically perfectly negatively correlated as ${n \rightarrow \infty}$; the sign is natural, since ${\tau}$ counts discordant pairs while ${\rho_2}$ rewards concordant ones, so equivalently ${\tau}$ and ${S^2}$ are asymptotically perfectly positively correlated. This is somewhat unexpected, since ${\tau}$ and ${\rho_2}$ live on different scales. Note that in Henry Daniels’ paper, his definition of ${\tau}$ is slightly different from ours. He defined it in terms of the following matrix ${D}$, where
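The asymptotic perfect correlation (up to sign) is easy to see empirically by sampling random permutations; a minimal Monte Carlo sketch (function names mine, with ${\rho_2}$ computed via the identity (17)):

```python
import random
from math import sqrt
from statistics import mean

def tau(sigma):
    # number of discordant pairs (inversions) of sigma
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

def rho2(sigma):
    # rho_2(sigma) = n * sum_i i*sigma(i) - (sum_i i)^2, using identity (17)
    n = len(sigma)
    return n * sum(i * s for i, s in enumerate(sigma, start=1)) - (n * (n + 1) // 2) ** 2

def corr(xs, ys):
    # empirical (Pearson) correlation
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = mean((x - mx) ** 2 for x in xs)
    vy = mean((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

random.seed(0)
n, trials = 100, 500
perms = [random.sample(range(1, n + 1), n) for _ in range(trials)]
c = corr([tau(s) for s in perms], [rho2(s) for s in perms])
print(round(c, 3))  # close to -1
```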

$\displaystyle d_{ij,kl} = x_{ij} y_{kl} \ \ \ \ \ (28)$

where ${x_{ij} = \pm 1, 0}$ according as ${i < j}$, ${i > j}$, or ${i=j}$, and similarly for ${y_{kl}}$. Thus the diagonal terms do not match our definition. Nonetheless the diagonal terms contribute at a lower order to the moments computed above, so the asymptotic perfect correlation also holds for Daniels’ version of ${\tau}$.