Suppose we sample pairs of values $(x_1, y_1), \dots, (x_n, y_n)$ from some bivariate distribution. We would like to understand the correlation between the two random variables represented by the first and second coordinates respectively. One way to do this is of course to compute the empirical correlation (Pearson's correlation coefficient) given by

$$ r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}. $$
One disadvantage of this test statistic is that it's parametric in the sense that for $x$ and $y$ to have perfect correlation $r = 1$ (or perfect negative correlation $r = -1$), they must be related by a linear function. Thus it's based on the linear regression model. To remove such parametric dependence, one instead converts the values into their rank statistics, i.e., assuming there is no repeated value among the $x_i$'s, there exists a unique permutation $\pi \in S_n$ such that

$$ \pi(i) = \#\{\, j : x_j \le x_i \,\}, $$
and similarly a permutation $\sigma$ for the $y_i$'s. This procedure essentially removes any parametric dependence. One could then form various statistics based on the rank data $\pi(i)$'s and $\sigma(i)$'s, instead of the $x_i$'s and $y_i$'s. For instance, one could again use the ranked version of Pearson's coefficient

$$ r_s = \frac{\sum_{i=1}^n (\pi(i) - \bar{\pi})(\sigma(i) - \bar{\sigma})}{\sqrt{\sum_{i=1}^n (\pi(i) - \bar{\pi})^2}\,\sqrt{\sum_{i=1}^n (\sigma(i) - \bar{\sigma})^2}}, \qquad \bar{\pi} = \bar{\sigma} = \frac{n+1}{2}. $$
This will give perfect correlation $r_s = 1$ provided $y_i$ is monotone increasing in $x_i$, i.e., $\pi = \sigma$ (and $r_s = -1$ when decreasing). Alternative rank correlation statistics include

- Spearman's footrule: $D = \sum_{i=1}^n |\pi(i) - \sigma(i)|$,
- Spearman's squared deviation: $S = \sum_{i=1}^n (\pi(i) - \sigma(i))^2$. These are discrete analogues of the $L^1$ and $L^2$ distances between two functions. When normalized as $1 - \frac{6S}{n(n^2-1)}$, the latter is exactly Spearman's rho.
- Kendall's tau: $\tau = \sum_{i < j} \mathrm{sgn}(\pi(i) - \pi(j))\,\mathrm{sgn}(\sigma(i) - \sigma(j))$. This can be summarized as the number of concordant pairs minus the number of discordant pairs.

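
As a quick numerical sketch of the rank transform and the $S$-normalization above (the helper names `ranks` and `pearson` are mine, not from the text): for data related by a monotone but nonlinear map, Pearson's coefficient on the raw values falls short of 1, while on the ranks it equals 1, and $1 - 6S/(n(n^2-1))$ agrees with Pearson computed on the ranks.

```python
import numpy as np

def ranks(x):
    """Rank of each entry (1 = smallest), assuming no ties."""
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def pearson(x, y):
    x = x - np.mean(x)
    y = y - np.mean(y)
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = x ** 3                                   # monotone but nonlinear in x
n = len(x)

r_raw = pearson(x, y)                        # < 1: the linear model is violated
r_rank = pearson(ranks(x), ranks(y))         # 1: ranks only see the ordering

S = np.sum((ranks(x) - ranks(y)) ** 2)       # Spearman's L^2 statistic
rho_from_S = 1 - 6 * S / (n * (n ** 2 - 1))  # Spearman's rho via the S normalization

print(r_raw < 1, r_rank, abs(rho_from_S - r_rank))
```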
Kendall's tau has the alternative formulation (up to an affine transformation) as the minimum number of adjacent transpositions needed to bring $\pi$ to $\sigma$, under the linear graph structure on $\{1, \dots, n\}$. One could generalize it to other graph structures, such as the cycle graph $C_n$, although its statistical interpretation is less clear.
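
A small exhaustive check of this formulation (helper names are mine): the number of discordant pairs between two rank sequences equals the number of adjacent swaps bubble sort uses to turn one into the other, and bubble sort achieves the minimum over adjacent-transposition sorts.

```python
from itertools import permutations

def discordant(p, s):
    """Number of pairs ordered differently by the two rank sequences."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (p[i] - p[j]) * (s[i] - s[j]) < 0)

def adjacent_swap_distance(p, s):
    """Bubble-sort p into s, counting adjacent transpositions."""
    pos = {v: k for k, v in enumerate(s)}   # relabel so the target is the identity
    a = [pos[v] for v in p]
    swaps = 0
    for _ in range(len(a)):
        for j in range(len(a) - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return swaps

s = (1, 2, 3, 4)
assert all(discordant(p, s) == adjacent_swap_distance(p, s)
           for p in permutations(s))
print("discordant pairs == minimum adjacent swaps on all of S_4")
```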

Henry Daniels was perhaps the first to show that both Spearman's $\rho$ and Kendall's $\tau$ satisfy a central limit theorem after an affine normalization. His argument was based on a moment calculation for the following two general statistics

$$ W = \sum_{i=1}^n a_{i, \pi(i)} $$

and

$$ V = \sum_{i \neq j} b_{(i,j),(\pi(i), \pi(j))}, $$
where $a$ resp. $b$ is a singly resp. doubly indexed square matrix. In the case of Spearman's rho, we have $a_{ij} = ij$ up to centering and scaling, whose normality can be proved using the classical Hoeffding combinatorial central limit theorem (see also Persi Diaconis' book on representation theory in probability and statistics). Daniels represented $\rho$ slightly differently, using the following *doubly* indexed matrix:

$$ b_{(i,j),(k,l)} = (i - j)(k - l). \tag{1} $$
One sees that this is the same as $\rho$ up to an affine transformation because

$$ \sum_{i \neq j} (i - j)(\pi(i) - \pi(j)) = 2n \sum_{i=1}^n i\,\pi(i) - \frac{n^2(n+1)^2}{2}. $$
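In this notation, the doubly indexed sum $\sum_{i \neq j}(i-j)(\pi(i)-\pi(j))$ is an affine function of $\sum_i i\,\pi(i)$; concretely it equals $2n\sum_i i\,\pi(i) - n^2(n+1)^2/2$, which the sketch below verifies on random permutations.

```python
import numpy as np

rng = np.random.default_rng(2)
for n in (3, 7, 20):
    pi = rng.permutation(np.arange(1, n + 1)).astype(np.int64)
    i = np.arange(1, n + 1, dtype=np.int64)
    # Doubly indexed form: sum over all ordered pairs (the i = j terms vanish).
    doubly = int(np.sum((i[:, None] - i[None, :]) * (pi[:, None] - pi[None, :])))
    # Singly indexed affine image; n^2 (n+1)^2 is always divisible by 2.
    singly = 2 * n * int(np.sum(i * pi)) - n ** 2 * (n + 1) ** 2 // 2
    assert doubly == singly
print("doubly indexed sum matches 2n * sum(i * pi(i)) - n^2 (n+1)^2 / 2")
```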
The reason Daniels chose the underlying matrix in the form (1) is two-fold:

- He wants to relate $\rho$ to $\tau$, so it's natural to have a doubly indexed rather than a singly indexed matrix.
- The matrix in (1) has column and row means zero; note that the analogous sign matrix for $\tau$ has that property as well. (Warning: the singly indexed matrix $a_{ij} = ij$ does not have zero column and row means, though it's tempting to define $\rho$ in terms of it.)

For Kendall's $\tau$, the corresponding doubly indexed matrix is built from the signs, $b_{(i,j),(k,l)} = \mathrm{sgn}(i-j)\,\mathrm{sgn}(k-l)$, or equivalently, up to an affine transformation, from the indicators $\mathbf{1}\{i < j\}\,\mathbf{1}\{k < l\}$. Note the indicator version is not centered. But it's convenient because there are many zeros in the matrix. This can be taken care of using a generalization of Hoeffding's theorem to doubly indexed matrices, as developed initially by Henry Daniels and later, under weaker conditions and with error bounds, by Zhao, Bai, Chao, and Liang using Stein's method. The latter article is heavily based on Bolthausen's analogous result for the singly indexed case. Diaconis and Graham (1977) also derived the above two CLT results independently, using the machinery of the combinatorial central limit theorem. In addition, they proved a deterministic inequality relating $D$ and $\tau$ (which are of the same order). To see that $\rho$ and $\tau$ are perfectly correlated asymptotically, one simply has to compute their covariance and individual variances:
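
The Diaconis–Graham inequality states $I + T \le D \le 2I$, where $I$ is the number of discordant pairs (the Kendall distance), $T$ the minimum number of arbitrary transpositions (Cayley distance), and $D$ the footrule. The weaker consequence $I \le D \le 2I$, which already shows $D$ and the Kendall distance are of the same order, can be checked exhaustively for small $n$; comparing against the identity permutation is no loss, by relabeling.

```python
from itertools import permutations

def footrule(p):
    """D(p, id) = sum_i |p(i) - i|."""
    return sum(abs(v - i) for i, v in enumerate(p, start=1))

def inversions(p):
    """Discordant pairs with the identity permutation."""
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
               if p[i] > p[j])

# Weak form of the Diaconis-Graham inequality: I <= D <= 2I on all of S_6.
assert all(inversions(p) <= footrule(p) <= 2 * inversions(p)
           for p in permutations(range(1, 7)))
print("I <= D <= 2I verified on all of S_6")
```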

Before proceeding, we record that

$$ \sum_{i \neq j} (i - j)^2 = \frac{n^2(n^2 - 1)}{6}. $$
As a warm-up, we compute the mean of $\tau$ first; $E\tau = 0$ by inspection:

$$ E\tau = \sum_{i < j} \mathrm{sgn}(\sigma(i) - \sigma(j))\, E\big[\mathrm{sgn}(\pi(i) - \pi(j))\big] = \sum_{i < j} \mathrm{sgn}(\sigma(i) - \sigma(j)) \cdot \frac{(n-2)!}{n!} \sum_{k \neq l} \mathrm{sgn}(k - l) = 0, $$

where we used that the number of permutations with two coordinates prescribed is $(n-2)!$.
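
Both ingredients can be sanity-checked by brute force over a small symmetric group: the number of permutations of $\{1,\dots,n\}$ with two prescribed coordinates is $(n-2)!$, and the sign-form Kendall statistic (here taken against the identity) sums to zero over all of $S_n$.

```python
from itertools import permutations

n = 5
perms = list(permutations(range(1, n + 1)))

# Number of permutations with two prescribed coordinates, e.g. pi(1)=2 and pi(3)=5.
count = sum(1 for p in perms if p[0] == 2 and p[2] == 5)
assert count == 6            # (n-2)! = 3! = 6

def sign(x):
    return (x > 0) - (x < 0)

def tau(p):
    """Sign-form Kendall statistic of p against the identity (ordered pairs)."""
    return sum(sign(i - j) * sign(p[i] - p[j])
               for i in range(n) for j in range(n) if i != j)

total = sum(tau(p) for p in perms)
assert total == 0            # E tau = 0 under the uniform permutation
print("two-coordinate count:", count, "| sum of tau over S_5:", total)
```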

Next we compute the second moment $E\tau^2$ (this can be found in Diaconis and Graham also):

The above subdivision is necessary because the number of free variables depends on how many of the indices coincide: for example, in the case $i = k$, the number of permutations satisfying the constraints $\pi(i) = x$, $\pi(j) = y$, $\pi(k) = z$, and $\pi(l) = w$ is $(n-3)!$ when $x = z$ and zero otherwise (so in the above example $x$ must equal $z$). Because we need to subtract off the mean later, we need to keep track of not only the highest order term, but also the second highest order. By doing the appropriate counting, one gets for example

Similarly, we get for the other terms

and

Therefore we have

For the variance of $\rho$, we use the result from Diaconis' book, where for $W = \sum_i a_{i, \pi(i)}$,

$$ \mathrm{Var}(W) = \frac{1}{n-1} \sum_{i,j} \hat{a}_{ij}^2, \qquad \hat{a}_{ij} = a_{ij} - \frac{1}{n}\sum_k a_{kj} - \frac{1}{n}\sum_k a_{ik} + \frac{1}{n^2}\sum_{k,l} a_{kl}. $$
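The variance formula $\mathrm{Var}(W) = \frac{1}{n-1}\sum_{i,j}\hat{a}_{ij}^2$ for $W = \sum_i a_{i,\pi(i)}$ under a uniform permutation, with $\hat{a}$ the doubly centered matrix, is exact for every $n$; here is a sketch checking it against full enumeration of $S_n$ with a random score matrix.

```python
from itertools import permutations
import numpy as np

rng = np.random.default_rng(3)
n = 5
a = rng.standard_normal((n, n))

# Exact distribution of W = sum_i a[i, pi(i)] over the uniform pi in S_n.
vals = [sum(a[i, p[i]] for i in range(n)) for p in permutations(range(n))]
exact_var = np.var(vals)          # population variance over all n! values

# Hoeffding-type formula with the doubly centered matrix a_hat.
a_hat = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
formula_var = np.sum(a_hat ** 2) / (n - 1)

print(float(exact_var), float(formula_var))   # the two agree
```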
Since

we easily get . Actually this is not what I got by computing directly. I obtained , which is the desired answer: if we write

Then I got

Therefore

Finally, for the reader's benefit, we compute the covariance of $\rho$ and $\tau$ as well: again we have the following decomposition according to the number of fixed indices:

where the last equality defines the individual terms in the obvious typographical manner. By symmetry,

Similarly,

But,

and

Together, we have

Note that since $\rho$ has mean zero, $E[\rho\tau]$ is in fact the covariance of $\rho$ and $\tau$. So we can now compute the correlation:

Hence $\rho$ and $\tau$ are perfectly correlated asymptotically as $n \to \infty$, which is somewhat unexpected since $\rho$ and $\tau$ have different scalings. Note that in Henry Daniels' paper, his definition of $\tau$ is slightly different from ours. He defined it in terms of the following matrix:

$$ b_{(i,j),(k,l)} = c_{ij}\, d_{kl}, $$

where $c_{ij} = 1, 0, -1$ according as $i < j$, $i = j$, or $i > j$, and similarly for $d_{kl}$. Thus the diagonal terms do not match with our definition. Nonetheless the diagonal terms have a lower order contribution to the expectation in the end, so the perfect correlation result above also holds in Daniels' case.
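
As a Monte Carlo illustration of the asymptotic perfect correlation (a sketch: both statistics are computed against the identity permutation, and the affine normalizations don't affect the correlation coefficient):

```python
import numpy as np

def rho_tau(pi):
    """Spearman-type and Kendall-type statistics of pi against the identity."""
    n = len(pi)
    rho = int(np.sum(np.arange(n) * pi))                  # affine image of Spearman's rho
    diff = pi[None, :] - pi[:, None]                      # entry (i, j) = pi(j) - pi(i)
    tau = int(np.sign(diff)[np.triu_indices(n, 1)].sum()) # concordant - discordant
    return rho, tau

rng = np.random.default_rng(4)
results = {}
for n in (10, 50, 200):
    stats = np.array([rho_tau(rng.permutation(n)) for _ in range(2000)])
    results[n] = float(np.corrcoef(stats.T)[0, 1])
    print(n, round(results[n], 3))                        # climbs toward 1 as n grows
```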