correlations between Spearman’s rho p distances for various p: a plethora of central limit theorems

In the past week or so I have been playing around with various metrics on {S_n}, as a continuation of the previous post. Most notably I looked at the Spearman’s {\rho_p} distance, which is the analogue of the footrule and defined as

\displaystyle  \rho_p(\sigma) = \sum_i |i - \sigma(i)|^p \ \ \ \ \ (22)

The task is to find their correlations. Once that’s done, the multivariate central limit theorem usually follows from the vector valued version of the Hoeffding’s combinatorial central limit theorem, which states

Theorem 10 Let {A= (a_{ij})} be an {n \times n} square matrix with {a_{i,.} = 0 = a_{.,j}} for all {i,j} where {a_{i,.}} stands for the ith row average {\frac{1}{n}\sum_j a_{ij}} and similarly for {a_{.,j}}. Also let {a_{.,.} = \frac{1}{n^2} \sum_{ij} a_{ij}}. For a permutation {\sigma \in S_n}, define the following generalized diagonal sum:

\displaystyle  f_A(\sigma) = \sum_{i=1}^n a_{i,\sigma(i)}. \ \ \ \ \ (23)

Then for {\sigma} uniformly chosen from {S_n}:

\displaystyle  \mu_A:=\mathop{\mathbb E} \sum_i a_{i,\sigma(i)} = \frac{1}{n} \sum_{i,j} a_{ij}\\ v_A^2:=\rm{var} \sum_i a_{i,\sigma(i)} = \frac{1}{n-1} \sum_{ij} d^2_{ij} \ \ \ \ \ (24)

where {d_{ij} = a_{ij} - a_{i,.} - a_{.,j} + a_{.,.}}. and the following central limit theorem holds

\displaystyle  \epsilon_A = \sup_t |\mathbb{P}[(f_A(\sigma) - \mu_A)/v_A < t] - \Phi(t)| \le K \sum_{ij} |d_{ij}|^3 / (n v_A^{3/2}) \ \ \ \ \ (25)

for some absolute constant {K}.

This result is initially due to Wassily Hoeffding, later improved by Motoo and finally proved in complete generality and with error bound by Bolthausen, using Stein’s method. We will visit Bolthausen’s argument in a later post.
In the case of the {\rho_p} statistics, {d_{ij}} are of order {n^p} whereas {v_A^2} is of order {n^{2p+1}}, hence the error term above is bounded by {O(n^{-1/2})}. Here the matrix {A_p} is given by {a_{ij} = |i-j|^p}.
Similarly we can take linear combinations of different {\rho_p}

\displaystyle  \tilde{\rho} = \sum_{p=1}^k c_p \tilde{f}_{A_p} \ \ \ \ \ (26)

where {\tilde{f}_{A_p}(\sigma) = \frac{f_{A_p} (\sigma) - \mu_{A_p}}{v_{A_p}}}. Then the above combinatorial CLT also applies to {\tilde{f}} and we basically establish the joint normality of {f_{A_p}} for any finite collection of {p}‘s.

It remains to compute the correlation between pairs of {f_{A_p}} and {f_{A_q}}, which are the same as {\rho_p} and {\rho_q}.

The computation mimics the calculation done for the inversion and footrule distance. We used mathematica to compute quantities of the form {\mathop{\mathbb E} \rho_p(\sigma) \rho_q(\sigma)}. The correlation is then given by

\displaystyle  \rm{corr} \rho_p \rho_q = \frac{\mathop{\mathbb E} \rho_p \rho_q -\mathop{\mathbb E} \rho_p \mathop{\mathbb E} \rho_q}{\sqrt{\mathop{\mathbb E} \rho_p^2 - (\mathop{\mathbb E} \rho_p)^2} \sqrt{\mathop{\mathbb E} \rho_q^2 - (\mathop{\mathbb E} \rho_q)^2}}. \ \ \ \ \ (27)

Below we tabulated a few mathematica assisted computation of correlations for various pairs of {p} and {q}‘s:

  • {\rm{corr} (\rho_1, \rho_2) = \frac{3}{\sqrt{10}} + o(1/n) \approx 0.948683}.
  • {\rm{corr} (\rho_1, \rho_3) = 15 \sqrt{\frac{5}{1547}} + o(1/n) \approx 0.852768}.
  • {\rm{corr} (\rho_2, \rho_3) = 27 \sqrt{\frac{2}{1547}} + o(1/n) \approx 0.970809}.
  • {\rm{corr} (\rho_1,\rho_6) = 13 \sqrt{\frac{11}{5294}} + o(1/n) \approx 0.592581}.

One observes that as {p} and {q} get further apart in ratio, their correlation goes to {0}. This is reminiscent to the phenomena of separation of scale I used to prove asymptotic independence of various quantities on {S_n}. I would like to thank James Zhao for simulation results that corrected my initial computation of the correlation between the footrule and the spearman’s {\rho_2} distance. Some further simulation suggests that the footrule and the longest increasing subsequence length (aka Ulam’s distance) are significantly correlated in the limit: it converges to something around {-0.3}. But if one looks at the following distance {\rho_0} defined by

\displaystyle  \rho_0(\sigma) = \sum_i 1_{i > \sigma(i)}. \ \ \ \ \ (28)

Then simulation suggests the correlation between Ulam’s distance and {\rho_0} is asymptotically very weak if not {0}. With {n=1000} I obtained an empirical correlation of {0.01} and with {n=10000} it drops to {0.003}. It will be interesting to prove that they are actually asymptotically independent. Note that Ulam’s distance does not converge to the standard normal upon linear transformation in the limit {n \rightarrow \infty}. So vanishing correlation does not imply that it is independent from {\rho_0}. One would also like to understand the underlying reason why Ulam’s distance is highly correlated with some quantities such as {\rho_1}, but independent with others, such as the the length of the longest decreasing subsequence of a permutation, a result that follows from the connection between permutations to Young diagrams under Plancherel measure via the RSK algorithm, and the latter with eigenvalues of Gaussian unitary ensemble, a result proved independently by Borodin, Okounkov, Olshanski and Johansson. See the article Asymptotics of Plancherel Measures for Symmetric groups by the first group.


About aquazorcarson

math PhD at Stanford, studying probability
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s