## a short digest of Tao and Vu's result on the determinant of a random Bernoulli matrix

Below I summarize the proof of the following result of Tao and Vu, which is not the best known to date, but is conceptually perhaps the clearest.
Take an n by n matrix with iid entries, each a signed Bernoulli random variable X with P(X = 1) = P(X = -1) = 1/2. We want to understand the distribution of its determinant. One expects a significant probability that the determinant is 0, which happens as soon as two rows are linearly dependent (e.g., equal or opposite). In fact the conjectured probability is
P(det = 0) = (1/2 + o(1))^n
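As a quick sanity check on the scale of this probability (my own illustration, not part of the argument), one can enumerate all sign matrices for tiny n; for n = 2 exactly half of them are singular.

```python
from itertools import product

def singular_fraction(n):
    """Fraction of n-by-n sign matrices with determinant 0, by brute force."""
    def det(m):
        # Laplace expansion along the first row; fine for n <= 3.
        if len(m) == 1:
            return m[0][0]
        return sum((-1) ** j * m[0][j] * det([row[:j] + row[j+1:] for row in m[1:]])
                   for j in range(len(m)))
    total = singular = 0
    for entries in product((-1, 1), repeat=n * n):
        m = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        total += 1
        singular += (det(m) == 0)
    return singular / total

print(singular_fraction(2))  # 0.5
print(singular_fraction(3))
```

For n = 3 one can check that singularity occurs exactly when two rows agree up to sign, so the conjecture's o(1) term is still doing a lot of work at small n.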

So far the best result is due to Bourgain, Vu, and Wood, where 1/2 is replaced by 1/sqrt{2}.
The first exponential decay result was obtained by Kahn, Komlós, and Szemerédi, who showed
P(det = 0) < 0.999^n
Tao and Vu managed to reduce the constant to about 0.958, namely 2^{-epsilon}, where
epsilon is the unique solution in the range (0, 1/2) of the equation
h(x) + x / log_2(16/15) = 1,
where h(x) = -x log_2 x - (1-x) log_2(1-x) is the binary entropy function.

h(x) comes up through the asymptotic formula (n choose xn) = 2^{(h(x) + o(1)) n},
and 0.958 comes up as 2^{-epsilon}, with epsilon approximately 0.062.
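A minimal numerical check of these constants, assuming base-2 logarithms in h as above: bisection on (0, 1/2) recovers epsilon and confirms 2^{-epsilon} is about 0.958.

```python
from math import log2

def h(x):
    """Binary entropy."""
    return -x * log2(x) - (1 - x) * log2(1 - x)

def f(x):
    # Defining equation for epsilon, shifted to a root-finding problem.
    return h(x) + x / log2(16 / 15) - 1

# f is increasing on (0, 1/2), negative near 0 and positive at 1/2,
# so bisection finds the unique root.
lo, hi = 1e-9, 0.5 - 1e-9
for _ in range(200):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
eps = (lo + hi) / 2
print(eps, 2 ** -eps)
```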

We will not even sketch the proof, which is pretty complicated, but rather point out some interesting steps that enable one to break the symmetry of the problem and tap deeply into its structure.

The first interesting idea is the so-called combinatorial dimension of a subspace V of R^n. It is defined to be the number d (ranging over multiples of 1/n, so that such a d exists and is unique) with P(X in V) between 2^{-d - 1/n} and 2^{-d}, where X is a Bernoulli random n-vector, i.e., with iid signed Bernoulli components. Really it is a classification scheme for V based on the number of points of the cube {-1, +1}^n it contains; one can think of this cube as a copy of F_2^n embedded in R^n. We hold on to this classification for later use.
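As a toy illustration (my own, taking the grid of multiples of 1/n as the reading of the definition above), one can compute the combinatorial dimension of small explicit hyperplanes by enumerating the cube:

```python
from itertools import product
from math import log2, floor

def comb_dim(n, in_V):
    """Combinatorial dimension: the multiple d of 1/n with
    2^(-d - 1/n) < P(X in V) <= 2^(-d), where X is uniform on {-1,+1}^n."""
    count = sum(1 for x in product((-1, 1), repeat=n) if in_V(x))
    p = count / 2 ** n
    return floor(-log2(p) * n) / n

n = 10
# Hyperplane x_1 = x_2: contains half the cube, so d = 1.
print(comb_dim(n, lambda x: x[0] == x[1]))
# Hyperplane x_1 + ... + x_n = 0: P = C(10,5)/2^10 ~ 0.246, so d = 2.
print(comb_dim(n, lambda x: sum(x) == 0))
```

Richer hyperplanes (more cube points) get smaller d, which is exactly the regime that will turn out to be hard below.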

Next, one can show that in order for n vectors X_1, ..., X_n to be linearly dependent, with high probability it is enough to look at the case where they span a hyperplane in R^n, namely a subspace of codimension 1. This is not too hard to prove using basic combinatorics. In the case where the combinatorial dimension is large, say d > epsilon n, the probability that X_1, ..., X_n span any hyperplane of combinatorial dimension d can be bounded by 2^{-d} < 2^{-epsilon n}, which is the claimed bound in the final result.
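A quick Monte Carlo sanity check of the hyperplane reduction (my own sketch, not the combinatorial proof): among singular sign matrices, the overwhelming majority have rank exactly n - 1, i.e., their rows span a hyperplane.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 8, 50_000
M = rng.choice((-1, 1), size=(trials, n, n)).astype(float)

# Determinants of +-1 matrices are integers, so rounding is exact.
dets = np.rint(np.linalg.det(M))
singular = M[dets == 0]
p_singular = len(singular) / trials

# Among the singular samples, how many have rank exactly n - 1?
ranks = np.array([np.linalg.matrix_rank(m) for m in singular])
frac_hyperplane = np.mean(ranks == n - 1)
print(p_singular, frac_hyperplane)
```

Rank deficiency by more than one requires two independent coincidences, which is why conditioning on the hyperplane case loses so little.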

So one just has to focus on the case when the combinatorial dimension is small: d < epsilon n. It turns out one has to leave an o(1) term in the bound, so it is really d < (epsilon - o(1)) n.

Here the proof gets more complicated, and the main idea is a comparison with a different random vector X', whose components are iid lazy Bernoulli variables: a component is 0 with probability 15/16, and +1 or -1 with probability 1/32 each. The reason to introduce such an artificial construct is that one can show X' is much more likely to lie on V than X.
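One can see this effect already on the concrete hyperplane x_1 + ... + x_n = 0 (my own toy example): computing P(X in V) and P(X' in V) exactly by convolving the step distributions shows X' is roughly twice as likely to land in V.

```python
import numpy as np

def p_sum_zero(kernel, n):
    """P(S_n = 0) for S_n a sum of n iid steps on {-1, 0, +1}
    with the given probabilities, via repeated convolution."""
    dist = np.array([1.0])
    for _ in range(n):
        dist = np.convolve(dist, kernel)
    return dist[len(dist) // 2]  # middle entry = P(S_n = 0)

n = 50
bern = np.array([0.5, 0.0, 0.5])       # P(-1), P(0), P(+1) for X
lazy = np.array([1/32, 15/16, 1/32])   # lazy Bernoulli X'
p_x = p_sum_zero(bern, n)      # P(X in V) for V: x_1 + ... + x_n = 0
p_lazy = p_sum_zero(lazy, n)   # P(X' in V)
print(p_x, p_lazy, p_lazy / p_x)
```

Heuristically this is a local central limit effect: X' has variance 1/16 per coordinate, so its sum concentrates on a scale 4 times smaller, and on top of that it lives on the full integer lattice rather than on a lattice of span 2.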

To get a quantitative result on P(X in V) or P(X' in V), one uses Fourier analysis and the formula

P(X in V) = E int_0^1 e^{2 pi i xi (X dot nu)} d xi,

where nu is a nonzero vector with integer components perpendicular to the hyperplane V. Since X dot nu is an integer, the inner integral is 1 when X dot nu = 0 and 0 otherwise; this is just Fourier inversion, proved using the orthogonality relation of the characters e^{2 pi i xi m}.

Exchanging the expectation and the integral and using E e^{2 pi i xi X_j nu_j} = cos(2 pi xi nu_j), this Fourier inversion quantity can be bounded by
int_0^1 prod_{j=1}^n |cos(2 pi xi nu_j)| d xi

and similarly, since E e^{2 pi i xi X'_j nu_j} = 15/16 + (1/16) cos(2 pi xi nu_j), one gets
P(X' in V) = int_0^1 prod_{j=1}^n (15/16 + (1/16) cos(2 pi xi nu_j)) d xi

The end goal is to show that P(X' in V) >= (2 - o(1)) P(X in V).

On the other hand, for any subspace W one can bound P(X' in W) from below by (15/16)^{n - dim W}, so, using the multiplication rule for conditional probabilities (with W a subspace of V),
P(X' in W | X' in V) P(X' in V) = P(X' in W),
one gets the bound

P(X' in W | X' in V) ge 2 P(X in V) (16/15)^{n - dim W}

This is excellent, because one can decompose the condition that a collection of vectors {X_1, ..., X_k} is linearly independent inductively:
X_1, ..., X_{k-1} are linearly independent, and X_k lies outside the span of X_1, ..., X_{k-1}. Conditioned on X_k in V, the latter event has probability
1 - P(X_k in Span{X_1, ..., X_{k-1}} | X_k in V)

The reason we want to condition on X_k in V and look at the probability of linear independence is the following:

The event that X_1, ..., X_n span V can be thought of as the event that X_1, ..., X_k are linearly independent, together with n - k other vectors that span V. This is the standard "apparently useless decomposition of conditions on a collection of objects". Of course k will be chosen carefully, in particular not too small.

to be continued.