Gurvits’ inequality and proof of Van der Waerden’s conjecture

Today I went to Don Knuth’s talk on some recent spectacular progress made in the problem of computing permanents. The talk is motivated and mainly based on a recent article in the American Math Monthly called “On Leonid Gurvits’s proof for permanents”. The permanent is a variant of determinant, where in the Cayley expansion all terms take sign {+1}, i.e., for {A = (a_{ij})}, {\rm{Per} A = \sum_{\sigma \in S_n} \prod_{i=1}^n a_{i, \sigma{i}}}. It is known to be a Number P complete problem; a Number P (or Sharp P {\#}-P) problem is the counting problem of some nondeterministic polynomial time decision problem, i.e., the task is to count the number of paths the nondeterministic Turing machine could take to get to the decision. To be complete means that it’s weakly harder than all the Number P problem and is itself a Number P problem (much as the definition of NP complete). Therefore it is nontrivial to get a lower bound of permanents of even a restricted class of matrices, namely the doubly stochastic ones (row and column sums all equal 1, and entries are all nonnegative). The following lower bound was conjectured by Van der Waerden in 1928, and consequently proved in 1979 by Falikman and Egorychev, using some nontrivial theorem:

\displaystyle   \rm{Per} A \ge \frac{n!}{n^n} \ \ \ \ \ (22)

where {A} is an {n \times n} doubly stochastic matrix. The lower bound is achieved by the all 1 matrix divided by {n}. Some important consequences of Van der Waerden’s conjecture includes counting the number of perfect matchings of {d}-regular bipartite graphs of equal parts, because when we divide the bi-adjacency matrix (the top left quarter block of the full adjacency matrix) of such a bipartite graph by {d} we get a doubly stochastic one. The proof of (22) by Gurvitz is elementary and short. It starts with the following alternative characterization of the permanent, namely, if we define a {n}-variate homogeneous polynomial based on {A},

\displaystyle  p_A(z_1, \ldots, z_n) = \prod_{i=1}^n \sum_{j=1}^n a_{ij} x_j, \ \ \ \ \ (23)


\displaystyle  \rm{per} A = \frac{\partial^n p_A}{\partial z_1 \ldots \partial z_n} |_{z_i \equiv 0} \ \ \ \ \ (24)

because the partial derivative picks out all the terms in {p_A} that have no repeating {z_i}‘s, which must be the ones arising from a permutation of {n} (as opposed to a general function {[n] \rightarrow [n]}). To use induction, one defines the partial partial-derivatives:

\displaystyle  q_i(z_1, \ldots, z_i) = \frac{\partial^{n-i} p_A}{\partial x_{i+1} \ldots \partial x_{n}} |_{x_{i+1} = \ldots = x_n = 0}, \ \ \ \ \ (25)

which interpolate between {q_n = p_A(z_1, \ldots, z_n)} and {q_0 = \rm{per} A}. Next we define the so-called capacity of a real-valued polynomial, which link the {q_i} together quantitatively,

\displaystyle  \rm{cap}(p) = \inf_{x \in {\mathbb R}_+^n: \prod x_i =1} p(x) \ \ \ \ \ (26)

notice the range of infimum is over positive real vectors. We denote {C_n:= \{ x \in {\mathbb R}_+^n: \prod x_i =1\}}. By arithmetic-geometric mean inequality, for doubly stochastic {A}, {\rm{cap}(p_A) \ge 1}. But {\rm{cap}(p_A)(1, \ldots, 1) = 1}, we get {\rm{cap}(p_A) =1}. Also observe that since {q_0 = \rm{per} A} is constant, its capacity is itself. Van der Waerden’s conjecture would be proved by the following slightly more technical inductive inequalities relating the capacity of {q_i} with that of {q_{i-1}}:

\displaystyle   \rm{cap}(q_{i-1}) \ge \rm{cap}(q_i) g(\min\{i, \lambda_A(i)\}), \ \ \ \ \ (27)

where {\lambda_A(i)} is the number of nonzero entries of the ith column in {A}, and {g(k) = (1-\frac{1}{k})^{k-1}}, for {k \ge 1}, and {g(0) = 1}. Note that {g} is decreasing, so {g(\min\{i, \lambda_A(i)\}) \ge g(i) = \frac{(i-1)^{i-1}}{i^{i-1}}}. And since by telescoping

\displaystyle  \prod_{i=1}^n g(i) = \frac{0^0}{1^0} \frac{1^1}{2^1} \frac{2^2}{3^2} \ldots \frac{(n-1)^{n-1}}{n^{n-1}} = \frac{(n-1)!}{(n-1)^n} = \frac{n!}{n^n} \ \ \ \ \ (28)

(22) follows. Also we define a prime operation, which takes the derivative of the last variable and set it equal to {0}: for {p(x_1, \ldots, x_n)} a polynomial,

\displaystyle  p'(x_1, \ldots, x_{n-1}) = \frac{\partial p}{\partial x_n} |_{x_n =0}. \ \ \ \ \ (29)

Clearly {q_{i-1} = q_i'}. In fact the capacity induction bound (27) follows in turn from the more general

Theorem 7 Let {p(x_1,\ldots, x_n)} be 1.homogeneous 2. H-stable, and 3. with coefficients all in {{\mathbb R}_+}, where H-stable (H for half-plane) means it doesn’t vanish on {\mathbb{C}_{++}^n}. Then

  1. \displaystyle   \rm{cap}(p') \ge \rm{cap}(p) g(\deg_{x_n} (p)), \ \ \ \ \ (30)

  2. {p'(x_1, \ldots, x_{n-1}) = \frac{\partial p}{\partial x_n}(x_1, \ldots, x_{n-1}, 0)} is also H-stable (and of course homogeneous and with positive coefficients too).

We now point out the only crucial point in the proof where H-stable is needed; the rest of the proof of Van der Waerden’s conjecture can be restricted to real function theory. Basically we need to show that

for all {y \in {\mathbb R}_+^{n-1}}, all roots of {g_y(t) = p(y,t)} are negative.

Homogeneous polynomials of this property seem best characterized by H-stability. The proof is to show that the roots must be positive real linear combinations of {-y_1, \ldots, -y_{n-1}}, by a separating hyperline argument. This is best carried out in {{\mathbb C}} rather than {{\mathbb R}}, with the help of fundamental theorem of algebra. Notice the degree of {x_i} in {q_i} is is at most {\min\{i, \lambda_A(i)\}}, because taking {n-i} derivatives {\frac{\partial}{\partial x_{i+1} \ldots \partial x_n}} reduces the total polynomial degree by at least {n-i} and furthermore if the number of nonzero entries in the {j}th column is {k} say, then each term in {q_i} can only have {x_i} power at most {k}. {q_n = p_A} is clearly homogeneous, H-stable, and with positive coefficients. The above theorem then says that we can do induction from {q_n} to {q_0} because all three properties are preserved, and (27) follows immediately. The following lemma is not really necessary for Van der Waerden, but is for Gurvits’ inequality in the complex setting.

Lemma 8 For {p} H-stable and homogeneous (all terms have the same total power),

\displaystyle  |p(x)| \ge |p(\Re x)|. \ \ \ \ \ (31)

Proof: Using homogeneity, we have for some {b_i},

\displaystyle  p(x+ s\Re x) = p(\Re x) \prod_{i=1}^n (s-b_i) \ \ \ \ \ (32)

because expanding left-hand side we see the coefficient of {s^n} is the same as {p(\Re x)} and both sides are clearly polynomials in {s}. By H-stability, we know that { (1+ b_i) \Re x + \Im x = x + b_i \Re x \notin {\mathbb C}_{++}^n}. Hence {\Re b_i \ge -1}, because {x \in {\mathbb C}_{++}^n}. This implies {|-b_i| \ge 1} by Pythagorean. Thus

\displaystyle  |p(x)| = |p(x + s \Re x)|_{s=0} = |p(\Re x)| \prod_{i=1}^n |-b_i| \ge |p(\Re x)| \ \ \ \ \ (33)


A key observation to control capacities is:

\displaystyle   \rm{cap}(p) \le \frac{p(y, t)}{t} \ \ \ \ \ (34)

where {(y,t) \in {\mathbb R}_+^n}, with {\prod_{i=1}^{n-1} y_i = 1}. This follows by homogeneity of {p} and considering {x = t^{-1/n} (y, t)}. It serves to transition from the set {C_n:=\{y \in {\mathbb R}^n_+: \prod_{i=1}^n y_i = 1\}} to {C_{n-1}:=\{y \in {\mathbb R}^{n-1}_+: \prod_{i=1}^{n-1} y_i = 1\}}. The strategy is then to show for each {y},

\displaystyle   \lim_{i \rightarrow \infty} \frac{p( y, t_i)}{t_i} \le p'(y) g(\deg_{x_n}(p))^{-1} \ \ \ \ \ (35)

for a sequence of {t_i \neq 0}, because then we have

\displaystyle  \rm{cap}(p) \le \lim_{i \rightarrow \infty} \frac{p(y,t_i)}{t_i} \text{ for every } y\\ \le p'(y) g(\deg_{x_n}(p))^{-1} \text{ for every } y\\ \le \rm{cap}(p') g(\deg_{x_n}(p)) \text{ Gurvits' inequality}. \ \ \ \ \ (36)

Now for each {y}, we differentiate into two cases: I. {p(y,0) = 0} and II. {p(y,0) > 0}. I. Note that {p = q_i} always has all positive coefficients, this does not happen unless {n =1}. But In this case, we can write {p'(y) = \lim_{t \rightarrow 0^+} \frac{p(y,t)}{t}}. Using the fact {\deg_{x_n}(p) = 0}, and {g(0) =1}, (35) is verified. II. Let {k:= \deg_{x_n}(p)}. We further subdivide into two cases: 1. {k =0} and 2. {k \ge 1}. 1. Again the first case doesn’t happen for {q_i}‘s unless {\deg(p) = 0} ({n=0}) itself, which is vacuous. 2. We have {p'(y) \neq 0} for all {y \in C_{n-1}} (see notation above). Let {t := \frac{ k}{(k-1) p'(y)}}. we can also scale {p} so that {p(y,0) =1} (because we are in case II). Then since {1=p(y,0)} is the {t^0} coefficient of {g(t)}, we have

\displaystyle  p(y,t) = \prod_{i=1}^k (1+ a_i t) \ \ \ \ \ (37)

for some {a_i}, which are all positive because of 5. So since {t > 0} as well, we can use Jensen’s inequality (or GIAI) to get

\displaystyle  p(y,t) = \exp(\sum \log (1+ a_i t)) \le [\exp( \sum \frac{1}{k} \log (1+ a_i t))]^k \\ \le [\frac{1}{k} \sum_i (1 + a_i t)]^k = (\frac{k}{k-1})^k \ \ \ \ \ (38)

by the choice of {t} above. Since {t} contains a factor of {p'(y)}, we can relate {\rm{cap}(y)} and {p'(y)} finally below, for the sake of (35):

\displaystyle  \rm{cap}(p) \le \frac{p(y,t)}{t} \le p'(y) (\frac{k}{k-1})^{k-1}. \ \ \ \ \ (39)

The last thing one has to observe is that all the factors {(\frac{k}{k-1})^{k-1}} multiply up to {\frac{n^n}{n!}}, to complete the proof of Van der Waerden’s conjecture.


About aquazorcarson

math PhD at Stanford, studying probability
This entry was posted in Uncategorized and tagged , , , . Bookmark the permalink.

4 Responses to Gurvits’ inequality and proof of Van der Waerden’s conjecture

  1. Leonid Gurvits says:

    Dear Colleague, great post! In general, your blog is very useful and full
    of a lot of accessible knowledge. It would be great if my approach
    can also work for another things in the realm of algebraic combinatorics.
    Actually, I think that the main so far is for the Mixed Volume.

    Best regards and thanks for your service to (some parts) of humanity,

    • Dear Leonid,
      I am glad that the original author of the proof found my exposition useful, which was the end result of three information relays. Knuth’s talk was indeed advertised by Persi Diaconis in his algebraic combinatorics class. My blogs on algebraic combinatorics are largely based on his lectures (I have been rather slow in updating the blog tags), which in turns was based on Stanley’s notes. I must admit I haven’t worked with permanent at all in the past. Please keep me informed if you find any connections with say the volume of the birkhoff polytope, etc.


  2. Horowitz says:

    Dear Yunjiang, I think there was a paper by A. Barvinok which used
    van der waerden ineq. for estimates of the volume of the birkhoff polytope.
    Look, as you can see in the proof: the permanent is just a hot application.
    The main problem is to estimate/bound some coefficients of some multivariate polynomials
    or even entire functions:

    Best, Leonid.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s