## A great probability inequality

Introductory probability textbooks emphasize the ubiquity of Cauchy-Schwarz inequality in proving other inequalities. Some related inequalities include Holder’s inequality, Young’s inequality, geometric mean arithmetic mean inequality, which ultimately all boil down to Jensen’s inequality. These inequalities form the foundation of analysis. On the other hand, concentration type of inequality tends to exploit the opposite directions. I am not an expert in the latter, despite my paper with W. Xu on a small application thereof. Instead I will expound on a recent problem-solving adventure that illustrates techniques for proving the opposite inequalities. The problem is communicated to me by Xin Zhou.

The inequality in question is $(E|X^2 - E(X^2)|)^2 \le 4 EX^2 var X$. One is tempted to try convexity type of argument initially, but is bound to fail. Instead we want to exploit the extreme points in the space of random variables, i.e., find the worst case $X$ that still satisfies the inequality. First we perform some simplifying reduction: X can be assumed nonnegative, since if we replace X by $|X|$, then left hand side is unchanged whereas the right hand side is non-increasing. Next due to the same scaling properties of both hand sides, we can assume $E X^2 = 1$. Therefore we need to prove $(E|X^2 - 1|)^2 \le 4 var X$, for $X \ge 0$ and $X = |X |$. Clearly the scaling reduction simplifies things a lot.

Now we write $var X = E X^2 - \mu^2 = 1-\mu^2$, where $\mu = E X \in (0,1]$. Thus if we condition on $\mu$, the right hand side is completely deterministic. Now let $Y_> = Y 1_{Y > 1}$ and $Y_\le = Y 1_{Y \le 1}$, so that $Y = Y_> + Y_\le$. Now we want to show that the worst case scenario $X$ satisfies $p(X \in (0,1)) = 0$. That is, $X$ is either $0$ or bigger than or equal to $1$. We show this by showing that any nonnegative variable $X$ can be reduced to such a form, preserving first and second moments, and in such a way that $E |X^2 - 1|$ is non-decreasing.  The obvious thing to try is first to multiply $X$ by the indicator $1_{X \ge 1}$. Then we need to compensate for the shrinkage of first and second moments by modifying the part of $X$ that’s $\ge 1$. If we attempt a naive constant value $h$, then we run out of degrees of freedom when trying to match both first and second moments, since the probability $p = p(X \ge 1)$ is pre-fixed. Therefore we need to augment the probability space if necessary and make the part of $X$ that’s $\ge 1$ two-piece constant, that is, $X \in \{h_1,h_2\}$ with probabilities $p_1, p_2$ respectively, where $h_1, h_2 \ge 1$, and $p_1 + p_2 = p$. This is guaranteed to work, since $E X^2 \ge E X$ by Jensen. The augmentation of probability space to accommodate the two-piece function might seem like ugly hack, but I don’t see another easy way around as of today.

Lastly why does this make $E | X^2 - 1|$ non-decreasing? Essentially we are looking at $E|Y - 1|$ given $EY =1$ and $Y \ge 0$. Clearly by “pushing” the part of $Y$ between $0$ and $1$ to $0$, and add the scrapped mass to the part above or equal $1$, we preserve 2nd moment and increase $E|X^2 -1|$. And the augmentation hack from previous paragraph essentially says by adding the mass tactfully, we can preserve first moment also. Having established a canonically worst case form random variable, it’s easy to check that they satisfy our objective inequality. Hence we are done.