In this section we re-present Bolthausen’s original proof of Berry Esseen using Stein’s method and fill in some computational details. I hope to make the argument more transparent to people trying to learn Stein’s method and Berry-Esseen theorem. The letter will denote a constant that varies from line to line, but will be a fixed absolute constant.

The classical Berry-Esseen theorems gives bound between the normal distribution function and the cumulative distribution function of , where are iid with mean 0 variance 1 and . It states

where is universal. Various efforts have been spent on finding the optimal , which is certainly less than . But we are only interested in the existence of such a .

First we can restate (1)as follows. Define

where

- is the collection of all length sequences of iid variables with prescribed first three absolute moments, ,,and respectively.
- is the indicator that , and
- , where is the standard normal density function, is the same as .

As a check of your understanding, the above supremum is bounded by in (1).

Next we want to get a recursive bound on etc. This will be achieved by manipulating the additive structure . In fact if we can prove

for some absolute constant , then we have

where we can choose a bigger on the right hand side, because the geometric sum can be bounded by , for some (the geometric decay dominates the growth). To bound , we will introduce some continuous approximations of the heavyside function . We define

where means taking minimum of the two sides, and maximum. In words, is the same as for and linearly interpolates and in between. We also denote , as the natural functional limit.

We also define the direct analogue of in the -smoothed case,

From the fact that (try drawing their graphs using the linear interpolation interpretation), one gets

This follows because if for a particular sequence and , , we get

The last bound uses the fact that the normal density is bounded uniformly by , and that is supported on the interval and bounded by . The case is treated similarly. Hence taking the supremum over we get (8).

Now we can make further abstraction of (3) by defining , then we can rewrite

This might seem tautological so far, but here is where Stein’s method comes in, because has the following nice represetation:

where . This is just elementary differentiation. Recall Stein’s method says that has standard normal distribution if and only if for every smooth ,

Thus we are reducing the problem of bounding difference of distribution functions to that of bounding the associated Stein operator. From now on we will write for , and when , we will write out explicitly.

Since ,

Thus for , . For , we have , and . Therefore

for . Together we have (15) for . Therefore (Bolthausen claims , but it doesn’t seem obvious and we don’t need it). Similarly, using (14), we get

which combined with (11) gives . By symmetry, the above inequalities hold for as well. So in summary we have , , and are all universally bounded.

Also from the definition , we get by a simple change of variable

Note could stand for if .

Using Stein’s equation again we get

Now recall we need to bound . By Stein’s equation, this reduces to

by linearity of expectation and exchangeability of ‘s; the form of is chosen merely as a convenience. The same argument below works for non-iid sequence as well.

We can write as

where (note the difference from ). Therefore (19)becomes

The next comes the crucial steps in establishing the recursive bound (4). We will bound and the later integral separately:

- Using (18),
We use two comparisons with the Gaussian distribution to get, for any constant ,

Here we again use the uniform bound of the normal density. Observe that the final bound is in terms of instead of . This is due to (17)and nicely relates ‘s to ‘s. Also note that

for .

Thus by conditioning first on , we get - Now we bound the second term in (19) similarly, using (18)again:
This is the only place where is used.

So in sum, we have proved that

because by Holder’s inequality. Recalling (10), and the relation between and , we can take the supremum over all for a fixed , and , to get

Finally using the bound (8)between and , we can relate with :

It remains then to optimize over . Choosing , we arrive at (4), which concludes the proof of Berry-Esseen’s theorem.

Though most of the steps are very detailed (thanks a lot for this, the proof in Bolthausens’s paper is quite hard to grasp at a first glance), I can’t see why eq (5) should be true. We need

sum(2^(-k)/sqrt(n-k), k, 0, n-1) infty, but where does the bound on the right-hand side come from? When I computed the expression for several n, I found that the bound seems to be quite sharp. Can you give me a hint on why it is true?

Dear Michael,

Consider first the sequence This converges to 2. Now consider . I can rewrite the latter as . The geometric series dominates the growth of , for instance, you can write . So we know that the latter series is summable because . Thus for any on the left hand side of eq (5) I can choose a bigger on the right hand side.