多努力分布 (multinoulli distribution)

Today a colleague of mine brought up an interesting problem in mathematical statistics: what is the Jensen-Shannon divergence between two multinomial distributions? In the discussion he mentioned that he had reduced the problem to binomial distributions. I misheard this as the Bernoulli distribution, and started wondering what the multinomial analogue of that is called. Surely it is not the multinomial distribution itself, since the latter deals with n objects rather than 1.

Eventually I read about the Rademacher distribution, which is nothing but a rescaled version of the Bernoulli distribution. Outraged by the excessive naming in mathematics, I went looking for an answer to the earlier question.
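To spell out the rescaling: if Y \sim \mathrm{Bernoulli}(\tfrac{1}{2}), then

X = 2Y - 1, \qquad \Pr(X = +1) = \Pr(X = -1) = \tfrac{1}{2},

which is exactly the Rademacher law on \{-1, +1\}.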

According to Wikipedia, the correct generalizing nomenclature is the categorical distribution, or multinoulli distribution. I had never heard of multinoulli before, but the etymology is self-explanatory, and Bernoulli seems respectable enough to have a new word coined from his name. The most natural Chinese translation becomes 多努力, "how much effort?". In fact, if one restricts the multinomial to n = 1, it is the same as the multinoulli distribution.
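To make the n = 1 identification concrete, here is a minimal sketch using scipy.stats.multinomial (the probability vector p is an arbitrary example of my own):

```python
import numpy as np
from scipy.stats import multinomial

# A categorical (multinoulli) distribution over k = 3 outcomes.
p = np.array([0.2, 0.5, 0.3])  # example probabilities summing to 1

# Multinomial with n = 1: each draw is a one-hot vector, i.e. exactly
# one category is selected per trial -- precisely a categorical draw.
draws = multinomial.rvs(n=1, p=p, size=5, random_state=0)
print(draws)                 # rows like [0, 1, 0]
print(draws.argmax(axis=1))  # the equivalent category labels
```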

Back to the colleague's problem: recall that the Jensen-Shannon divergence is defined by

JSD(u, v) = \frac{1}{2} \sum_i \left[ u_i \log \frac{2 u_i}{u_i + v_i} + v_i \log \frac{2 v_i}{u_i + v_i} \right].

For u = \mathrm{Multinomial}(n, \alpha) and v = \mathrm{Multinomial}(m, \beta), the JSD doesn't make sense unless n = m, which we assume; likewise \alpha and \beta must be probability vectors of the same dimension k, each summing to 1, otherwise again it doesn't make sense.
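No closed form is apparent for general multinomials, but the divergence is easy to evaluate numerically by brute-force enumeration of the state space. A minimal sketch, assuming scipy is available (the function name jsd_multinomial and the parameter values are my own):

```python
import itertools

import numpy as np
from scipy.stats import multinomial

def jsd_multinomial(n, alpha, beta):
    """Jensen-Shannon divergence between Multinomial(n, alpha) and
    Multinomial(n, beta), enumerating all count vectors (x_1, ..., x_k)
    with nonnegative entries summing to n."""
    k = len(alpha)
    total = 0.0
    for x in itertools.product(range(n + 1), repeat=k):
        if sum(x) != n:
            continue
        u = multinomial.pmf(x, n, alpha)
        v = multinomial.pmf(x, n, beta)
        m = 0.5 * (u + v)  # midpoint distribution at this point
        if u > 0:
            total += 0.5 * u * np.log(u / m)
        if v > 0:
            total += 0.5 * v * np.log(v / m)
    return total

print(jsd_multinomial(5, [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]))
```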

There are \binom{n+k-1}{k-1} different points in the common state space (for k = 2, just the n + 1 values 0, 1, \ldots, n), and we can simply calculate the probability of each point under u and v. Taking the case k = 2, we are dealing with the binomial distribution. The calculation eventually reduces to a sum of the form
\sum_i \binom{n}{i} \alpha^i \log\left(1 + (\alpha/\beta)^i\right).

Mathematica suggests that this is not reducible to a closed form in terms of common special functions. I suspected that one could Taylor expand \log(1 + \epsilon) = \epsilon - \epsilon^2/2 + \epsilon^3/3 - \ldots and get a term-wise closed form. Each term in the resulting expansion is indeed summable in closed form, but recombining the terms is just as hard: for each n, I end up with n terms in this roundabout summation in Mathematica. So I am finally convinced that the problem has no closed-form solution.
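Failing a closed form, the k = 2 case is at least cheap to evaluate numerically; here is a minimal sketch (the function name jsd_binomial is mine), which also illustrates the divergence creeping toward its upper bound of \log 2 as n grows and the two binomials separate:

```python
import numpy as np
from scipy.stats import binom

def jsd_binomial(n, a, b):
    """JSD between Binomial(n, a) and Binomial(n, b), summing
    directly over the n + 1 points of the common state space."""
    i = np.arange(n + 1)
    u = binom.pmf(i, n, a)
    v = binom.pmf(i, n, b)
    m = 0.5 * (u + v)
    return 0.5 * np.sum(u * np.log(u / m)) + 0.5 * np.sum(v * np.log(v / m))

# For fixed a != b, the two binomials concentrate on disjoint regions
# as n grows, so the JSD approaches its maximum value log 2.
for n in (1, 5, 20, 100):
    print(n, jsd_binomial(n, 0.3, 0.5))
```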
