Today a colleague of mine brought up an interesting mathematical statistics problem: what is the Jensen-Shannon divergence between two multinomial distributions? In the discussion, he mentioned that he had reduced the problem to looking at binomial distributions. I misheard it as Bernoulli distribution, and started wondering what the multinomial analogue of that is called. Surely it is not called multinomial distribution, since the latter deals with $n$ objects, rather than 1.
Finally I read about the Rademacher distribution, which is nothing but a rescaled version of the Bernoulli distribution. Outraged by the excessive naming in mathematics, I started looking up the former curiosity.
According to Wikipedia, the correct generalizing nomenclature is categorical distribution, or multinoulli distribution. I have never heard of multinoulli before, but the etymology is self-explanatory and Bernoulli seems respectable enough to coin a new word based on his name. The most natural Chinese translation becomes “how much effort?”. In fact, if one restricts to the multinomial with $n = 1$, then it’s the same as the multinoulli distribution.
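Back to the colleague’s problem: recall Jensen-Shannon is defined by
$$\mathrm{JSD}(\mu \,\|\, \nu) = \frac{1}{2} D(\mu \,\|\, M) + \frac{1}{2} D(\nu \,\|\, M), \qquad M = \frac{1}{2}(\mu + \nu),$$
where $D$ is the Kullback-Leibler divergence. For $\mu = \mathrm{multinomial}(n, \alpha)$ and $\nu = \mathrm{multinomial}(m, \beta)$, JSD isn’t interesting unless $n = m$ (otherwise the two supports are disjoint), which we assume. $\alpha$ and $\beta$ are vectors that sum to $1$, and both of the same dimension $k$, otherwise again the comparison doesn’t make sense.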
There are $\binom{n+k-1}{k-1}$ different points in the common state space. We can simply calculate the probability under $\mu$ and $\nu$ of each point. Taking the case of $k=2$, we are then dealing with the special case of binomial distribution. Writing $\alpha = (p, 1-p)$ and $\beta = (q, 1-q)$, the calculation eventually reduces to a sum of the form
$$\sum_{i=0}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} \log\left(1 + \left(\frac{q}{p}\right)^{i} \left(\frac{1-q}{1-p}\right)^{n-i}\right).$$
Mathematica suggests that this is not reducible to closed form in terms of common special functions. I suspected that one could Taylor expand the logarithm and get a term-wise closed form. It is true that each term in the resulting expansion is summable in closed form, but summing them together becomes just as difficult. In fact with each , I end up with terms in this roundabout summation in Mathematica. So I am finally convinced that this problem has no analytic solution.
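Lacking a closed form, one can still get numbers by brute force. Below is a minimal Python sketch, not anything my colleague or I actually ran: it assumes both multinomials share the same number of trials `n` and the same dimension, enumerates the whole state space by stars and bars, and adds up the two Kullback-Leibler terms point by point. The function names (`compositions`, `multinomial_pmf`, `jsd_multinomial`) are made up for illustration, and natural logarithms are used, so the result is in nats.

```python
from itertools import combinations
from math import factorial, log, prod


def compositions(n, k):
    """Yield every k-tuple of non-negative integers summing to n (stars and bars)."""
    for bars in combinations(range(n + k - 1), k - 1):
        prev = -1
        counts = []
        for b in bars:
            counts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        counts.append(n + k - 2 - prev)  # stars after the last bar
        yield tuple(counts)


def multinomial_pmf(counts, probs):
    """Probability of the count vector `counts` under multinomial(sum(counts), probs)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer division at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))


def jsd_multinomial(n, alpha, beta):
    """Jensen-Shannon divergence (in nats) between multinomial(n, alpha) and multinomial(n, beta)."""
    if len(alpha) != len(beta):
        raise ValueError("alpha and beta must have the same dimension")
    total = 0.0
    for x in compositions(n, len(alpha)):
        p = multinomial_pmf(x, alpha)
        q = multinomial_pmf(x, beta)
        m = 0.5 * (p + q)
        # Terms with zero probability contribute nothing (0 * log 0 = 0 by convention).
        if p > 0:
            total += 0.5 * p * log(p / m)
        if q > 0:
            total += 0.5 * q * log(q / m)
    return total
```

For example, `jsd_multinomial(10, (0.3, 0.7), (0.5, 0.5))` covers the binomial case. The number of points visited grows quickly with the dimension, so this is only practical for small `n` and small dimension.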