Multinoulli distribution (多努力分布)

Today a colleague of mine brought up an interesting problem in mathematical statistics: what is the Jensen-Shannon divergence between two multinomial distributions? In the discussion, he mentioned that he had reduced the problem to looking at binomial distributions. I misheard it as Bernoulli distributions, and started wondering what the multinomial analogue of that is called. Surely it is not called the multinomial distribution, since the latter deals with n trials rather than 1.

Finally I read about the Rademacher distribution, which is nothing but a shifted and rescaled version of the Bernoulli distribution. Outraged by the excessive naming in mathematics, I went to look up the answer to my earlier question.

According to Wikipedia, the correct name for the generalization is the categorical distribution, or multinoulli distribution. I had never heard of multinoulli before, but the etymology is self-explanatory, and Bernoulli seems respectable enough to have a new word coined from his name. The most natural Chinese transliteration, 多努力, reads as “how much effort?”. In fact, if one restricts the multinomial to n = 1, then it is the same as the multinoulli distribution.

Back to the colleague’s problem: recall that the Jensen-Shannon divergence is defined by
$\mathrm{JSD}(u, v) = \frac{1}{2} \sum_i \left[ u_i \log \frac{2u_i}{u_i + v_i} + v_i \log \frac{2v_i}{u_i + v_i} \right]$. For $u = \mathrm{multinomial}(n, \alpha)$ and $v = \mathrm{multinomial}(m, \beta)$, JSD is only interesting when n = m (otherwise the two supports are disjoint), which we assume. $\alpha$ and $\beta$ are probability vectors, with nonnegative entries that sum to 1, and both of the same dimension k; otherwise the comparison again doesn’t make sense.

There are $\binom{n+k-1}{k-1}$ different points in the common state space, namely the count vectors with k nonnegative entries summing to n. We can simply calculate the probability of each point under u and v. Taking the case of $k=2$, we are dealing with the special case of the binomial distribution, whose state space has n + 1 points. Writing $\alpha$ and $\beta$ for the two scalar success probabilities, the calculation eventually reduces to a sum of the form
$\sum_i \binom{n}{i} \alpha^i \log (1 + (\alpha / \beta)^i)$. Mathematica suggests that this is not reducible to closed form in terms of common special functions. I suspected that one could Taylor expand $\log(1 + \epsilon) = \epsilon - \epsilon^2 / 2 + \epsilon^3 / 3 - \ldots$ and get a term-wise closed form. It is true that each term in the resulting expansion is summable in closed form, but summing them together becomes just as difficult. In fact, for each n, I end up with n terms in this roundabout summation in Mathematica. So I am finally convinced that this problem has no analytic solution.
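Even without a closed form, the binomial case is easy to evaluate numerically. The following is a minimal sketch rather than anything my colleague or I actually ran: the function names are made up for illustration, and it uses the conventional normalization with a factor of 1/2, so that the value lies between 0 and log 2.

```python
from math import comb, log

def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n successes in n trials with success rate p."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def jsd(u, v):
    """Jensen-Shannon divergence (natural log) of two pmfs on the same points."""
    total = 0.0
    for ui, vi in zip(u, v):
        # Terms with zero probability contribute nothing (0 * log 0 = 0).
        if ui > 0:
            total += ui * log(2 * ui / (ui + vi))
        if vi > 0:
            total += vi * log(2 * vi / (ui + vi))
    return total / 2

def binomial_jsd(n, alpha, beta):
    """JSD between binomial(n, alpha) and binomial(n, beta), summed over n + 1 points."""
    return jsd(binomial_pmf(n, alpha), binomial_pmf(n, beta))

print(binomial_jsd(10, 0.3, 0.5))
```

The sum has only n + 1 terms, so brute force is cheap for any n one is likely to care about; what is missing is a formula, not a number.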

Insurance policy

Only after my second child was born did I realize that I had been using the less economical insurance plan all these years. Even in years without major medical events, my family members, myself included, visit the hospital pretty frequently. The ideal plan in that situation is an EPO. But for the past year I have been on a highly subsidized PPO sponsored by my company, since it was the best advertised one and seemed to essentially match the EPO on deductible. What I did not understand is that for things like childbirth, the EPO charges a flat $250, as reported by the Chinese community, whereas the PPO plan runs up a bill in excess of $20k, of which I still owe $2k after coinsurance. Unfortunately the details are in the fine print, and it’s not in the insurance company’s best interest to make them transparent. Lesson learned. I will have to trust Chinese sources of information far more than English ones, because the latter are swamped with irrelevant details.

2.5 hour struggle with technology

I am not impressed with the user-friendliness (or rather user-hostility) of either of the two major cellphone makers. Last night I had to port the contact list from a Galaxy 5s to a newly bought iPhone 7+ for my mother-in-law; for the record, I would never buy a luxury good like that for myself. Initially the solution seemed straightforward. The conventional approach is to set up a Google account (which my MIL didn’t have because of restrictions in China), export the contacts to it, and import them from there. It turned out that the Galaxy device wouldn’t allow me to use Gmail at all, possibly because it was configured for Chinese users, who have no legit use for Google products. This in fact took me a while to discover and confirm: I tried installing the Gmail app from the built-in Samsung store, which then prompted me to add an email account, only to reject it because I didn’t have the Google Play Store installed. The latter turned out to be unavailable in the Samsung store, presumably because Samsung didn’t want Google to take its share of the mobile app market. Even an unsophisticated user like me can easily sniff out competition going awry at the expense of users in these design choices.

The next option was to use the SIM card as a physical medium of transfer. This again was a dead end: when I tapped the menu option on the iPhone 7+ for importing contacts from the SIM card, I got instantly wormholed back to the home screen without any explanation (or apology). Could this be a case of incompatibility with a foreign SIM card (previously used in a Chinese device)? I also tried switching the iPhone locale to en/US, as iPhones are notorious for incomplete feature implementation in secondary locales, but still had no luck. Complaints about localization bugs will be fodder for a later thread. After searching the web turned up nothing relevant, I was briefly flummoxed.

The saving grace was the realization that the iPhone 7+ did carry a scant few contacts from the old Galaxy phone. Initially I thought this was due to an incomplete export to the SIM card, but after swapping the SIM between phones several times and consulting with my family members, I started looking at my MIL’s newly created Gmail account (which is inaccessible on the Galaxy). Then it became clear that those few contacts came from an earlier porting attempt by my wife. So a third solution emerged: load the contact list directly into the Gmail account, and hope that it syncs automatically with the iPhone.

The next episode simply proves the adage that bad things all come at once. First it took me a while to figure out how to access the local file system on the Android: there turned out to be an app just for that, fortunately already installed. It took me no time to locate the file storing the contact list. But how should I send it to other devices? Gmail was out of the question. This left me with basically only one option: use WeChat. In a moment of unequivocal stupidity, I logged out of my MIL’s WeChat account and into mine, and sent the file as an attachment to myself there. The goal was to retrieve the file on another mobile device or my MacBook so that it could eventually be uploaded to Gmail.

I then started checking my personal Android phone for the sent file, but it was about to run out of battery. I connected it to my Mac Air and made sure that battery charging mode was on (indeed the data transfer mode was no longer supported by the iTunes version on my Mac Air, which was only 4 years old!). But the battery turned out to be really depleted at that point, despite the indicator showing 30% before the phone shut down. After a few failed attempts to reboot it without an instant shutdown, I decided to plug it into a wall socket and simply wait.

Meanwhile, I had the ingenious idea of sending the file to my wife’s Android phone. It was no longer possible for me to log back into my MIL’s WeChat account since she had forgotten her username and password, and my wife, being the only person knowledgeable in this matter, was upstairs breastfeeding or something and could not be disturbed. For about 10 minutes, I tried to use WeChat on my Mac Air directly, only to find out that it required scanning a 2D barcode from a mobile device, and mine was out of battery at the moment. Even though I eventually succeeded in logging in, the sent file did not show up in any self-conversation tab, on either my phone or the laptop.

So finally I forwarded the file to my wife’s phone, and it appeared instantly on her device. Could that be a bug in WeChat regarding self-conversations? Only John von Neumann knows. The rest was a happy ending, though to be fair I could have spent those 2.5 hours babysitting my younger one or pretending to do some math in my head.

How much it costs to raise a kid

Today my wife made the comment that it is actually easier on the parents to send the kids to extracurricular classes than to have them stay home, despite the extra financial cost. So I was curious enough to do the following back-of-the-envelope calculation. Assume that we send one kid out for every working hour of the week, that is 40 hours a week, 52 weeks a year, for 18 years, at $50 an hour. This amounts to:
echo "40 * 52 * 18 * 50" | bc

that is, 1872000, a whopping 1.87 million dollars, something only the top 5% of this country can afford. And this is just one kid, and only weekday working hours. With weekend nannies, diapers, and other material costs, even if we lower the hourly rate to $25, I think the figure still easily exceeds $1 million. So how on earth can people in this country afford to have a kid, let alone multiple ones?
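For anyone who wants to play with the assumptions, here is the same back-of-the-envelope arithmetic as a small sketch. The constant and function names are made up for illustration; the 40 hours, 52 weeks, and 18 years are the assumptions from the estimate above, and the figure covers weekday hours only.

```python
# Assumptions from the estimate above: weekday working hours only.
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52
YEARS = 18

def childcare_cost(hourly_rate):
    """Total cost in dollars of paid care for one kid at the given hourly rate."""
    return HOURS_PER_WEEK * WEEKS_PER_YEAR * YEARS * hourly_rate

print(childcare_cost(50))  # 1872000
print(childcare_cost(25))  # 936000
```

At $25 an hour the weekday hours alone come to $936,000, so the weekend nannies, diapers, and other material costs only need to add about $64,000 over 18 years to cross the million-dollar mark.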

Doing research of any kind is insurmountably difficult

I have lived in the research world for a while now, 12 years to be precise. My journey has been an extremely inert one. Countless times I thought I was onto something, and it turned out to be a fluke, a bug, or some other uninteresting outcome. In academia, I at least had the leisure of choosing the problems I wanted to pursue, some of which were not at the center of the community spotlight and hence could yield to persistent trying. In industry, the competition is laid out in plain sight, and the metrics against which success is measured are few. The competition comes not only from contemporary peers but also from accumulated historical knowledge, which is true in academic settings as well. What is more frustrating is that one often gets committed to a seemingly no-brainer project, only to find out later that it is a hole from which one can never crawl out in a wholesome way. This is the key difference between academic and industrial pursuits, although in the former one sometimes also has a coauthor’s trust at stake.

In any event, I have been stuck in such a hole for the better part of 4 months. The goal is not even very lofty: a mere refactoring and space-saving gimmick that doesn’t even qualify as a new idea. It turned out, however, that all those savings come at a cost, namely that the metrics are going down, despite all the variations I have tried. Being an honest person, I do not wish to rely on the mercy of the team to launch the project. However, it is also distasteful to let it go to waste, since another colleague has been with me throughout this “wonderful” journey and I have a responsibility not to let him down. While many other folks are anxiously waiting for this bottleneck project to settle, I continue to bang my head against the wall, especially given how slowly things move within our organization. This may be the most opportune time to fuss about work.

So then I thought about Abraham Lincoln, and how he overcame a seemingly insurmountable amount of personal and political difficulty, only to be shot dead in the end. But the beautiful part of his story is that he carried all that weight with a smile of grace. This I shall emulate, and plod along with animalistic persistence despite ever dwindling peer respect for my intelligence, prospects for promotion, and opportunity to change the world and shit before I succumb to natural decay. Eventually the organization will figure out the right place for me to grow or rot, and all I should care about is the next local optimum to pursue.

Reading, innovation, and meaning of life

While staying home on paternity leave, I had more time to ponder the meaning of life, away from my hectic programming day job. This was coupled with my grandma’s accidental fall in the bathroom, and the less than optimistic prognosis that her ribs were fractured and her heart and lungs got infected as a result. I think even God appeared to me one night to give consolation, since this is a justifiably depressing time, despite how smoothly things have gone with the newborn. In any case, reflecting on my first 9 months at my current job, one trap I repeatedly fell into was that deep down, I wanted to innovate and make big news so badly that I lost sight of the lifelong pursuit of learning. As a programmer, there are many ways to absorb old and new technology. The whole industry revolves around making learning more accessible to both insiders and outsiders. Maybe the abundance of resources pushed me to the other extreme, completely shutting my brain off from learning and focusing on continuous philosophizing and hypothesis testing. This break allowed me to recognize this as a critical vice that would hinder my long-term productivity.

So having regained some intellectual energy, I revisited a branch of mathematics that I detested as a graduate student, namely analytic number theory, partially motivated by Terence Tao’s most recent blog post on the Bombieri heuristic. Yesterday I managed to get a systematic education on the Möbius function. Today I started reading his earlier post on the results of Goldston-Pintz-Yildirim, Motohashi-Pintz, and Yitang Zhang. I got tripped up by a seemingly innocent estimate, that the Hardy-Littlewood constant relevant for the prime constellation conjecture is bounded away from 0. It turned out to be an elementary consequence of the Prime Number Theorem, which I have always held in awe and never dared to apply to real questions of interest. It looks like going through the whole post will be both rewarding and challenging, but I have set my mind to do so, and hopefully will come up with some follow-up learning items. After all, number theory is an exact science and a mediocre mind like mine should still be able to penetrate it, given enough volition. Hopefully I will then find some common ground with past grad school friends and borrow analytic ideas to solve my own problems in Lie theory and probability. Thank you God for the latest revelation.

