In any case, I was happy with most of the arguments until the proof of Theorem 2. I am not questioning the correctness of its statement since I haven’t had time to try to find a counterexample or come up with a proof independently. The assertion, “By construction, no variable can be removed from without destroying this property”, however, does seem wrong to me. Here is a simple counterexample:

Take , , , and , where implicit product means and () and sign means or (). Furthermore let and . Then is clearly an -clause but not a prime one, since alone is an -clause.

Now I am not sure if this step is critical to the rest of the proof of Theorem 2, but I am suitably discouraged at this point to put in more salvaging time. This also highlights another complaint of mine, which is the unnecessarily complicated notation , which I feel can be replaced by something more standard and light-handed.

]]>

- Bayes point algorithm: I first heard about this through a colleague’s work, where the main selling point seems to be the ability to control the model directly through the example weight, rather than feature weight. The definitive reference seems to be this 34 page paper, which is clear on a sober afternoon, but can be quite daunting if one simply wants to scan through it to pick up the main idea. I haven’t finished it, but the first 5 pages look quite sensible and promise good return on time spent. My current speculative understanding is that this is a mixture of ideas from support vector machines, which focus on a frequentist style optimization problem, and Bayesian inference, which is more expensive but has the nice “anytime” property, meaning even a non-converged model is useful. One thing I found funny was that the authors talked about Cox’s theorem on objective probability; not sure if it is really necessary in a technical paper like this, but authors are certainly allowed to digress a little philosophically.
- Bayesian linear regression: I learned this mainly through the namesake wikipedia article. Don’t look at the multivariate version, since it distracts you from the main point. The idea is to have a prior of some distribution for the weights (i.e., linear regressors), which can be conveniently chosen to be Gaussian (a conjugate prior). Then the posterior will have some Gaussian distribution whose mean and variance depend on the data as well as the prior. The formula presented towards the end indeed shows that if the prior is highly concentrated near its mean (high confidence), then the posterior distribution will lean towards the prior mean.
- Gauss Newton method: I thought I knew Newton’s method since grade school. Turns out even a veteran like me can get abysmally confused about the distinction between solving an equation and optimizing an objective function. So I spent a few minutes wondering why Newton’s method is considered a second order method, though to be fair, the label of second order has nothing to do with the use of 2nd derivatives, but mainly with the quadratic convergence rate. For equally uninitiated readers: a (vector-valued) equation of the form $f(x) = 0$ will typically have isolated solutions only when $x$ and $f(x)$ are of the same dimension (both assumed to live in some Euclidean space). Otherwise you get a sub-variety as your solution, which numerical analysts and engineers typically don’t care for. Similarly, optimization only makes sense when the objective function is real-valued, rather than vector-valued. By taking the gradient, one converts the latter type of problem to the former. At any rate, I started looking up Gauss Newton, which isn’t exactly Newton Raphson, after seeing this line in the implementation note of the trf library, which by the way is extremely well written and made me understand the trust region reflective algorithm in one sitting. In it, the author mentions that one can approximate the Hessian matrix of a nonlinear objective function with $J^T J$, where $J$ is the Jacobian of the residuals. This looked vaguely similar to the typical linear regression exact solution, and in fact is related. As long as the objective function is a sum of squares of individual components, the GN algorithm works by approximating the Hessian with first order derivatives. This obviously can speed things up a lot.
- fmin_slsqp function in scipy: I found out about this mainly through this blog post. Upon looking at the implementation on github, I got a bit dismayed since a heavy portion of the code is done using python while and for loops. But I keep telling myself not to prematurely optimize, so maybe this will be my first library of choice. The underlying Sequential Least SQuares Programming approach looks somewhat quadratic.
- least_squares implementation in scipy: This one looks more promising in terms of performance, in particular it uses the trf library mentioned below, which promises to be appropriate for large scale problems.
- trust region reflective algorithm in scipy: the implementation note for trf above is quite good. The essential idea seems to be to treat a constrained optimization problem locally as a quadratic programming problem and bake the inequality constraints into the objective. Then reshape the region of optimization by the inverse of the diagonal matrix consisting of the distance to the boundaries in each direction (presumably the region is always convex so that this makes sense).
- Cauchy point: the choice of the gradient descent step multiplier that maximizes descent under the constraint that the independent variable doesn’t move beyond a certain radius. This seems related to the Wolfe conditions, but the latter guarantee convergence of the gradient norm to 0, whereas Cauchy is just a way to cheaply optimize a single gradient descent step under constraint.
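
To make the Bayesian linear regression item concrete, here is a minimal sketch of the single-weight case. It assumes a model y = w * x + noise with known noise variance and a Gaussian prior on w; the function name, parameter names, and the toy data are all made up for illustration and are not taken from the wikipedia article.

```python
def posterior_weight(xs, ys, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of the scalar weight w in y = w * x + noise.

    Assumes a Gaussian prior N(prior_mean, prior_var) on w and Gaussian noise
    with known variance noise_var, so the posterior is Gaussian as well.
    """
    prior_precision = 1.0 / prior_var
    data_precision = sum(x * x for x in xs) / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    weighted_sum = (prior_mean * prior_precision
                    + sum(x * y for x, y in zip(xs, ys)) / noise_var)
    return post_var * weighted_sum, post_var


# Toy data roughly following y = 2 * x (hypothetical numbers).
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

# A vague prior lets the data dominate; a concentrated prior pulls the
# posterior mean towards the prior mean (0.0 here).
vague_mean, vague_var = posterior_weight(xs, ys, 0.0, 100.0, 1.0)
confident_mean, confident_var = posterior_weight(xs, ys, 0.0, 0.01, 1.0)
print(vague_mean, confident_mean)
```

With the concentrated prior the posterior mean stays close to the prior mean even though the data point to a slope near 2, which is the behavior described in the item above.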
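
The Gauss Newton item can be sketched in a few lines as well. The following is a rough illustration, not the scipy implementation: it fits a hypothetical model y = a * exp(b * x) by repeatedly solving the 2x2 system (J^T J) d = -J^T r, where r are the residuals and J their Jacobian. The model, the starting point, and the function name are assumptions chosen for the example, and there is no step-size control, so it can diverge from a poor starting point.

```python
import math


def gauss_newton_exp_fit(xs, ys, a, b, iterations=20):
    """Fit y = a * exp(b * x) with plain Gauss Newton steps (no damping)."""
    for _ in range(iterations):
        h11 = h12 = h22 = 0.0   # entries of J^T J, the Hessian approximation
        g1 = g2 = 0.0           # entries of J^T r
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y            # residual
            ja, jb = e, a * x * e    # d r / d a and d r / d b
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Solve the 2x2 system (J^T J) d = -J^T r by Cramer's rule.
        det = h11 * h22 - h12 * h12
        da = (-g1 * h22 + g2 * h12) / det
        db = (-g2 * h11 + g1 * h12) / det
        a += da
        b += db
    return a, b


# Noise-free synthetic data generated from a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = gauss_newton_exp_fit(xs, ys, a=1.5, b=0.4)
print(a_hat, b_hat)
```

Only first derivatives of the residuals appear anywhere, which is the point of the approximation. For a model that is linear in its parameters the same system reduces to the usual normal equations of linear regression.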
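
For the Cauchy point item, the textbook construction minimizes the local quadratic model m(p) = g·p + ½ p·Bp along the steepest descent direction, subject to the step staying within the trust radius. The sketch below follows that textbook formula; it is my own illustration rather than code from the trf note, the names are made up, and it assumes the gradient g is nonzero.

```python
import math


def cauchy_point(g, B, radius):
    """Cauchy point of the model m(p) = g.p + 0.5 * p.B.p within ||p|| <= radius.

    g is the gradient (a list), B a symmetric matrix (a list of rows).
    Assumes g is not the zero vector.
    """
    n = len(g)
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    Bg = [sum(B[i][j] * g[j] for j in range(n)) for i in range(n)]
    gBg = sum(gi * bi for gi, bi in zip(g, Bg))
    if gBg <= 0:
        # Non-positive curvature along -g: go all the way to the boundary.
        tau = 1.0
    else:
        # Unconstrained minimizer along -g, clipped to the boundary.
        tau = min(1.0, g_norm ** 3 / (radius * gBg))
    scale = -tau * radius / g_norm
    return [scale * gi for gi in g]


g = [2.0, 0.0]
B = [[2.0, 0.0], [0.0, 2.0]]
interior_step = cauchy_point(g, B, radius=10.0)   # the radius is not binding
boundary_step = cauchy_point(g, B, radius=0.5)    # the radius cuts the step short
print(interior_step, boundary_step)
```

The whole computation costs one matrix-vector product, which is why it is a cheap way to pick a single constrained descent step.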

]]>

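Alas, our housekeeper is simply too capable: she had already sliced the green peppers for me, so the job was half done. She had also prepared shredded dried tofu. All that was left for me was to cut the meat. I normally have very little technique when it comes to cutting meat, and usually end up with a mix of dice and chunks. The main reason is that meat is too soft at room temperature, and the knife is not sharp enough, so it is hard to cut clean shapes. This time, thanks to the housekeeper taking out frozen meat for me in advance, I could cut it as neatly as sawing wood. Before it goes into the wok, the meat also has to be marinated in a bowl with a little cooking wine, white pepper, light soy sauce, and a generous amount of starch. In Chinese cooking, starch is about as important as oil; without it, stir-fried meat will never turn out tender. White pepper was new to me, but it surely can’t taste bad.

According to the Zhihu article, meat should be stir-fried with a hot wok and cold oil. It also recommends filling the wok with oil to a quarter of its capacity, which is a bit much. So I only added a little more oil than usual, which is already too much by my health standards. Unfortunately, while the wok was indeed hot today, I let the oil heat for too long. As a result the meat crackled and spat the moment it hit the wok, and I did not achieve the non-oil-absorbing state described in the Zhihu article. Still, there was enough oil that, after scooping out the meat, I could go on to fry other things in it. The article also says not to stir the meat at the start, but since my oil was too shallow to cover the meat, I had to start stirring early, albeit gently. Sadly I realized very late that I could turn the high flame down to its lowest, so a few shreds of meat came out rather scorched. In the end the housekeeper came over and flatly ordered me to take it off the heat, even though a few shreds were still slightly bloody.

I ladled the meat straight onto the final serving plate, and spent a little time separating it from the remaining oil. After that, the rest of the dried tofu and shredded pork dish was fairly easy. Specifically: heat the oil left in the wok, toss in the shredded dried tofu first and stir-fry, then add the shredded green pepper after about 20 seconds. Shortly after, add a spoonful of salt and a little brown sugar. The latter is to boost the umami, since at home we don’t use oyster sauce or other seasonings that leave your mouth dry. Finally there is one more key step: tasting for saltiness. I used to feel that even one small spoonful of salt was too much. This time I learned from the Zhihu article that salt is added half a spoonful at a time, so I put in a full spoonful directly. The saltiness seems to be just right.

This is what it looked like right after cooking:

This is what it looked like half a minute after we started eating: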

]]>

My voice tends to lack the resounding quality of a leader, or even a domain expert. I attribute this not to physical inadequacy, but to a general lack of confidence. When I utter a sentence, it usually has not been completely thought out. Even if it has, my mind can vacillate mid-air. Throughout my higher education I have over-emphasized depth and originality of ideas and neglected presentation. It takes considerable deliberation to present a piece of information in a socially convincing manner, no matter how trivial it is. Indeed, great speakers tend to over-sell mundane ideas, over and over, without boring or embarrassing the audience. I might have missed a critical lesson by not going through the brutality of an academic job search, which requires an inordinate amount of salesmanship. So as a stage II corporate parasite, I must voluntarily allocate quality time to re-establish my character independence.

]]>

Finally I read about Rademacher distribution, which is nothing but a rescaled version of Bernoulli distribution. Outraged by the excessive naming in mathematics, I started looking up the former curiosity.

According to wikipedia, the correct generalizing nomenclature is categorical distribution, or multinoulli distribution. I have never heard of multinoulli before, but the etymology is self-explanatory and Bernoulli seems respectable enough to coin a new word based on his name. The most natural Chinese translation becomes “how much effort?”. In fact, if one restricts to multinomial where , then it’s the same as multinoulli distribution.

Back to the colleague’s problem: recall Jensen-Shannon is defined by

. For and $v = \mathrm{multinomial}(m, \beta)$, JSD doesn’t make sense unless , which we assume. and are vectors that sum to , and both of the same dimension , otherwise again it doesn’t make sense.

There are different points in the common state space. We can simply calculate the probability under and of each point. Taking the case of $k=2$, we are then dealing with the special case of binomial distribution. The calculation eventually reduces to a sum of the form

. Mathematica suggests that this is not reducible to closed form in terms of common special functions. I suspected that one could Taylor expand and get item-wise closed form. It is true that each term in the resulting expansion is summable in closed form, but summing them together becomes just as difficult. In fact with each , I end up with terms in this roundabout summation in mathematica. So I am finally convinced that this problem has no analytic solution.
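
Even without a closed form, the $k=2$ case is cheap to evaluate numerically. Below is a minimal sketch that assumes the standard definition JSD(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M) with M = ½ (P + Q), natural logarithms, and two binomial distributions sharing the same number of trials. The function names and the example parameters are made up for illustration.

```python
import math


def binomial_pmf(n, p):
    """Probabilities of 0..n successes in n trials with success probability p."""
    return [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two distributions on the same points."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


n = 20
jsd_example = jensen_shannon(binomial_pmf(n, 0.3), binomial_pmf(n, 0.6))
jsd_same = jensen_shannon(binomial_pmf(n, 0.3), binomial_pmf(n, 0.3))
jsd_disjoint = jensen_shannon(binomial_pmf(n, 0.0), binomial_pmf(n, 1.0))
print(jsd_example, jsd_same, jsd_disjoint)
```

The direct sum has only n + 1 terms, so brute force is entirely practical here. For very large n the individual probabilities can underflow, in which case working with log-probabilities would be the safer route.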

]]>

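I took a trip to Yunnan and also saw a traditional Chinese medicine doctor. Along the way I remarked on WeChat about how dedicated doctors in China are, only to be mocked by an American friend who said Chinese people don’t take exercise seriously. Perhaps we do exercise less than Europeans and Americans, but that is out of necessity. Consider office workers with a 1.5-hour one-way commute every day, plus kids, plus occasional overtime: where would they find the time to exercise? I took a few doses of herbal medicine and saw no improvement. It seems exercise is what really counts. Although I sternly rebutted my friend on WeChat by citing America’s 80% obesity rate, I too started exercising after returning to Shanghai.

I picked up basketball again, which I last played in high school, practicing shots alone on the court next to the dorms of my post-95 schoolmates on campus. Although it brings to mind the old line about regretting in old age an idle youth, my skills have improved quite a bit. I have no confidence in my health, though. These days I am plagued by a cold and a cough. The stomach trouble seems to have eased a little, but some physical signs, such as the horizontal crease at the diaphragm and the exhaustion that follows overeating, remain as before; only the indigestion seems less pronounced. Perhaps my biological age has reached the point of switching to a different ailment. As for the cough, it is still triggered by cold air and fatigue. It seems slightly better, but that is quite possibly just the short-term stimulation from high-intensity exercise, just like when I biked to and from work over the summer two years ago. My weak constitution has not improved.

Another benefit of outdoor basketball is sunbathing. I usually play from 9 to about 10:30 in the morning, so I get a full hour and a half of sun. The sunlight in China is also less harsh, so the risk of skin cancer is probably halved. Today I found some folk remedies online that also claim sunning one’s back works wonders. Supposedly the longevity villages owe their long lives to the sun too. I have seen this view on Western medicine websites as well, which say sunlight strengthens immunity and promotes calcium absorption. I tried it a little in the US before, without much effect, mainly because I cared too much about tanning, skin cancer and similar hazards, and feared the California sun. When I go back this time, I may stick with sunscreen and watch my sun exposure time, and take the kid along, which would ease family tensions and keep me fit at the same time. As for exercise, moderation is still needed; after all, looking after the kid already takes a lot of physical energy. It is best to exercise moderately when the body calls for it, for example one or two ball games a week. Indoor exercise should be avoided as much as possible. Building muscle and the like can wait until I have fully recovered.

Now the biggest puzzle about returning to the US is still how to meet my family’s demands on my physical energy. The kid refuses to go to school before 9, and cannot be forced. The key is to leave work early in the afternoon, so I can take the opportunity to bring him to the park and get some sun along the way. Or I can go straight home and let the nanny take over. Either beats suffering in traffic. Leaving too early or too late are both bad; the key is to time the departure well. The morning routine still needs longer-term planning.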

]]>

]]>

The next option was to use the sim card as a physical medium of transfer. This again was a dead-end because when I tapped on the menu option on iphone 7+ that says import contact from sim card, I got instantly worm-holed back to the home screen without any explanation (or apology). Could this be a case of incompatibility with a foreign sim card (previously used on a Chinese device)? I also tried switching the iphone locale to en/us, as iphones were notorious for incomplete feature implementation in secondary locales, but still had no luck. The complaint about localization bugs will be fodder for a later thread. After a web search on this turned up no relevant results, I was briefly flummoxed.

The saving grace was the realization that the iphone 7+ did carry a scanty few contacts from the old galaxy phone. Initially I thought it was due to an incomplete exportation to sim, but after switching the sim hosts several times, and consulting with my family members, I started looking at my MIL’s newly created gmail account (which is inaccessible on galaxy). Then it became clear that those few contacts came from an earlier porting attempt by my wife. So a third solution emerged: try loading the contact list directly into the gmail account, and then hopefully it will automatically sync with the iphone.

The next episode simply proves the adage that bad things all come at once. First it took me a while to figure out how to access the local file system on the android: there turned out to be an app just for that, fortunately already installed. It took me no time to locate the file storing the contact list. But how should I send it to other devices? Gmail was out of the question. This left me with basically only one option: use wechat. In a moment of unequivocal stupidity, I logged out of my MIL’s wechat account and got into mine, and sent the file as an attachment to myself there. The goal was to retrieve the file on another mobile device/macbook so that it could eventually be uploaded to gmail. I then started checking my personal android phone for the sent file, but it was about to run out of battery. I connected it to my mac air and made sure that the battery charging mode was on (indeed the data transfer mode was no longer supported by the itunes version on my mac air, which was only 4 years old!). But the battery turned out to be really depleted at that point, despite the indicator showing 30% before shutting down. After a few failed attempts to reboot without instantly shutting down, I decided to plug it into a wall socket and simply wait. Meanwhile, I had the ingenious idea of sending the file to my wife’s android phone. It was no longer possible for me to log back into my MIL’s wechat account since she had forgotten her username and password, and my wife, being the only person knowledgeable in this matter, was upstairs breastfeeding or something and could not be disturbed. For about 10 minutes, I tried to use wechat on my mac air directly, only to find out that it required 2d bar code scanning from a mobile device, which was out of battery at the moment. Even though I eventually succeeded in this regard, the sent file was not showing up in any self-conversation tab, on either my phone or the laptop.
So finally I forwarded the file to my wife’s phone, and it appeared instantly on her device’s end. Could that be a bug in wechat regarding self-conversation? Only John von Neumann knows. The rest was a happy ending, though to be fair I could have spent those 2.5 hours babysitting my younger one or pretending to do some math in my head.

]]>

echo "40 * 52 * 18 * 50" | bc

1872000

That is a whopping 1.87 million dollars, something only the top 5% of this country can afford. And this is just one kid, and non-weekend working hours. With weekend nannies, diapers, and other material costs, even if we lower the hourly rate to $25, I think the figure would still easily exceed $1 million. So how on earth can people in this country afford to have a kid, let alone multiple ones?

]]>

In any event, I have presently been stuck in such a hole for the better part of 4 months. The goal is not even very lofty, but a mere refactoring and space saving gimmick that doesn’t even qualify as a new idea. It turned out however that all those savings come at a cost, namely metrics are going down, despite all kinds of variations I have tried. Being an honest person, I do not wish to resort to the mercy of the team to launch the project; however, it is also distasteful to let it go to waste, since another colleague has been with me throughout this “wonderful” journey and I have a responsibility not to let him down. While many other folks are anxiously waiting for this bottleneck project to settle down, I continue to bang my head against the wall, especially given how slowly things move within our organization. This may be the most opportune time to fuss about work.

So then I thought about Abraham Lincoln, and how he overcame an insurmountable amount of personal and political difficulties, only to be shot dead in the end. But the beautiful part of his story is that he carried all such weight with a smile of grace. This I shall emulate, and plod along with animalistic persistence despite ever dwindling peer respect for my intelligence, prospect for promotion, and the opportunity to change the world and shit before I succumb to natural decay. Eventually the organization will figure out the right place for me to grow or rot, and all I should care about is the next local optimum to pursue.

]]>