On Norbert Blum’s latest proof that P != NP

I first read about this on Hacker News, which I encourage more people to subscribe to for quality content in the tech/science space. Many people have raised doubts about the validity of the proof (including Scott Aaronson, whose Democritus blog I used to follow), but few seem able to point out any concrete flaw to date. I found it generally well written in the beginning, except that some of the preliminary results seem too basic, such as Lemma 1 and Lemma 2 on the conversion between CNF and DNF. The writing style reminds me of my REU project in the summer of 2007, when I wrote my first original research papers: the balance between non-triviality and clarity is often a newbie's struggle.

In any case, I was happy with most of the arguments until the proof of Theorem 2. I am not questioning the correctness of its statement, since I haven't had time to look for a counterexample or to come up with a proof independently. The assertion "By construction, no variable can be removed from c_\ell without destroying this property", however, does seem wrong to me. Here is a simple counterexample:

Take p_1 = x_1 x_2 x_5, p_2 = x_2 x_3, m = x_1 x_4, and c_\ell = x_2 + x_5, where juxtaposition denotes AND (\wedge) and + denotes OR (\vee). Furthermore, let f = p_1 + p_2 and res_\beta(g_0) = f + m. Then c_\ell is clearly an f-clause but not a prime one, since x_2 alone is an f-clause, so x_5 can be removed from c_\ell without destroying the f-clause property.
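
Spelled out, using the convention that a clause c is an f-clause whenever f \le c (i.e., f implies c), the chain of implications is just:

```latex
% Both monomials of f contain x_2, so f implies x_2 alone:
\[
  f \;=\; p_1 \vee p_2
    \;=\; (x_1 \wedge x_2 \wedge x_5) \vee (x_2 \wedge x_3)
    \;\le\; x_2
    \;\le\; x_2 \vee x_5 \;=\; c_\ell .
\]
% Hence x_2 by itself is already an f-clause, and dropping x_5 from c_\ell
% preserves the f-clause property, contrary to the quoted assertion.
```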

Now I am not sure whether this step is critical to the rest of the proof of Theorem 2, but I am sufficiently discouraged at this point not to put in more salvaging time. This also highlights another complaint of mine: the unnecessarily complicated notation res_\beta(g_0), which I feel could be replaced by something more standard and lighter-handed.

A caffeinated adventure into optimization algorithms and numerical solver libraries in Python

Motivated by an optimization problem in quantitative finance as well as simple curiosity, I started looking into some word-of-mouth ML-related algorithms and various useful libraries for solving large-scale constrained optimization problems. Perhaps because my understanding of optimization has deepened over the years, or because the newly bought green matcha tea has wrought wonders on my head, the documentation in the open source community for the various numerical libraries seemed exceedingly clear. Here I will simply share a lightly annotated laundry list of all the useful tidbits picked up in 2 hours of distracted self-study:

  1. Bayes point algorithm: I first heard about this through a colleague’s work, where the main selling point seems to be the ability to control the model directly through example weights rather than feature weights. The definitive reference seems to be this 34-page paper, which is clear on a sober afternoon but can be quite daunting if one simply wants to scan through it to pick up the main idea. I haven’t finished it, but the first 5 pages look quite sensible and promise a good return on time spent. My current speculative understanding is that it is a mixture of ideas from support vector machines, which focus on frequentist-style optimization problems, and Bayesian inference, which is more expensive but has the nice “anytime” property, meaning even a non-converged model is useful. One thing I found funny was that the authors talked about Cox’s theorem on objective probability; I am not sure it is really necessary in a technical paper like this, but authors are certainly allowed to digress a little philosophically.
  2. Bayesian linear regression: I learned this mainly through the Wikipedia article of the same name. Don’t look at the multivariate version, since it distracts from the main point. The idea is to put a prior on the weights (i.e., the linear regressors), which can conveniently be chosen to be Gaussian (a conjugate prior). The posterior is then also Gaussian, with mean and variance depending on both the data and the prior. The formula presented towards the end indeed shows that if the prior is highly concentrated near its mean (high confidence), then the posterior distribution leans towards the prior mean (see the numpy sketch after this list).
  3. Gauss-Newton method: I thought I knew Newton’s method since grade school. It turns out even a veteran like me can get abysmally confused about the distinction between solving an equation and optimizing an objective function. So I spent a few minutes wondering why Newton’s method is considered a second-order method, though to be fair, the second-order label has nothing to do with the use of 2nd derivatives, but mainly with the quadratic convergence rate. For equally uninitiated readers: a (vector-valued) equation of the form F(x) = y typically has isolated solutions only when x and y have the same dimension (both assumed to live in some Euclidean space); otherwise you get a sub-variety as your solution set, which numerical analysts and engineers typically don’t care for. Similarly, optimization only makes sense when the objective function is real-valued rather than vector-valued; by taking the gradient, one converts the latter type of problem into the former. At any rate, I started looking up Gauss-Newton, which isn’t exactly Newton-Raphson, after seeing this line in the implementation notes for trf in scipy, which by the way are extremely well written and made me understand the trust region reflective algorithm in one sitting. In them, the author mentions that one can approximate the Hessian matrix of a nonlinear least-squares objective with J^T J. This looked vaguely similar to the exact solution of linear regression, and in fact is related: as long as the objective function is a sum of squares of individual components, the GN algorithm works by approximating the Hessian with first-order derivatives only (a sketch appears after this list). This obviously can speed things up a lot.
  4. fmin_slsqp function in scipy: I found out about this mainly through this blog post. Upon looking at the implementation on github, I got a bit dismayed, since a heavy portion of the code is written with plain Python while and for loops. But I keep telling myself not to prematurely optimize, so maybe this will be my first library of choice (a toy call is sketched after this list). The underlying Sequential Least SQuares Programming approach is essentially a sequential quadratic programming method.
  5. least_squares implementation in scipy: This one looks more promising in terms of performance; in particular it offers the trf method discussed below, which promises to be appropriate for large-scale problems (a usage example appears after this list).
  6. trust region reflective algorithm in scipy: the implementation notes for trf mentioned above are quite good. The essential idea seems to be to treat a constrained optimization problem locally as a quadratic programming problem and bake the inequality constraints into the objective, then reshape the optimization region by the inverse of the diagonal matrix consisting of the distances to the boundaries in each direction (presumably the region is always convex so that this makes sense).
  7. Cauchy point: the step along the gradient-descent direction that maximizes the decrease of the local quadratic model under the constraint that the independent variable doesn’t move beyond a certain radius (the closed form is written out after this list). This seems related to the Wolfe conditions, but the latter guarantee convergence of the gradient norm to 0, whereas the Cauchy point is just a way to cheaply optimize a single gradient-descent step under a constraint.
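
For item 2, here is a minimal numpy sketch of the conjugate Gaussian update. The prior parameters and noise variance below are arbitrary choices for illustration, not values taken from the Wikipedia article.

```python
# Conjugate Gaussian update for Bayesian linear regression (sketch).
# Prior on the weights: N(m0, S0); observation noise variance: sigma2.
import numpy as np

def posterior(X, y, m0, S0, sigma2):
    """Posterior mean and covariance of the weights under a Gaussian prior."""
    S0_inv = np.linalg.inv(S0)
    S_N = np.linalg.inv(S0_inv + X.T @ X / sigma2)   # posterior covariance
    m_N = S_N @ (S0_inv @ m0 + X.T @ y / sigma2)     # posterior mean
    return m_N, S_N

# Toy data: a very confident prior at 0 pulls the posterior mean toward 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=20)
m_N, _ = posterior(X, y, m0=np.zeros(2), S0=1e-4 * np.eye(2), sigma2=0.01)
print(m_N)   # heavily shrunk toward the prior mean relative to (1, -2)
```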
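
For item 3, a rough sketch of the Gauss-Newton iteration on a made-up one-parameter exponential fit; the residual and Jacobian are hypothetical and only serve to illustrate the J^T J approximation.

```python
# Gauss-Newton for minimizing 0.5 * ||r(x)||^2: the Hessian is approximated
# by J^T J, so only first derivatives of the residual are needed.
import numpy as np

def gauss_newton(residual, jacobian, x0, n_iter=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r, J = residual(x), jacobian(x)
        # The GN step solves (J^T J) dx = -J^T r; lstsq does this stably.
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + dx
    return x

# Toy problem: fit y = exp(a * t) to noisy samples.
t = np.linspace(0, 1, 50)
y = np.exp(0.7 * t) + 0.01 * np.random.default_rng(1).normal(size=t.size)
residual = lambda x: np.exp(x[0] * t) - y
jacobian = lambda x: (t * np.exp(x[0] * t)).reshape(-1, 1)
print(gauss_newton(residual, jacobian, x0=[0.0]))   # roughly [0.7]
```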
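
For item 4, a toy call to scipy.optimize.fmin_slsqp; the objective and constraints are invented purely to exercise the interface.

```python
# Minimize (x - 1)^2 + (y - 2)^2 subject to x + y <= 2 and 0 <= x, y <= 10.
from scipy.optimize import fmin_slsqp

objective = lambda z: (z[0] - 1.0) ** 2 + (z[1] - 2.0) ** 2
ineq = [lambda z: 2.0 - z[0] - z[1]]          # inequality constraints: g(z) >= 0
x_opt = fmin_slsqp(objective, x0=[0.0, 0.0], ieqcons=ineq,
                   bounds=[(0.0, 10.0), (0.0, 10.0)], iprint=0)
print(x_opt)   # expect roughly [0.5, 1.5]
```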
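
For items 5 and 6, a small example of scipy.optimize.least_squares with method='trf' and bound constraints; the exponential model and data are again hypothetical.

```python
# Bound-constrained nonlinear least squares with the trust-region-reflective
# method ('trf') in scipy.
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0, 2, 100)
y = 2.0 * np.exp(-1.3 * t) + 0.02 * np.random.default_rng(2).normal(size=t.size)

def residual(p):
    a, b = p
    return a * np.exp(-b * t) - y

result = least_squares(residual, x0=[1.0, 1.0], method='trf',
                       bounds=([0.0, 0.0], [10.0, 10.0]))
print(result.x)   # roughly [2.0, 1.3]
```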
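
For item 7, the closed form of the Cauchy point as I understand it from the standard trust-region literature (g is the gradient, B the model Hessian, \Delta the trust-region radius):

```latex
% Cauchy point of  min_p  m(p) = f + g^T p + (1/2) p^T B p,  ||p|| <= \Delta:
\[
  p^{C} \;=\; -\,\tau \,\frac{\Delta}{\lVert g \rVert}\, g,
  \qquad
  \tau \;=\;
  \begin{cases}
    1, & g^{T} B g \le 0,\\[4pt]
    \min\!\left( \dfrac{\lVert g \rVert^{3}}{\Delta\, g^{T} B g},\; 1 \right), & \text{otherwise,}
  \end{cases}
\]
% i.e. the minimizer of the quadratic model along the steepest-descent
% direction, clipped to the trust-region boundary.
```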

My first serious attempt at cooking

My first transformation from kitchen god to kitchen tyrant. Just kidding. I have always felt that the one hurdle in life I can't get past is cooking. Right now a very capable ayi (housekeeper) does it for us, but that can't go on forever. So today, under her guidance, I began my push toward the professional level. A couple of days ago I also read some introductory Zhihu articles on cooking; see 学做菜如何入门 ("how to get started learning to cook"). The most upvoted answer was full of technical details that gave me a sudden sense of enlightenment, along with regret at not having seen it sooner. It recommends stir-fried shredded pork with green peppers as a starter dish, because combining meat and vegetables poses challenges in both heat control and knife work. So today I gave it a try.

Unfortunately the ayi is just too capable: she had already cut the green peppers for me in advance, so half the battle was won. On top of that she had also prepared shredded dried tofu. All that was left for me was cutting the meat. Normally I have no technique at all for slicing meat; it usually ends up as a mixture of cubes and chunks, mainly because the meat is too soft at room temperature and my knife isn't sharp enough, so it's hard to cut clean shapes. This time, thanks to the ayi taking the meat out of the freezer for me beforehand, I could slice it as neatly as sawing wood. Before it goes into the wok, the meat also needs to be marinated in a bowl with a little cooking wine, white pepper, light soy sauce, and a generous amount of starch. In Chinese cooking, starch is about as important as oil; without it the meat will never come out tender. White pepper was a first for me, but it surely can't taste bad.

According to the Zhihu article, the meat should be stir-fried with a hot wok and cold oil. It even suggests filling the wok a quarter full with oil, which seems a bit extreme, so I only added a little more oil than usual, already too much by my health standards. Unfortunately, while the wok was indeed hot, the oil was preheated for too long. As a result the meat crackled the moment it hit the wok, and I never reached the state described in the article where the meat doesn't absorb the oil. Still, there was enough oil that what I scooped out at the end could be used to stir-fry the rest. The article also says not to stir the meat right away, but because my oil was too shallow to cover the meat, I had to start stirring quite early, though gently. Unfortunately I realized too late that I could turn the high flame down to the minimum, so a few shreds of pork got rather burnt. In the end the ayi came over and simply ordered me to take it out of the wok, even though a few shreds were still slightly pink.

I transferred the meat straight onto the final serving plate and spent a little time separating it from the leftover oil. Stir-frying the dried tofu and shredded pork afterwards was easier. Specifically: heat up the oil remaining in the wok, pour in the shredded dried tofu first and stir-fry, then add the shredded green peppers about 20 seconds later. Shortly after, add a spoonful of salt and a little brown sugar; the latter is to boost the savory flavor, since we avoid oyster sauce and other seasonings that leave your mouth dry. The final key step is to taste for saltiness. In the past I always felt a small spoonful of salt was too much; only after reading the Zhihu article did I learn that people add it half a spoonful at a time, so I went ahead with a full spoonful. The saltiness turned out just right.

This is what it looked like fresh out of the wok:

[Image: 20170709_183442.jpg]

This is what it looked like half a minute after we started eating:

[Image: 20170709_185210.jpg]

How do I pronounce my name?

Today, as usual, I arrived late to the company breakfast due to the morning rush with the kids. As soon as I finished ordering, the receptionist asked for my name. So, with some nasal stuffiness, I said John. After several repetitions, he finally wrote down “Jong”. I had no idea how he came up with this highly unusual spelling. Maybe it is somewhat common among Koreans? Maybe he himself is Korean? I then corrected his spelling. Just like other receptionists before him, he eventually figured it out and re-uttered my name in a theatrically condescending tone. Irritated at being treated as unable to get my own name right the first time, I retorted: “so how should I pronounce my name?” He then smiled and conceded that he had bad hearing. But surely there is something wrong with the way I handle the nasality of the word “John”. It’s one of those American words that has no dictionary-perfect pronunciation.

My voice tends to lack the resounding quality of a leader, or even of a domain expert. I attribute this not to physical inadequacy but to a general lack of confidence. When I utter a sentence, it usually has not been completely thought out; even when it has, my mind can vacillate mid-sentence. Throughout my higher education I over-emphasized depth and originality of ideas and neglected presentation. It takes considerable deliberation to present a piece of information in a socially convincing manner, no matter how trivial it is. Indeed, great speakers tend to over-sell mundane ideas, over and over, without boring or embarrassing the audience. I might have missed a critical lesson by not going through the brutality of the academic job search, which requires an inordinate amount of salesmanship. So, as a stage II corporate parasite, I must voluntarily allocate quality time to re-establish my character independence.

多努力分布 (multinoulli distribution)

Today a colleague of mine brought up an interesting mathematical statistics problem: what is the Jensen-Shannon divergence between two multinomial distributions? In the discussion, he mentioned that he had reduced the problem to looking at binomial distributions. I misheard this as the Bernoulli distribution and started wondering what the multinomial analogue of that is called. Surely it is not called the multinomial distribution, since the latter deals with n objects rather than 1.

Along the way I read about the Rademacher distribution, which is nothing but a rescaled version of the Bernoulli distribution. Outraged by the excessive naming in mathematics, I started looking up the former curiosity.

According to Wikipedia, the correct generalizing nomenclature is the categorical distribution, or multinoulli distribution. I had never heard of multinoulli before, but the etymology is self-explanatory, and Bernoulli seems respectable enough to have a new word coined from his name. The most natural Chinese translation becomes 多努力, literally “how much effort?”. In fact, if one restricts the multinomial distribution to n = 1, it is the same as the multinoulli distribution.

Back to the colleague’s problem: recall that the Jensen-Shannon divergence is defined by
JSD(u, v) = (1/2) \sum_i [ u_i \log (2u_i / (u_i + v_i)) + v_i \log (2v_i / (u_i + v_i)) ]. For u = multinomial(n, \alpha) and v = multinomial(m, \beta), JSD doesn’t make sense unless n = m, which we assume. \alpha and \beta are probability vectors summing to 1, both of the same dimension k; otherwise, again, it doesn’t make sense.

There are \binom{n+k-1}{k-1} different points in the common state space. We can simply calculate the probability of each point under u and v. Taking the case k = 2, we are dealing with the special case of the binomial distribution. The calculation eventually reduces to a sum of the form
\sum_i \binom{n}{i} \alpha^i \log (1 + (\alpha / \beta)^i). Mathematica suggests that this is not reducible to a closed form in terms of common special functions. I suspected that one could Taylor expand \log(1 + \epsilon) = \epsilon - \epsilon^2 / 2 + \epsilon^3 / 3 - \ldots and get a term-wise closed form. It is true that each term in the resulting expansion is summable in closed form, but summing them back together is just as difficult; in fact, for each n, I end up with n terms in this roundabout summation in Mathematica. So I am finally convinced that this problem has no closed-form solution.
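
The lack of a closed form does not prevent a direct numerical evaluation, of course. Here is a brute-force check for the k = 2 (binomial) case; the parameter values are arbitrary.

```python
# Brute-force Jensen-Shannon divergence between binomial(n, alpha) and
# binomial(n, beta), i.e. the k = 2 case above.  Assumes 0 < alpha, beta < 1
# so that all pmf values are strictly positive.
import numpy as np
from scipy.stats import binom

def jsd(n, alpha, beta):
    ks = np.arange(n + 1)
    u = binom.pmf(ks, n, alpha)
    v = binom.pmf(ks, n, beta)
    m = 0.5 * (u + v)
    kl = lambda p, q: np.sum(p * np.log(p / q))
    return 0.5 * kl(u, m) + 0.5 * kl(v, m)

print(jsd(10, 0.3, 0.6))   # some value between 0 and log(2)
```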

Reflections on a trip back to China

Taking advantage of parental leave, I went back to China for two weeks and gained a lot from it. Here I will mainly talk about progress on treating my ailments. You don't appreciate the importance of physical constitution until you get sick, just as people don't take market order seriously until a financial crisis hits. There are too many things a person has to manage; for a socially responsible young adult, deliberate self-indulgence is rare. Mostly it is negligence or misplaced priorities.

I took a trip to Yunnan and also saw a traditional Chinese medicine doctor. Along the way I marveled on WeChat at how dedicated doctors in China are, only to be mocked by an American friend who said Chinese people don't value exercise. Perhaps we do exercise less than Europeans and Americans, but largely out of necessity. Imagine a commuter who spends 1.5 hours each way every day, plus kids, plus occasional overtime: where is the time for exercise? A few doses of herbal medicine brought no improvement, so it seems exercise really is the hard truth. Although I retorted sternly on WeChat with America's 80% obesity rate, I also started exercising after returning to Shanghai.

I picked up basketball again from my high school days, practicing shooting alone on the court next to the dorms of post-95 students on campus. Despite a tinge of "too old, too late" melancholy, my skills improved quite a bit. My confidence in my health, however, did not. I am currently troubled by a cold and a cough; the stomach problems seem to have eased a little, but some physical signs, such as the horizontal crease at the diaphragm and the exhaustion after overeating, remain as before; only the indigestion seems less pronounced. Perhaps my physiological age has reached the point of swapping one illness for another. As for the cough, it is still triggered by cold air and fatigue; it seems slightly better, but that may well be a short-term effect of the excitement from high-intensity exercise, just like when I biked to work two summers ago. The underlying weak constitution has not improved.

Another benefit of outdoor basketball is the sunbathing. I usually play from about 9 to 10:30 in the morning, a good hour and a half of sun. The sunlight in China is not as harsh, so my skin cancer risk is probably halved, if not better. Today I also found some folk remedies online claiming that sunning one's back is highly beneficial; supposedly the longevity villages owe their longevity to the sun. I have seen the same view on Western medicine websites, which say sunlight strengthens immunity and promotes calcium absorption. I tried it a little in the US before, without much effect, mostly because I worried too much about tanning, skin cancer and the like, and feared the California sun. After going back this time I may stick with sunscreen and keep the exposure time under control, bringing the kid along, which eases family tension while getting exercise. As for exercise itself, moderation is still needed; after all, taking care of the kids is already a big physical expense. It is best to do just the moderate amount the body needs, say one or two ball games a week, avoiding indoor workouts as much as possible. Building muscle can wait until the body has fully recovered.

Back in the US, my biggest puzzle is still how to meet the family's demands on my energy. The kid refuses to go to school before 9, and that can't be forced. The key is to leave work early in the afternoon, so I can take him to the park to play and get some sun along the way, or go straight home and hand him over to the ayi; either beats suffering through traffic. Leaving too early or too late is no good; the trick is timing the departure well. The morning routine still needs longer-term planning.

Insurance policy

Only after my second child was born did I realize that I have been using the less economical insurance plans all these years. Even during years without major medical events, my family members, myself included, make hospital visits pretty frequently. The ideal plan in that situation is an EPO. But for the past year I have been using a highly subsidized PPO sponsored through my company, since it was the best advertised one and seemed to essentially match the EPO's deductible. What I did not understand is that for things like child delivery, the EPO charges a flat $250 rate, as reported by the Chinese community, whereas the PPO plan accumulates a bill in excess of $20k, which still left me with about $2k after coinsurance. Unfortunately the details are in the fine print, and it's not in the insurance company's best interest to make them transparent. Lesson learned. I will have to trust Chinese sources of information far more than the English ones, because the latter just drown you in irrelevant details.
