Political thoughts from China trip

I am conflicted internally about the pros and cons of the so-called totalitarian Chinese political system. At lunch one day, one of my Australian colleagues suggested that totalitarianism has been a feature throughout Chinese history rather than a modern innovation. Indeed, unification has been a recurring theme, interrupted only by periods of turmoil and unrest, which have always been considered undesirable for society at large. From a Chinese perspective the undesirability of a fragmented China is more than obvious: warlords engage in military campaigns against each other, with collateral damage in the form of massive casualties and cultural and economic disruption. The question is whether a unified nation-state is always good to its citizens. From historical accounts, that has consistently been the case. The possibility of partial and dishonest historians working under the intellectual coercion of ruling emperors always exists, but we have learned many incriminating facts about ruling families throughout the centuries. And if Western schools take Confucianism seriously, which they do to give the impression of a pro-ethnic-diversity stance, then they should treat the historians’ accounts with similar reverence, simply because those were continuations of the Confucian tradition, however self-perpetuating they may look.

One of the most contentious aspects of modern China is its suppression of freedom of the press, which the West values highly as a safeguard against corruption and other forms of institutional abuse and degradation. China today is not shy about the fact that certain inconvenient facts are suppressed at the state level. The Great Firewall is an example, though the deterrence is not absolute. I’d like to draw some parallels with early democratic history in the United States. During the founding period, slaves were not allowed to vote, and yet they boosted the voting weight of slave-owners in the Southern plantation states. There is little justification for this aside from the fact that blacks were not recognized as legitimate humans back then by the prevailing Southern white community. The same goes for certain white indentured servants, Native Americans, women, and non-land-owners in general. In that sense, democracy is not absolute, and did not arise from the ideal of human equality, but through a compromise of self-serving motives. At least on paper this does not sound as great as what one might be taught at school, unless one belonged to the ruling class at the time. The voting is also not direct, but slightly hierarchical: the general population had to vote for representatives, who in turn cast votes for the POTUS and other important federal positions. This may be a result of technicality, but unfortunately (from the liberal perspective) the system is highly resistant to change. From a conservative point of view, the two-stage voting system helps give smaller states a voice comparable to that of large ones and, I would say, prevents the population from over-concentrating around the mega-cities, a problem China is struggling with today with varying degrees of success.

The Chinese system today certainly does not embrace universal voting either, but chooses a rather hierarchical way of promoting leadership, hybridized with a top-down process of hand-picking successors. Because of the ostensibly longer feedback loops and partial insulation from the people’s will, transparency is potentially compromised, and the risk of corruption is viewed as higher. Indeed, during the anti-corruption campaign spearheaded by Xi, as well as earlier, several mega-corruption cases were uncovered, with personal wealth reaching the tens of billions, enough to seat some political figures squarely on the Forbes list. American corruption is usually of smaller magnitude, at least relative to average national income or other measures of average wealth. Reports of senatorial net worth look paltry in comparison. Arguably, however, the extent of corruption goes beyond mere illicit wealth acquisition. Members of Congress are, ironically, not barred from insider trading, a rule that is meant to prevent conflicts of interest. I do not know how the Chinese statutes stand on this matter. In spirit, the party leaders should frown upon such capitalistic evildoing.

In the media-dominated modern era, transparency, a cornerstone of a functioning democracy, is largely ensured by the media. That’s why China’s decision to ban certain websites is considered highly incongruent with Western values. However, I would argue that, set beside an American democracy that coexisted with statutory slavery prior to 1865 and socio-economic slavery thereafter, the decision to ban certain media outlets is not worthy of much criticism. It is unreasonable to expect a 5-year-old to reach the mental maturity of an adult by exposing her to pornography; countries likewise take time to mature before they are ready for certain political forms. Cancerous material such as drug advertisements and violent pornography should arguably be outlawed completely. If the abolition of slavery had been insisted upon during the US founding period, the British would still be ruling the continent, with their unjust taxation. If the Chinese government were to relax its grip on the media, unverifiable figures of governmental atrocities would surely instigate revolt by an under-informed mob, which would disrupt the everyday lives of middle-class families and the free-market economy. So the peasant and farming population in modern China has played a role similar to that of slaves in the US, with the important distinction that peasants ascend the socio-economic ladder through painful but viable waves of migration to the cities, as did my grandparents’ generation, and as do the millions of migrant workers who leave their children behind with grandparents in pursuit of a better living. Black Americans are struggling for basic civil rights even today, and gerrymandering is virtually a collective habit at the state legislative level, at least in North Carolina and possibly a few other states. Granted, the problem the US is trying to solve has its unique challenges, mainly ethnic diversity. The Chinese challenges may be even more noteworthy, namely high population density. China also has ethnic diversity, mainly concentrated near the borders.
It would be somewhat inconceivable to let a self-serving democracy run freely in the country and hope for a win-win happy compromise to come out. Anthropology tells us that usually there is only one winner in the end. The preservation of minority genomes is largely a conquerors’ hobby.

As the US confronts the spread of fake news, China’s state censorship seems to claim a rare but decisive victory. Climate denialism had no chance to go mainstream, and superstition-driven policies find no home in the statutes. The US is developing its own defense mechanisms against these manufactured truths; however, it is in no position to counter religion, one of its core founding values. The process is somewhat similar to antitrust law, under which corporate conglomerates were banned from collusion despite operating in a free-market environment. Platforms like Facebook and Google receive injunctions to regulate media content internally, effectively injecting editorial discretion aligned with political agendas, however unquestionable the motives. Thus one should not be alarmed by the unsettling rise of fake news, but draw analogies with the legislative traditions of the past, and rest assured that the system will heal itself in due time. The Chinese government seems to approach the problem from the other end, namely from a stricter starting point that is gradually relaxed towards tolerance of dissent. I don’t think the government will ever tolerate fake news per se, since it is viewed as intrinsically bad, like mosquitoes. Thus there is likely to be no formidable legislative adaptation like what the US is going through right now. This does not mean the term fake news will not be mentioned many times in the written laws, but those laws will be proactive rather than reactive. The disadvantage is that the system may be less resilient to future attacks of an unforeseen kind. But there is always a tradeoff between exploration and exploitation, and the US path is definitely the more exploratory of the two. In terms of relative advantages, China can easily piggyback on US exploratory findings and steer away from the failure cases to devote more energy to key legislative improvements.
The key, however, is not to copy the entire system, but only the end results, and to reconstruct them through more efficient means. In machine learning parlance, this is called distillation (or knowledge distillation), and it often lets a smaller model recover much of the performance of a larger one.

Posted in Uncategorized | Leave a comment

A case calling for revocation of tenure in mathematics

Frank Calegari just wrote a blog post entitled “The ABC conjecture has (still) not been proved”. At the center of the spotlight is Mr. Shinichi Mochizuki 望月新一, who posted online a supposed proof of the conjecture about 5 years ago. Now 5 years have gone by and expert opinion has converged on the view that there is no proof (save a few referees for the journal RIMS). What’s really infuriating is that Mochizuki did not spend the least effort explaining his proof in any detail, but left the public in a perpetual state of suspense, completely capitalizing on his former reputation as a somewhat productive mathematician and student of the great Gerd Faltings. This is in stark contrast to S.T. Yau’s approach to the Poincaré conjecture, where the latter was all too eager to explain someone else’s proof. Math papers are already too opaque to be accessible to a general audience; to be impenetrable to experts for half a decade is simply a disservice to humanity. Perhaps what Mochizuki did accomplish is raising awareness, among the math and related communities, of the importance of explaining their work in comprehensible detail.

Update: to be clear, Frank’s post did not even suggest tenure revocation, and I am not seriously suggesting it either, given how much worse other academics have been compared to mathematicians (post upcoming on a recent encounter). There are also precedents of mathematicians whose work was discredited for a long time before being accepted as correct, such as Heegner and, to a lesser extent, de Branges (though people are far more certain about his recent claim to the Riemann hypothesis).


Random thoughts about politics I

I got cornered in a lunch conversation with some friends on issues related to North Korea. While a communist-ideology sympathizer at heart, I argued that North Korea isn’t as bad as Western media has portrayed. But Daniel pointed out acrimoniously that NK lacks freedom of the press etc., which makes it a complete loser compared to other major powers on the world stage today. So this is my attempt to regain some confidence in my original belief that an alternative political system to the Western one is possible and potentially better.

  1. The 4-year election cycle in the US makes its leadership highly volatile and, coupled with the bicameral legislative system, has caused a severe credibility crisis in recent decades. This has resulted in a series of catastrophic foreign policy failures, including the failure to bring NK to the nuclear negotiating table.
  2. NK’s grain output per capita is higher than India’s; similarly, its life expectancy is slightly higher, despite the fact that India has been a “true” democracy for decades and the Western powers have sought to starve NK for decades through many rounds of sanctions.
  3. Given all the precedents of states bombarded and annihilated by the US, including Libya, Iraq, and soon Syria, there is no reason for a sovereign state like NK not to resist at all costs and build up its own nuclear arsenal. The US’s goal is simply to spread the ideal of democracy all over the world, similar to how Christians tried to spread their gospel around the world, in the form of crusades when necessary. Over time the abstract ideology becomes more institutionalized and outstrips the importance of its original motivation of helping humanity. In short, buggy code is often worse than no code at all.
  4. The idea of nuclear containment is itself a highly unfair proposition. Why should smaller countries be banned from building an arsenal just like the big ones? Of course one could argue that the fewer players have the technology, the less likely we are to see Armageddon, and indeed if I were the leader of a big country I would argue the same. But from the small players’ perspective, it’s completely unfair. Also, if a country like Pakistan can own nuclear capacity simply because it’s a democracy, why should a country with a better peace record and surveillance not be allowed to, simply because its founding ideology isn’t aligned with that of the major power?
  5. It is conceited to suggest that an ideology embraced by a whole nation is of no virtue at all compared to the prevailing political system of the day. True, pure communism may not work in practice, but neither does pure capitalism. The real world consists of a mixture of the two extreme visions, where capable workers are sufficiently motivated and the overall population is guaranteed an above-subsistence standard of living. Communism served as a check in favor of workers’ rights during the Cold War era, and income inequality has skyrocketed since the fall of the Berlin Wall. At the same time, the prosperity of capitalist societies has motivated communist nation-states to adapt themselves to improve the living standards of their people. The greed of capitalists will go unchecked without some form of menace from the grassroots working class. This has been pointed out in Scheidel’s book The Great Leveler. Without Western sanctions and embargoes, it’s very hard to predict what the state of welfare in former and current communist states would have been.
  6. North Korea’s major disasters occurred during the 1990s under the rule of Kim Jong-Il, who was a stuttering, incapable ruler. The new leader appears much more capable and has no reason not to care about the welfare of his own population, even if only for the selfish reason (shared by all leaders) of preserving his own power. He is certainly a high-stakes gambler, but to do anything less would signal weakness to outside powers like the US, which is seeking every convenient opportunity to dismantle the regime in the name of democratic freedom, much like a well-meaning doctor who wants to destroy the cancer cells, forgetting that the true objective should be to save the patient. Given the US’s track record in the Middle East, no sane person would believe it can do anything better than the status quo through brute-force intervention.

To be continued after I read more on the subject..


A typo in a book that cost me a day

I have been insatiably reading the computational complexity book by Sanjeev Arora and Boaz Barak, which, judging from the table of contents and reviews from big guns like Avi Wigderson and Michael Sipser, seems like the holy bible and culmination of all complexity research over the past 3 decades, despite being billed as a beginning graduate textbook. Eventually I decided to skip ahead to chapter 19 on error correcting codes and hardness amplification, but got seriously stumped by the description of the Berlekamp-Welch algorithm (a name I cannot remember without the mnemonic resemblance to “Berkeley campus”). It turns out the authors wrote 2d and d instead of 2d + d/2 and d + d/2 for the degrees of the univariate polynomials in the bivariate graph interpolating polynomials; I had been scratching my head trying to understand how one could solve a linear system of 4d equations with 3d + 2 unknowns. But thanks to these wonderful lecture notes from MIT, I was able to reconstruct the correct parameters. Another thing I wish the book had is in-line references to the papers/books from which the proofs/algorithms were taken, but I understand that’s a pretty time-consuming task as well.


Buggy day

Being a software engineer means a constant struggle between depression and complacence. At the peak of either extreme, there is also the need for periodic introspection. Hands-on people, as engineers typically are, tend to neglect the more nurturing and routine things in life. These things are not as rewarding or exciting, but can make a difference in shaping us as human beings, how we perceive ourselves, how we want to chart the course of life, and our influence on the people all around us. One excuse we often give ourselves is that there is no time. Indeed, as a member of the working class, I have to take the kids to school, do all the usual chores, and stay in the office from 9 to 5, in addition to 1.5 hours on the road. But if we do not spend some quality time daily to nurture our soul, we run the risk of leading a completely meaningless life, devoid of substance and purpose; not only does this lead to a sad terminal state, it may also interfere traumatically as we drag our bodies towards that end state.

As we age, the amount of competitive pressure around us naturally rises. Also on the rise is the sense that our experience has enabled us to stay ahead of the game indefinitely. Even in my early 30s, I can feel that I am gravitating towards the same mental trap that millions ahead of me have experienced: the notion that a superpower, in the form of automated intelligence at my fingertips, has dawned upon me and makes me invincible for life. Quite the contrary, this infatuation with and over-romanticizing of a superpower is the death knell of the biological and sympathetic side of a person. The ease with which the said superpower is acquired should be caution enough against over-reliance on it. Unfortunately people often find the easy ways in life and follow their so-called passion without considering the context. The obvious thing to do is not always the right thing to do. The context is also very important. In the Middle Ages, being a scribe was a very respectable job with the status of a professor today, while nowadays a typist cannot even make a living, because virtually everyone has that kind of skill. What truly distinguishes a person is some unique skill rarely seen in the masses. Unfortunately, the sheer multiplicative quantity of human beings has rendered such redeeming anomalies less and less likely over time, especially under the wave of globalization. So we often dial back to a second-order competitive advantage: fostering a good habit by being persistent about it. This is a necessity, but alone does not lead to the age-old pursuit of life-long happiness. After all, humans are conditioned to appreciate change in fortune (hopefully in the positive direction), rather than an eternal possession thereof.

Habit that takes the form of robotic and thoughtless actions tends to degrade us as humans and contribute little to our characters. It merely enslaves our mind and numbs our sense of righteousness and ability to articulate. Habit that involves creative, presentational, and perhaps even mildly confrontational episodes tends to provide more utility as we chart through the difficult course of inter-personal relational quagmire at work, at home, and in the society at large. A scientifically trivial act of diary writing, for instance, juggles our mind in a spontaneous direction, with the collision of diction and emotion freeing us from the confine of algorithmic precision, a constraint imposed by our silicon nemesis to accelerate the process of intellectual polarization and squeeze the last breath out of an osmotic soul. Thus human life must be variegated, unanticipated, serendipitous, and original. To follow any predestined path is to defy the will of the Creator, which leads to misery by definition.



On Norbert Blum’s latest proof that P != NP

I first read about this on Hacker News, which I encourage more people to follow for quality content in the tech/science space. Many people (including Scott Aaronson, whose Democritus blog I used to follow) have raised doubts about the validity of the proof, but few seem to have been able to point out any flaw to date. I found it generally well-written in the beginning, except that some of the preliminary results seem too basic, such as the conversion between CNF and DNF in Lemma 1 and Lemma 2. The writing style reminds me of my REU project during summer 2007, when I wrote my first original research papers: the balance between non-triviality and clarity is often a newbie’s struggle.

In any case, I was happy with most of the arguments until the proof of Theorem 2. I am not questioning the correctness of its statement since I haven’t had time to try to find a counterexample or come up with a proof independently. The assertion, “By construction, no variable can be removed from c_\ell without destroying this property”, however, does seem wrong to me. Here is a simple counterexample:

Take p_1 = x_1 x_2 x_5, p_2 = x_2 x_3, m = x_1 x_4, and c_\ell = x_2 + x_5, where an implicit product means and (\wedge) and the + sign means or (\vee). Furthermore let f = p_1 + p_2 and res_\beta(g_0) = f + m. Then c_\ell is clearly an f-clause but not a prime one, since x_2 alone is an f-clause.
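
For readers who want to check the counterexample mechanically, here is a small brute-force sketch. It assumes that "f-clause" means a clause implied by f (every assignment satisfying f also satisfies the clause); that reading, and the helper name is_implied_by, are my own and should be checked against the definitions in the paper.

```python
from itertools import product

# Variables x_1..x_5 are stored at indices 0..4 of a tuple of booleans.
def p1(x):
    return x[0] and x[1] and x[4]      # p_1 = x_1 x_2 x_5

def p2(x):
    return x[1] and x[2]               # p_2 = x_2 x_3

def f(x):
    return p1(x) or p2(x)              # f = p_1 + p_2

def c_ell(x):
    return x[1] or x[4]                # c_l = x_2 + x_5

def x2_only(x):
    return x[1]                        # the shorter clause x_2

def is_implied_by(g, clause):
    """True if every assignment satisfying g also satisfies clause
    (assumed meaning of 'clause is a g-clause')."""
    assignments = product([False, True], repeat=5)
    return all(clause(x) for x in assignments if g(x))

print(is_implied_by(f, c_ell))     # c_l is implied by f
print(is_implied_by(f, x2_only))   # so is x_2 alone, hence c_l is not prime
```

Under that reading both checks succeed, because x_2 appears in both p_1 and p_2, so dropping x_5 from c_\ell keeps the property.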

Now I am not sure if this step is critical to the rest of the proof of Theorem 2, but I am suitably discouraged at this point to put in more salvaging time. This also highlights another complaint of mine, which is the unnecessarily complicated notation res_\beta(g_0), which I feel can be replaced by something more standard and light-handed.


A caffeinated adventure into optimization algorithms and numerical solver libraries in python

Motivated by some optimization problems in quantitative finance as well as simple curiosity, I started looking into some word-of-mouth ML-related algorithms and various useful libraries for solving large-scale constrained optimization problems. Perhaps my understanding of optimization has deepened over the years, or the newly bought green matcha tea bag has worked wonders on my head, but the documentation in the open source community for the various numerical libraries seemed exceedingly clear. Here I will simply share a lightly annotated laundry list of all the useful tidbits picked up in 2 hours of distracted self-study:

  1. Bayes point algorithm: I first heard about this through a colleague’s work, where the main selling point seems to be the ability to control the model directly through the example weights rather than the feature weights. The definitive reference seems to be this 34 page paper, which is clear on a sober afternoon, but can be quite daunting if one simply wants to scan through it to pick up the main idea. I haven’t finished it, but the first 5 pages look quite sensible and promise a good return on time spent. My current speculative understanding is that this is a mixture of ideas from support vector machines, which focus on frequentist-style optimization problems, and Bayesian inference, which is more expensive but has the nice “anytime” property, meaning even a non-converged model is useful. One thing I found funny was that the authors talked about Cox’s theorem on objective probability; not sure if it is really necessary in a technical paper like this, but authors are certainly allowed to digress a little philosophically.
  2. Bayesian linear regression: I learned this mainly through the namesake wikipedia article. Don’t look at the multivariate version, since it distracts you from the main point. The idea is to have a prior of some distribution for the weights (i.e., linear regressors), which can be conveniently chosen to be Gaussian (a conjugate prior). Then the posterior will have some Gaussian distribution whose mean and variance depend on the data as well as the prior. The formula presented towards the end indeed shows that if the prior is highly concentrated near its mean (high confidence), then the posterior distribution will lean towards the prior mean.
  3. Gauss Newton method: I thought I had known Newton’s method since grade school. It turns out even a veteran like me can get abysmally confused about the distinction between solving an equation and optimizing an objective function. So I spent a few minutes wondering why Newton’s method is considered a second order method. For equation solving it uses only first derivatives, and the label refers to the quadratic convergence rate; for optimization it is applied to the gradient, which is where the second derivatives (the Hessian) come in. For equally uninitiated readers: a (vector-valued) equation of the form F(x) = y will typically have isolated solutions only when x and y are of the same dimension (both assumed to be in some Euclidean space). Otherwise you get a sub-variety as your solution set, which numerical analysts and engineers typically don’t care for. Similarly, optimization only makes sense when the objective function is real-valued rather than vector-valued. By taking the gradient and setting it to zero, one converts the latter type of problem into the former. At any rate, I started looking up Gauss Newton, which isn’t exactly Newton-Raphson, after seeing this line in the implementation note of the trf library, which, by the way, is extremely well written and made me understand the trust region reflective algorithm in one sitting. In it, the author mentions that one can approximate the Hessian matrix of a nonlinear objective function with J^T J. This looked vaguely similar to the exact solution of ordinary linear regression, and in fact is related. As long as the objective function is a sum of squares of individual components, the GN algorithm works by approximating the Hessian using only first order derivatives (J is the Jacobian of the components). This obviously can speed things up a lot.
  4. fmin_slsqp function in scipy: I found out about this mainly through this blog post. Upon looking at the implementation on github, I got a bit dismayed since a heavy portion of the code is done using python while and for loops. But I keep telling myself not to prematurely optimize, so maybe this will be my first library of choice. The underlying Sequential Least SQuares Programming approach looks somewhat quadratic.
  5. least_squares implementation in scipy: This one looks more promising in terms of performance, in particular it uses the trf library mentioned below, which promises to be appropriate for large scale problems.
  6. trust region reflective algorithm in scipy: the implementation note for trf above is quite good. The essential idea seems to be to treat a constrained optimization problem locally as a quadratic programming problem and bake the inequality constraints into the objective. Then reshape the region of optimization by the inverse of the diagonal matrix consisting of the distance to the boundaries in each direction (presumably the region is always convex so that this makes sense).
  7. Cauchy point: the step size along the gradient descent direction that maximizes descent under the constraint that the independent variable doesn’t move beyond a certain radius. This seems related to the Wolfe conditions, but the latter guarantee convergence of the gradient norm to 0, whereas the Cauchy point is just a way to cheaply optimize a single gradient descent step under a constraint.
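
To make the Bayesian linear regression item concrete, here is a minimal sketch of the single-weight case y = w*x + noise, assuming a Gaussian prior on w and a known noise variance. The function name posterior, its parameters, and the toy data are my own illustration, not taken from the Wikipedia article; the update is the standard conjugate-Gaussian one, where precisions (inverse variances) add.

```python
def posterior(xs, ys, prior_mean, prior_var, noise_var=1.0):
    """Posterior mean and variance of w in y = w*x + noise.

    Assumes w ~ N(prior_mean, prior_var) and Gaussian noise with
    known variance noise_var (both are simplifying assumptions).
    """
    # Posterior precision = prior precision + data precision.
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    # Posterior mean is a precision-weighted blend of prior and data.
    weighted = prior_mean / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var
    return weighted / precision, 1.0 / precision

# Toy data lying roughly on y = 2x (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

loose = posterior(xs, ys, prior_mean=0.0, prior_var=100.0)   # vague prior
tight = posterior(xs, ys, prior_mean=0.0, prior_var=0.001)   # confident prior

print("vague prior     ->", loose)
print("confident prior ->", tight)
```

With the vague prior the posterior mean lands close to the least-squares slope of about 2, while the confident prior keeps it near the prior mean of 0, which is the behavior described in item 2.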
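
The J^T J approximation from the Gauss Newton item can also be sketched in a few lines of plain Python. This is a bare-bones illustration under my own assumptions (a two-parameter model y = a * exp(b * t), noise-free data, no damping or line search, a starting guess reasonably close to the answer); it is not how scipy implements least_squares, and the name gauss_newton_exp_fit is hypothetical.

```python
import math

def gauss_newton_exp_fit(ts, ys, a, b, iterations=20):
    """Fit y = a * exp(b * t) by plain Gauss-Newton iterations."""
    for _ in range(iterations):
        # Accumulate the entries of J^T J (2x2, symmetric) and J^T r.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(b * t)
            r = a * e - y          # residual of this data point
            ja = e                 # d r / d a
            jb = a * t * e         # d r / d b
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Solve (J^T J) delta = -J^T r with Cramer's rule.
        det = h11 * h22 - h12 * h12
        da = (-g1 * h22 + g2 * h12) / det
        db = (-g2 * h11 + g1 * h12) / det
        a, b = a + da, b + db
    return a, b

# Synthetic, noise-free data generated from a = 2, b = 0.5.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]

a_hat, b_hat = gauss_newton_exp_fit(ts, ys, a=1.5, b=0.4)
print(a_hat, b_hat)
```

Note that only first derivatives of the residuals are ever computed; the true Hessian of the sum of squares would also need second derivatives of each residual, which Gauss-Newton simply drops. In practice one would add damping or a trust region, which is roughly what the trf method layers on top.
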