2.5 hour struggle with technology

I am not impressed with the user-friendliness (or rather user-hostility) of either of the two major cellphone makers. Last night I had to port the contact list from a Galaxy 5s to a newly bought iPhone 7+ for my mother-in-law; for the record, I would never buy a luxury good like that for myself. Initially the solution seemed straightforward. The conventional means was to set up a Google account (which my MIL didn’t have because of restrictions in China), and export and import contacts through it. It turned out that the Galaxy device wouldn’t allow me to use Gmail at all, possibly because it was configured for Chinese users, who have no legit use for Google products. This took me a while to discover and confirm: I tried installing the Gmail app from the built-in Samsung store, which then prompted me to add an email account, only to reject it because the Google Play Store was not installed. The latter turned out to be unavailable in the Samsung store, presumably because Samsung didn’t want Google to take its share of the mobile app market. Even an unsophisticated user like me can easily sniff out competition going awry at the expense of users in these design choices.

The next option was to use the SIM card as a physical medium of transfer. This again was a dead end: when I tapped the menu option on the iPhone 7+ that says to import contacts from the SIM card, I got instantly worm-holed back to the home screen without any explanation (or apology). Could this be a case of incompatibility with a foreign SIM card (previously used in a Chinese device)? I also tried switching the iPhone locale to en/US, as iPhones are notorious for incomplete feature implementation in secondary locales, but still had no luck. My complaints about localization bugs will be fodder for a later thread. After a web search turned up no relevant results, I was briefly flummoxed.

The saving grace was the realization that the iPhone 7+ did carry a scanty few contacts from the old Galaxy phone. Initially I thought this was due to an incomplete export to the SIM card, but after swapping the SIM card between phones several times and consulting with my family members, I started looking at my MIL’s newly created Gmail account (which was inaccessible on the Galaxy). Then it became clear that those few contacts came from an earlier porting attempt by my wife. So a third solution emerged: load the contact list directly into the Gmail account, and hope that it would automatically sync with the iPhone.

The next episode simply proves the adage that bad things all come at once. First it took me a while to figure out how to access the local file system on the Android phone: there turned out to be an app just for that, fortunately already installed. It took me no time to locate the file storing the contact list. But how should I send it to other devices? Gmail was out of the question. This left me with basically only one option: WeChat. In a moment of unequivocal stupidity, I logged out of my MIL’s WeChat account and into mine, and sent the file as an attachment to myself. The goal was to retrieve the file on another mobile device or my MacBook so that it could eventually be uploaded to Gmail. I then started checking my personal Android phone for the sent file, but it was about to run out of battery. I connected it to my MacBook Air and made sure that battery charging mode was on (the data transfer mode was no longer supported by the iTunes version on my MacBook Air, which was only 4 years old!). But the battery turned out to be really depleted at that point, despite the indicator showing 30% before the phone shut down. After a few failed attempts to reboot it without an instant shutdown, I decided to plug it into a wall socket and simply wait. Meanwhile, I had the ingenious idea of sending the file to my wife’s Android phone. It was no longer possible for me to log back into my MIL’s WeChat account since she had forgotten her username and password, and my wife, the only person knowledgeable in this matter, was upstairs breastfeeding or something and could not be disturbed. For about 10 minutes, I tried to use WeChat on my MacBook Air directly, only to find out that it required scanning a 2D barcode from a mobile device, which was out of battery at the moment. Even though I eventually succeeded in this regard, the sent file was not showing up in any self-conversation tab, on either my phone or the laptop.
So finally I forwarded the file to my wife’s phone, and it appeared instantly on her device. Could that be a bug in WeChat regarding self-conversations? Only John von Neumann knows. The rest was a happy ending, though to be fair I could have spent those 2.5 hours babysitting my younger one or pretending to do some math in my head.

Posted in Uncategorized | Leave a comment

How much it costs to raise a kid

Today my wife made the comment that it is actually easier on the parents to send the kids to extracurricular classes than having them stay home, despite the extra financial cost. So I was curious enough to do the following back of the envelope calculation. Assuming that we send one kid out every working hour during the week, that is 40 hours a week, so for 18 years, assuming $50 an hour, this amounts to:
echo "40 * 52 * 18 * 50" | bc

That is a whopping $1.87 million, something only the top 5% of this country can afford. And this is just one kid, and only non-weekend working hours. With weekend nannies, diapers, and other material costs, even if we lower the hourly rate to $25, I think the figure still easily exceeds $1 million. So how on earth can people in this country afford to have a kid, let alone multiple ones?
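
For anyone who wants to play with the assumptions, here is the same back-of-the-envelope estimate as a small Python sketch. The function and parameter names are made up for illustration; the inputs are just the ones assumed above.

```python
def childcare_cost(hours_per_week=40, weeks_per_year=52, years=18, hourly_rate=50):
    """Total cost of paid care during working hours only (no weekends, no diapers)."""
    return hours_per_week * weeks_per_year * years * hourly_rate

print(childcare_cost())                # 1872000
print(childcare_cost(hourly_rate=25))  # 936000
```

At $25 an hour the weekday hours alone come to $936,000, so the weekend and material costs only need to add about $64,000 to cross the $1 million mark.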


Doing research of any kind is insurmountably difficult

I have lived in the research world for a while now, more precisely 12 years. My journey has been an extremely inert one. There have been countless times when I thought I was onto something, and it turned out to be a fluke, a bug, or some other uninteresting outcome. In academia, I at least had the leisure of choosing the problems I wanted to pursue, some of which were not at the center of the community spotlight and hence could yield to persistent trying. In industry, the competition is laid out in plain sight, and the metrics against which success is measured are few. The competition comes not only from contemporary peers but also from accumulated historical knowledge, which is true in academic settings as well. What is more frustrating is that one often gets committed to a seemingly no-brainer project, only to find out later that it is a hole from which one can never crawl out in a wholesome way. This is the key difference between academic pursuit and industrial pursuit, although in the former case one sometimes also has a coauthor’s trust at stake.

In any event, I have been stuck in such a hole for the better part of 4 months. The goal is not even very lofty: a mere refactoring and space-saving gimmick that doesn’t even qualify as a new idea. It turned out, however, that all those savings come at a cost, namely that metrics are going down, despite all kinds of variations I have tried. Being an honest person, I do not wish to rely on the mercy of the team to launch the project. However, it is also distasteful to let it go to waste, since another colleague has been with me throughout this “wonderful” journey and I have a responsibility not to let him down. While many other folks are anxiously waiting for this bottleneck project to settle down, I continue to bang my head against the wall, especially given how slowly things move within our organization. This may be the most opportune time to fuss about work.

So then I thought about Abraham Lincoln, and how he overcame an insurmountable amount of personal and political difficulties, only to be shot dead in the end. But the beautiful part of his story is that he carried all such weight with a smile of grace. This I shall emulate, and plod along with animalistic persistence despite ever dwindling peer respect for my intelligence, prospects for promotion, and the opportunity to change the world and shit before I succumb to natural decay. Eventually the organization will figure out the right place for me to grow or rot, and all I should care about is the next local optimum to pursue.


Reading, innovation, and meaning of life

While staying home on paternity leave, I had more time to ponder the meaning of life, away from my hectic programming day job. This was coupled with my grandma’s accidental fall in the bathroom, and the less than optimistic prognosis that her ribs were fractured and her heart and lungs got infected as a result. I think even God appeared to me one night to give consolation, since this is a justifiably depressing time, despite how smoothly things have gone with the newborn. In any case, reflecting on my first 9 months at my current job, one trap I repeatedly fell into was that deep down, I wanted to innovate and make big news so badly that I lost sight of the lifelong pursuit of learning. As a programmer, there are many ways to absorb old and new technology. The whole industry revolves around making learning more accessible to both insiders and outsiders. Maybe the abundance of resources pushed me to the other extreme: completely shutting my brain off from learning and focusing on continuous philosophizing and hypothesis testing. This break allowed me to recognize this as a critical vice that would hinder my long-term productivity.

So having regained some intellectual energy, I revisited a branch of mathematics that I detested as a graduate student, namely analytic number theory, partially motivated by Terence Tao’s most recent blog post on the Bombieri heuristic. Yesterday I managed to get a systematic education on the Möbius function. Today I started reading his earlier post on Goldston-Pintz-Yildirim, Motohashi-Pintz, and YT Zhang’s result. I got tripped up by a seemingly innocent estimate, that the Hardy-Littlewood constant relevant to the prime constellation conjecture is bounded away from 0. It turned out to be an elementary consequence of the Prime Number Theorem, which I have always held in awe and never dared to apply to real questions of interest. It seems that going through the whole post will be both rewarding and challenging, but I have set my mind to do so, and hopefully will come up with some follow-up learning items. After all, number theory is an exact science and a mediocre mind like mine should still be able to penetrate it, given enough volition. Hopefully I will then find some common ground with past grad school friends and borrow analytic ideas to solve my own problems in Lie theory and probability. Thank you God for the latest revelation.


Living, smart and strong

The title of this post is supposed to be a parody of the book “Thinking, Fast and Slow”, in case you haven’t picked up the connection (and I don’t blame you). Well, you may argue, smart isn’t exactly the opposite of strong. Indeed, we tend to associate smart people with strength, though not necessarily vice versa. I argue here that the two are in fact competing attributes in how we choose to live in many situations.
As a case in point, I just spent 2 hours today trying to fix a fridge problem that was estimated to take only about half an hour. For the second summer in a row, I have run into the same fridge mishap: the air duct for the refrigerating compartment gets blocked by ice, which accumulates because excessive external heat induces the freezer to keep sending cold air over. As we entered the first phase of summer last month, I started noticing that the top level of the fridge was basically no different from room temperature. While my wife tried to deceive herself by insisting that it was actually cooling, I trusted my tactile common sense and went straight into handyman mode. Sure enough, the cooling circuitry at the back of the fridge was covered with frost. While I wasn’t completely sure that the air duct was blocked this time, since I had trouble prying open the plastic back cover to get a good view (having forgotten how I did it last year), I blew the hair dryer on high over the entire channel without hesitation, as soon as I found the extension cord. So the difficult part seemed over, and I was ready to move on with life, or rather with more routine chores like dishwashing, only to realize that putting the shelves and drawers back in place was combinatorially impossible, at least initially. I didn’t exactly recall how many shelves were supposed to go between the bottom three drawers, but somehow decided that there must be at least one, since having three top shelves above the drawers felt like enough. Thus advised, I proceeded to try combinations of inserting the fiberglass shelf plate at various groove levels, only to find that either the top drawer had to be tightly squeezed between two plates, or the door wouldn’t close. Enraged, I started removing stuff from the shelves and trying different combinations.
I guess this was the moment when I decided to let strength take over entirely: not only did I engage in a shelf-shoving spree, causing damage to the grooves, I also became bitterly proud of how much tedium and brute force I was able and willing to take on unfazed. In the end, the brute force did pay off, due to sheer divine mercy: I became aware of my erroneous assumption that there needed to be a shelf between the drawers. Had I done some soul searching for smartness, though, I would have spared myself the damage, and perhaps also the frustration and time. I once heard a supposedly very smart friend in grad school instructing his students to bang their heads against a wall in the face of a difficult math problem. This is consoling, as perhaps even top-tier smarty-pants have their sweaty, brainless moments. But it still stings to reflect on my past confrontations with difficult situations: I almost always lose my cool and opt for strength instead of smartness.


Chinese Medicine

Over the past few years, I have increasingly noticed how the left side of my body seems dysfunctional at times. For instance, I started growing grey hair exclusively on one side of my head. My left leg is constantly enervated and sensitive at acupuncture points. Between my two kidneys I often feel a sense of asymmetry. While for the most part I can’t feel the existence of my right kidney, my left kidney often gets a tingling or even a shredding feeling, for lack of better terms. I had a kidney stone during early graduate school about 7 years ago. It was a tiny piece that came out naturally through urination in the end, but it definitely wreaked havoc when I woke up to enormous pain around the abdomen. The doctors and nurses at the emergency room made a quick and accurate diagnosis, but left me without water for half a day just to be completely sure that it was indeed a kidney stone, while I lay on the gurney in morphine-muffled pain. Had I been given water earlier, the acute pain would have been washed away through urination, and I would have been spared the morphine. But that incident was not the earliest manifestation of my one-sided malady; I had experienced kidney weakness even during elementary school. The influence of my father, and the surrounding herbal medicine culture in China, certainly made me more cognizant of the role of the kidneys in my overall health. The notion of selling one’s own kidney for a living, which comes up in cinematic works, always made me cringe. An English-Chinese bilingual anthology of marvelous anecdotes, meant as ESL reading, also mentioned that the adrenal gland shrinks irreversibly as one ages. But it was not until more recently that I started to take kidney health more seriously.

Today my wife suggested that I give moxibustion a try. This is one of the few oriental treatments she subscribes to, mainly in the context of gynecology. Thus for 15 dollars we bought a moxibustion box burner together with the moxa incense. With her help, I then lit the moxa inside the burner and fastened the whole thing next to my ShenShu acupuncture point, which is at the same height as the belly button, but on the back, 1.5 Chinese inches away from the spine. For one brief moment my left kidney seemed to get a jump start of fresh blood. But after that there was no apparent physiological response, possibly because the cloth pocket insulated my skin from too much of the heat. Overall the procedure seems pretty harmless, and the proclaimed effect of increasing blood flow to the organs actually makes scientific sense. Whether or not the moxa is doing anything is unclear, but the heat certainly helps. I plan to stick to the routine 2-3 times a week and assess the benefit.

Western medical literature claims that almost all positive effects of moxibustion documented in past studies are due to publication bias. While there is definitely truth to that, one often overlooks the fact that western medicine has pretty simple-minded metrics to gauge success, relying on something as mechanical as a p-value. It is nearly impossible to experiment on long-term effects, just as in my own work we keep chasing short-term measurable gains but rarely look at long-term benefits to the users. Those latter objectives are usually reserved for top executives, so there is much less science involved.


Sanity check Vowpal Wabbit

One reason that I have worked with linear regression for years and still haven’t been able to move on to more glorious machine learning models like neural networks is that, even though the underlying idea is simple, it’s virtually impossible to sanity check the correctness of a linear regression library with the naked eye.

Back at Yahoo Labs, I wrote down some steps on how to verify that VW and the Weka wrapper of the liblinear library in fact produce the same results. Being a good corporate citizen, however, I did not bring that knowledge with me when I left. While I now care much less about liblinear, VW is still a great tool to carry around. Its sheer speed of training seems unmatched so far on a single machine. So here I will focus on how to sanity check results from VW (v7.3), which should help the user gain a better understanding of its myriad flags as well.

  1. BFGS is the training mode of choice
  2. Issue the following command to train (sim is my own suffix, denoting simulated data):

    vw --bfgs --cache --cache_file=cache.sim -d out-00000-of-00001 --readable_model=readable.sim --passes=10 --termination=0.0000000001 --loss_function=squared --bit_precision=22 --final_regressor=model.sim

  3. Test on the original training data set. There are two kinds of prediction output flags, --predictions and --raw_predictions. The former seems to always truncate a final prediction > 1.0 to just 1.0.

    vw -d out-00000-of-00001 -i model.sim --raw_predictions=pred.sim --testonly

  4. Create a dummy data file consisting of one feature per row, including the empty string feature denoting the constant term:

    cat out-00000-of-00001 | python invert_feats.py > feats.sim

  5. Generate const + feature weight for each feature using feats.sim:

    vw -d feats.sim -i model.sim --raw_predictions=invert.sim --testonly

  6. Subtract const from feature weights:

    python subtract_const.py invert.sim invert.vw

  7. Concatenate pred.sim and raw data side by side, and feed into a model applier python script, and eyeball agreement of the vw (raw) predictions and python ones.

    paste pred.sim out-00000-of-00001 | python apply_wts.py invert.sim | less

  8. I (re)discovered the --raw_predictions flag through a useful feature called audit:

    vw -d feats.sim -i model.sim --testonly --audit | less

    It allows you to look at the data value and model value of each feature used in each input example:

    0.068238 20;riversdale%20rd
    Constant:3261788:1:0.130461 w^riversdale%20rd:78240:1:-0.0622233
    0.146634 36;opera%20mini
    Constant:3261788:1:0.130461 w^opera%20mini:2082032:1:0.0161729
    281.882654 562.090180 6 6.0 36.0000 0.1466 2
    0.154288 14;uoskirt
    Constant:3261788:1:0.130461 w^uoskirt:3187096:1:0.0238272

As you can see, the constant term gets a data value of 1 in each example, and its model value is 0.130461. The “w^” token is the namespace of the features that I put in my training data. The big number after the name of the feature is the hash value. There are several sources of confusion with the above steps:

  1. The specification of model file is via -f in training, and -i in test. -f stands for final_regressor and -i stands for initial_regressor.
  2. --readable_model is not that useful, unless one can reproduce the hashing function used by VW. Let me know if you can implement it in python!!
  3. One shortcoming of bfgs mode is that --raw_predictions can be quite different from --predictions; the former can have a huge prediction loss since it doesn’t truncate predictions that fall outside [0,1]. For a sanity check against other LR libraries, sgd is the better mode to use. For an underdetermined system, don’t expect the feature weights to get even close between two LR libraries. But one should expect the prediction scores to be quite close when both have converged reasonably, since the prediction vector is the closest point in an affine space to the label vector, which is unique.
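
As a rough illustration of how to read the audit output, the sketch below parses one audit feature line of the form name:hash:value:weight (the layout seen in the v7.3 sample above; other versions may print extra fields) and rebuilds the raw prediction as the sum of value times weight. The helper names are made up for this post. The last lines check the Constant index, assuming VW's internal constant index is 11650396 (a number I recall from the VW source, so treat it as an assumption) and that the index is simply masked down to --bit_precision bits.

```python
def parse_audit_features(line):
    """Parse an audit feature line into (name, hash_index, value, weight) tuples."""
    feats = []
    for token in line.split():
        # split from the right so a ':' inside a feature name would be preserved
        name, idx, value, weight = token.rsplit(':', 3)
        feats.append((name, int(idx), float(value), float(weight)))
    return feats

def reconstruct(line):
    """Raw linear prediction: sum over features of data value times model weight."""
    return sum(value * weight for _, _, value, weight in parse_audit_features(line))

audit_line = "Constant:3261788:1:0.130461 w^riversdale%20rd:78240:1:-0.0622233"
print('%.6f' % reconstruct(audit_line))  # 0.068238, as in the audit sample

# Assumed VW internals: constant index 11650396, masked to 22 bits (--bit_precision=22)
VW_CONSTANT = 11650396
print(VW_CONSTANT & ((1 << 22) - 1))     # 3261788, the index shown next to Constant
```

This only covers the constant term; reproducing the hash of an arbitrary feature name would still require VW's actual hashing function.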

Below I share the python scripts used to uniquely parse out the features and reconstruct the raw prediction values:

  1. ## invert_feats.py
    #!/usr/bin/env python
    import sys
    # expect input to be a vw data file
    def upd(d, k, v=1):
      # add v to the running count for key k, starting a new count if needed
      if k in d:
        d[k] += v
      else:
        d[k] = v
    if __name__ == "__main__":
      feats = {}
      for line in sys.stdin:
        tmp = line.strip('\t\r\n ').split('|')[1]
        tmp2 = tmp.split(' ')
        # exclude the "w " namespace part
        for t in tmp2[1:]:
          upd(feats, t.split(':')[0], 1)
      for k,v in feats.items():
        print('%d 1 %d;%s|w %s:1.0' % (v, v, k, k))
      # empty-feature example: its prediction is the constant term alone
      print('0 1 0;|w')
  2. ## apply_wts.py
    #!/usr/bin/env python
    import sys,re,math
    # apply weights to a vw example
    if __name__ == "__main__":
      wts_file = sys.argv[1]
      wts = {}
      with open(wts_file,'r') as f:
        for line in f.readlines():
          tmp = line.strip('\r\t\n ').split(' ')
          wts[tmp[1].split(';')[1]] = float(tmp[0])
      for k in wts:
        if k != '':
          wts[k] -= wts['']
      # do paste pred.sim out-00000-of-00001 | python apply_wts.py invert.sim | less
      for line in sys.stdin:
        tmp = line.strip('\r\t\n ').split('|')
        tmp2 = tmp[-1].split(' ')
        res = wts['']
        for t in tmp2[1:]:
          s = t.split(':')
          res += wts[s[0]] * float(s[1])
        tmp3 = tmp[0].split('\t')
        used = {k: wts[k] for k in [t.split(':')[0] for t in tmp2[1:]]}
        print(' ||| '.join(str(x) for x in (res, tmp3[0], tmp3[1], tmp[-1], used)))
  3. ## subtract_const.py
    #!/usr/bin/env python
    import sys,re,math
    if __name__ == '__main__':
      feat_pred_file = sys.argv[1]
      feat_wt_file = sys.argv[2]  # do not write to the same file in case of confusion
      feats = {}
      freqs = {}
      with open(feat_pred_file, 'r') as f:
        for line in f.readlines():
          tmp = line.strip('\r\t\n ').split(' ')
          wt = float(tmp[0])
          feat = tmp[1].split(';')[1]
          feats[feat] = wt
          freqs[feat] = int(tmp[1].split(';')[0])
        const = feats['']
        for k,v in feats.items():
          if k != '':
            feats[k] -= const
      with open(feat_wt_file, 'w') as f:
        txt = '\n'.join('%.10f %d;%s'%(v, freqs[k], k) for k,v in feats.items())
        # write out the constant-subtracted weights, one feature per line
        f.write(txt + '\n')