bookstore and data science

I had the leisure this evening to visit one of the local East Asian language bookstores with my wife after dinner. I was expecting all Korean texts, but fortunately most of them were in Chinese. At first I felt a bit underwhelmed by the variety on display. The most prominent titles were children’s books, Chinese Reader’s Digest magazines, or suspense novels adapted from real-world politics in China. I do not spend much time with children’s books at this advanced age, I am not exactly a literary person, and from past experience the last category usually tries to stretch a small amount of insider information or well-researched material over 300 to 400 pages. Hence the underwhelmed feeling. As I shuffled towards the deeper end of the store, however, some classical literary titles caught my attention. I quickly picked up the Romance of the Three Kingdoms, which regretfully I have never read even once (as I recall from elementary school, a children’s novelist who came as a guest speaker told us that every time she reread such a classic as she aged, something new came to her; by contrast I naturally feel a bit under-educated). The passage I read is about Zhang Song’s proposal to his lord Liu Zhang on how to deflect enemies, as well as his subsequent visit to Cao Cao in Xu Du to consummate his plan. The text that once read like Morse code now bore a striking resemblance to colloquial Chinese, at least of the variety found in Louis Cha’s work.
After giving up at a page and a half, I began to examine nearby books, as a greedy hoarder would his jewelry collection, except I neither owned nor wished to own any of the books there. Besides the usual array of classics, I saw some interesting titles on African history, as well as Kang Youwei’s prose collection re-examining Confucius’ teachings. The editor succinctly summarized the point of this collection as questioning the existence of the prehistoric kings and sages extensively quoted in the Analects, the real motivation being to set up a theological background for pushing his parliamentary monarchist agenda amid strong conservative royalist opposition. At that point I proudly decided not to continue with his old academic-style rambling and sought some fresh air in the next aisle, which my wife was working through. There, besides some of the Douban lewdness, I was pleasantly surprised by a full range of new poetry collections, not the least of which was a rare anthology of Guo Moruo. Guo’s reputation plummeted in recent years after his tail-wagging during the Mao era was revealed by various CCP historians and contemporaries. Even his academic achievement in deciphering the oracle bone script has been downplayed as a form of intellectual monopoly. Though I am not qualified to judge the latter, I would concede that his poetic style did seem more lucid than that of some of his contemporaries, like Wen Yiduo. I could totally see why Mao would have been a fan. The creative injection of English words into otherwise modern Chinese free verse, as well as the display of familiarity with Western literature and history, marked an exceptional intellect of his time.
So what does the experience so far have to do with data science, you might wonder? When talking about data these days one typically imagines sensor data, user feedback, financial transactions, gaming statistics, etc. Indeed those generate the bulk of what data miners and statisticians deal with day to day. Data generated through conscious creative effort probably accounts for a very small slice. Nonetheless the latter certainly has more profound value and is vitally important for backing up human civilization at large. For a scholar, an old adage was to extend one’s curiosity to all corners of the library (or, by abuse, the librairie, which means bookstore in French). Even in a department as highly specialized as pure mathematics, I was told by various practitioners that this was what distinguished a good researcher from a bad one decades down the road. As data science has revolutionized our society and how businesses are best conducted, Microsoft’s online controlled experiment group being one example, it is perhaps also time to update this old wisdom. Professor Jiao Jianxin once made the following pedagogical remark about research: students in the current Chinese university system typically have good mastery of breadth, but the lack of depth prevents them from tackling truly original problems. If the defining character of a life is to trace out a memorable lineage of research footprints, then choosing the right direction to pursue is often much more important than spending every waking second soaked in “data”. Unlike a sensor or a lab rat (not the kind with PhD degrees), a person’s contribution should be measured by his or her conscious low-entropy data output. There are at least two advantages to adopting this metric: the succinct representation of knowledge ensures ease of transfer from one person or one generation to the next; and its potential to spawn other important data black holes is much greater than that of near-random noise.
An example would be the conception of the Turing machine in the 1930s. Much of the subsequent development of electronic computing can, and arguably must, be traced back to this event. While the concept itself fits easily within 10 to 20 pages even in its most verbose form, its symbiosis with the scientific advances of the time was like the execution of a trapdoor function: it released a massive creative watershed that would otherwise have remained locked away.
So back to the bookstore. It would be a mistake in the data era to infer from its existence a mandate that everyone be educated like an encyclopedia. Granted, there are legendary memory champions who exhausted their textual resources at an early age, but I doubt such a grievance still exists in any well-connected part of the world today, aside from a devious thirst for insider knowledge. A bookstore should instead function as a search engine, or a recreational zoo. The latter function is perhaps the more relevant one these days, though people might argue that physical intimacy with books fosters true bibliophilia and the engraving of knowledge. I am a big fan of paper books, but it is precisely this physical possessiveness that breeds the unruly habit of aimless random walks in the informaze. One possible healthy learning profile is T-shaped, in that breadth and depth meet at a single junction but do not span a surface. Only through necessity and age-afforded leisure should this T be gradually thickened into a solid square.


About aquazorcarson

math PhD at Stanford, studying probability