I read The Numerati some time ago and never got around to writing a post about it, so here is my long overdue summary. Stephen Baker is a former BusinessWeek writer with an interest in technology; his latest book, Final Jeopardy: Man vs Machine and the Quest to Know Everything, was published last month. The topic of The Numerati, according to the book jacket, is "a new math intelligentsia [who] is devising ways to dissect our every move [using the trail of data we leave on the Internet] and predict, with stunning accuracy, what we will do next, [in order] to manipulate our behavior."
Whoever wrote the book jacket got a bit carried away ("the mathematical modeling of humanity", really?), but the book itself makes an important contribution. It is divided into seven chapters: Worker, Shopper, Voter, Blogger, Terrorist, Patient and Lover; in each, Baker describes what he learned from extensive discussions with experts in the field. To be honest, I am not quite sure I belong to his intended audience, which seems to be the majority of the population: people who practice neither data mining nor mathematical modeling and need to be educated about the potential and pitfalls of data. That said, Baker did attend the INFORMS annual meeting two years ago and signed copies of his book for operations researchers. On the other hand, I don't know whether the people who would benefit most from his research will be interested enough in data mining to buy a whole book on it; I can see how they would gain from an article in their favorite magazine, but a book is a tougher sell. Thankfully, The Numerati is now out in paperback and on Kindle, so people can get it relatively cheaply.
I found myself a bit frustrated at times by the book's high-level descriptions, since I understand the technical side well enough to want to know more about the complexities faced by the experts Baker interviewed, but the level of technicality is excellent for a layperson interested in learning more. I particularly enjoyed reading about the issues Google's AdSense faces with spam blogs, or splogs (in the "Blogger" chapter); as an update, Google changed the way it ranks search results just last month to try to fight content farms.
Also, the "Patient" chapter was fascinating from beginning to end; it focuses on networked gadgets that can help hospital patients or people in poor health. A scientist at Intel Research Lab whom Baker interviewed "sees sensors eventually recording and building statistical models of almost every aspect of our behavior. They'll track our pathways in the house, the rhythm of our gait." But Baker also points out that taking advantage of this technology is not as easy as it sounds. In my favorite anecdote, on p. 158 of the hardcover edition, he explains: "One woman, researchers were startled to see, gained eight pounds between bedtime and breakfast. A dangerous accumulation of fluids? Time to call an ambulance? No. Her little dog had jumped on the bed and slept with her."
Still, the potential of data analysis is undeniable: according to the Intel scientist, "specialists studying the actor Michael J Fox in his old TV shows can detect the onset of Parkinson's years before Fox himself knew he had it." (p. 165) In another striking analysis, described on p. 177, researchers at University College London studied the manuscripts that prizewinning novelist Iris Murdoch left behind when she died of Alzheimer's, and were able to trace a curve in her use of language across her books: it grew more complex until the height of her career and then fell off. While Baker sometimes oversells his case by picturing a distant future where our lives are dominated by data mining, rather than focusing on the near to medium term that matters more to readers, the studies he cites are very interesting.
Finally, the "Lover" chapter has an unexpected application to the resumes of job candidates: "according to BusinessWeek, 94 percent of US corporations ask for electronic resumes. They use software to sift through them, picking out a selection of 'finalists' for human managers to consider." (p. 195) Baker comments: "The point is that when we want to be found... we must make ourselves intelligible to machines. We need good page rank. We must fit ourselves to algorithms."