"Big data" and Twitter: Two great tastes that go great together? Or are they going to end up more like oil and water or Republicans and Democrats?
Analysis of large, unstructured data sets has now turned to the written word, both literary and the commonplace. University professors are coming out with new research on such topics as "Computing and Visualizing the 19th-Century Literary Genome" and "Quantitative Analysis of Culture Using Millions of Digitized Books," according to a recent story in The New York Times.
Technology and literature are old friends, really. The written word has grown with each new technological advancement: from papyrus to printing press, from typewriter to computer and smartphone. Even now, entrepreneurs are working to skip the human being altogether and have computers write the news.
To me, big data projects that analyze texts like these are fascinating -- and a bit disconcerting. Fascinating because, as Matthew L. Jockers, a professor at the University of Nebraska, says in the Times, "Traditionally, literary history was done by studying a relative handful of texts. What this technology does is let you see the big picture -- the context in which a writer worked -- on a scale we've never seen before."
On the other hand, computers cannot uncover the meaning or essence of a text or, especially, the experience of reading, which still has to take place in the brain. Technology has enabled the transmission of the written word, through screens, audio, video, etc., but not the understanding. Yet.
From the sublime to the ridiculous, we have another technology project going on at the Library of Congress, which is attempting to capture all of the world's tweets for posterity. So far, that's more than 170 billion messages. Please read the article about the project by the estimable James Gleick, who can do a much better job than I can of describing the absurdity and nobility of the library's quest.
The library's project is actually a case where more technology would be better. Right now, all the data is stored on tape and isn't accessible for online research. Nor has anything actually been done with the data, analytics-wise, so it's all just sitting there. Somehow, I think that the right big data algorithm applied to such a trove of "information" could uncover the secrets of the universe (or at least the connection between Kate Middleton's portrait and an extreme "deep field" photo).
But that's not all. Perhaps literature and Twitter together can find new levels of meaning and keep the younger generation interested in obscure literary greats at the same time. That's the thesis of Shawna Ross, an Arizona State University professor who recently presented a paper on Henry James and new media at the Modern Language Association conference. She argued that James' codification of spoken language in some of his works preceded and even anticipated today's Twitter language of hashtags and 140-character shortenings.
The partnership between technology and writing reminds me of a quotation from a James contemporary, Joseph Conrad. It comes from one of his great works, Under Western Eyes, which is about, among other things, his disillusionment with modern society: "Words, as is well known, are the great foes of reality." What he means is that actual reality is a difficult thing to convey with objects as slippery as words. But still, that difficulty hasn't stopped anyone from trying, nor will it in the foreseeable future.