Google Ngram: Mining for data in two centuries of literature

As fraught as our relationship with Google is, we in the book industry have to admit that some elements of their digitizing campaign do provide some fascinating insights into the literary world. Their Books Ngram Viewer allows you to chart the usage of particular words over time across the entire collection of books in their digital library. Here’s a simple one showing the prevalence of the words ‘men’ and ‘women’ from 1800 to 2000:

David McCandless (of Information is Beautiful fame) demonstrates a few other ways to dig into the data in his blog post here: http://www.informationisbeautiful.net/2010/google-ngram/

Astute data analysts will surely criticise the way in which the data is mined (in fact, the generalisations are the problem so perhaps it’s the lack of mining that’s the issue) but it’s fascinating nonetheless. And a great way to pass a Friday afternoon.
