Google Ngram: Mining for data in two centuries of literature

As fraught as our relationship with Google is, we in the book industry have to admit that some elements of its digitizing campaign do provide fascinating insights into the literary world. Its Books Ngram viewer allows you to map the usage of particular words across the entire collection of books in its digital library. Here’s a simple one showing the prevalence of the words ‘men’ and ‘women’ from 1800 to 2000:

David McCandless (of Information is Beautiful fame) demonstrates a few other ways to dig into the data in his blog post here:

Astute data analysts will surely criticise the way in which the data is mined (in fact, the generalisations are the problem, so perhaps it’s the lack of mining that’s the issue), but it’s fascinating nonetheless. And a great way to pass a Friday afternoon.


