A Harvard researcher is using analytics and big data to find anachronistic language in Downton Abbey and other historical TV shows and movies.
Benjamin Schmidt, a visiting graduate fellow at the Cultural Observatory at Harvard, writes the Prochronisms blog, comparing the dialogue of historical TV shows and movies -- primarily Downton Abbey and Mad Men -- to Internet corpuses of historical texts, mainly the 20-million-book Google Books library.
By finding words and phrases in the dialogue that do not commonly appear in books and periodicals from the period being portrayed, Schmidt is able to find anachronisms that previously required a linguistics degree and a keen ear to spot. These are "prochronisms" -- anachronisms appearing too early in history.
Sometimes, the result is just amusing trivia. For example, one of the major story lines of last season's Downton Abbey dealt with the black market in World War I England. Schmidt discovered that the phrase "black market," used by the characters on the show many times, appeared occasionally in literature beginning in the 18th century, but only became popular in World War II. The characters should have said "contraband" instead.
Sometimes, the insights are more significant. Schmidt's analysis of the movie Lincoln shows meticulous attention to detail in some areas of the screenplay -- the character of Lincoln uses the word "flubdubs" from the 19th century, virtually unknown today.
However, in political discussions, the characters in Lincoln use anachronistic language, such as "bipartisanship" and "the Democratic Process," which did not enter the language until decades later. The anachronisms reveal the intent of the screenwriter, Schmidt told me in a phone interview. "He tried to make it a contemporary political drama," Schmidt said. "He tried to make Congress in the 1860s more like Congress of today."
Another Lincoln movie from last year, Abraham Lincoln, Vampire Hunter, contains anachronistic language such as "suicide missions," and "behind enemy lines," said Schmidt, after pausing to consult his notes. ("I don't want to attribute anything in Abraham Lincoln, Vampire Hunter to Abraham Lincoln vs. Zombies, which is a significantly worse movie," he explained.)
To spot anachronistic language, Schmidt starts with the text of the dialogue from a TV show or movie, which he usually obtains in advance using closed-captioning scripts. Schmidt then compares that text to the searchable database of Google Books Ngrams. Schmidt's application reveals the frequency of phrases appearing in books of the period; words and phrases that hadn't been invented yet -- or that were only used rarely -- get flagged as anachronisms.
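The flagging step described above can be sketched in a few lines of Python. This is only an illustration, not Schmidt's actual application: the function names, the 0.1-per-million threshold, and the frequency table are all assumptions, and the numbers in the table are invented placeholders rather than real Google Books Ngrams values. The sketch splits dialogue into one- and two-word phrases, looks up each phrase's frequency for the year the story is set in, and flags anything that falls below the threshold.

```python
import re
from typing import Dict, List, Tuple

# Invented placeholder frequencies (occurrences per million words), keyed by
# phrase and then by year. These are NOT real Google Books Ngrams values.
TOY_FREQUENCIES: Dict[str, Dict[int, float]] = {
    "black market": {1917: 0.01, 1942: 2.0},
    "contraband": {1917: 1.5, 1942: 1.2},
}


def extract_phrases(text: str, max_words: int = 2) -> List[str]:
    """Split text into lowercase phrases of 1 to max_words consecutive words."""
    words = re.findall(r"[a-z']+", text.lower())
    phrases = []
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrases.append(" ".join(words[i:i + n]))
    return phrases


def flag_rare_phrases(
    dialogue: str,
    year: int,
    frequencies: Dict[str, Dict[int, float]],
    rare_threshold: float = 0.1,  # hypothetical cutoff, per million words
) -> List[Tuple[str, float]]:
    """Return (phrase, frequency) pairs that were rare or absent in `year`."""
    flagged: Dict[str, float] = {}
    for phrase in extract_phrases(dialogue):
        by_year = frequencies.get(phrase)
        if by_year is None:
            continue  # the toy table has no data for this phrase, so skip it
        freq = by_year.get(year, 0.0)  # a missing year is treated as "never used"
        if freq < rare_threshold:
            flagged[phrase] = freq
    # Rarest phrases first
    return sorted(flagged.items(), key=lambda item: item[1])


line = "He sells contraband on the black market."
print(flag_rare_phrases(line, 1917, TOY_FREQUENCIES))
print(flag_rare_phrases(line, 1942, TOY_FREQUENCIES))
```

With these placeholder numbers, "black market" is flagged for a 1917 setting but not for 1942, while "contraband" passes in both years. A real version would need an actual frequency source in place of the toy table, and a human would still have to review whatever gets flagged.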
Schmidt does follow-up research using databases of historical newspapers online; the Library of Congress has a database of Civil War newspapers, while Proquest has The New York Times and Los Angeles Times. Newspapers are helpful sources because words and phrases are likely to appear there earlier than in books. Schmidt also consults the Oxford English Dictionary. The OED states when a word or phrase first appeared, but it's of limited use because it won't say when a word or phrase became common, Schmidt said.
The kind of linguistic analysis Schmidt does has deeper value. It can help authenticate historical documents. And it can help answer the basic linguistic question of how language changes over time -- whether changes come from children, the upper class, or elsewhere.
Schmidt's work is a small part of the transformation sweeping through the fields of history and literature, driven by big data. As text databases like Google Books come online, scholars are finding new mathematical tools. "It's a very exciting time, but it's the Wild West," Schmidt said. "Although there has been computational approaches to history for decades, they looked at small data sets."
It's a transformation familiar to the world of business, where disciplines such as marketing and customer management, previously driven by experience and judgment, are now being revolutionized by big data. In both business and scholarship, analytics doesn't replace human judgment; the two combined work better than either does alone.
-- Mitch Wagner, Editor in Chief, Internet Evolution