Deep-Text Analysis for Journalism
The challenges facing traditional media companies in the digital age are numerous and well documented. Mass layoffs of journalists, shrinking subscription rolls, and the end of print editions at many smaller newspapers all point to changing consumer preferences. To stay relevant, or even solvent, traditional outlets will need to harness the same digital revolution that has driven many smaller outlets out of business and pulled attention away from traditional forms of media.
One way these companies might raise more revenue is by selling precisely targeted ad space. This is unlikely to be a panacea for everything that ails them, but the deep learning concepts behind such targeting could help companies better understand their audience’s reading habits and preferences and adjust their content accordingly.
In his talk at ODSC East 2018, Alex Spangher, a data scientist formerly of the New York Times, described his work on the newspaper’s deep text modeling project. Dubbed “Project Feels,” it was born of a desire to quantitatively analyze how Times readers respond to different articles.
The model’s origins were fairly straightforward. Spangher’s department hired workers through Amazon’s Mechanical Turk service to read selected articles and report their emotional response to each one. Using this data, Spangher and his team built a model that estimated the likely emotional responses to various words and subjects.
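To make the Mechanical Turk step concrete, here is a deliberately simple sketch of how crowd-sourced reactions could be turned into per-word emotion estimates. It is not the Times’ deep text model: the talk does not describe the architecture, so this swaps in a plain bag-of-words average, and the article texts, worker IDs, and emotion labels are made-up placeholders.

```python
from collections import Counter, defaultdict

# Hypothetical crowd annotations: (article_id, worker_id, emotion) triples.
# The emotion labels are illustrative, not the project's actual label set.
ANNOTATIONS = [
    ("a1", "w1", "hope"), ("a1", "w2", "hope"), ("a1", "w3", "sadness"),
    ("a2", "w1", "anger"), ("a2", "w2", "anger"), ("a2", "w4", "anger"),
]
# Hypothetical article texts keyed by the same ids.
ARTICLES = {
    "a1": "rescue teams reach flood survivors",
    "a2": "senate vote stalls flood relief bill",
}


def emotion_distributions(annotations):
    """Turn raw worker votes into a per-article probability distribution."""
    counts = defaultdict(Counter)
    for article_id, _worker, emotion in annotations:
        counts[article_id][emotion] += 1
    return {
        aid: {emo: n / sum(c.values()) for emo, n in c.items()}
        for aid, c in counts.items()
    }


def word_emotion_scores(articles, dists):
    """Average the emotion distributions of every article a word appears in."""
    totals = defaultdict(Counter)  # word -> summed emotion probabilities
    seen = Counter()               # word -> number of articles containing it
    for aid, text in articles.items():
        for word in set(text.lower().split()):
            seen[word] += 1
            for emo, p in dists.get(aid, {}).items():
                totals[word][emo] += p
    return {w: {e: s / seen[w] for e, s in c.items()} for w, c in totals.items()}
```

Here a word such as “flood,” which appears in both a hopeful and an angry story, ends up with a mixed profile, which is exactly the kind of signal a richer model would need to untangle.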
Some articles, of course, drew predictable responses, and such articles already appeal to distinct advertisers and readers. Others drew polarizing or unpredictable responses. Articles about polarizing figures or legislative battles in particular might be expected to elicit divergent or complex emotional responses; Spangher cites a Times article about Kanye West as one example.
Spangher defined an article’s “difficulty” for the model as the gap between a priori expectations of reader responses and the responses readers actually reported. At first, the model failed to predict responses to articles like the Kanye West piece, but by training it on such “difficult” articles, Spangher was able to attune it to the words and subjects that tend to draw more complex responses. Although the model used only two demographic data points to characterize its Turk contractors (zip code and how often they read the New York Times), he suggested that cross-referencing reader responses with a more thorough accounting of reader demographics could enable highly specific ad targeting.
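One way to put a number on “difficulty” is to measure how far the observed response distribution strays from the expected one. The sketch below assumes total variation distance as that measure and simple duplication as the way to up-weight hard articles; both choices, along with the hypothetical `threshold` and `factor` values, are illustrative and not taken from Spangher’s talk.

```python
def total_variation(p, q):
    """Half the L1 distance between two emotion distributions (0 = identical, 1 = disjoint)."""
    emotions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in emotions)


def difficulty_scores(expected, observed):
    """Score each article by how far readers' actual responses strayed from the prior."""
    return {
        aid: total_variation(expected[aid], observed[aid])
        for aid in observed
        if aid in expected
    }


def oversample_difficult(article_ids, scores, threshold=0.5, factor=3):
    """Repeat 'difficult' articles so they carry more weight in the next training pass.

    `threshold` and `factor` are hypothetical tuning knobs.
    """
    batch = []
    for aid in article_ids:
        copies = factor if scores.get(aid, 0.0) >= threshold else 1
        batch.extend([aid] * copies)
    return batch
```

In a production pipeline the up-weighting would more likely be done with per-example loss weights than by copying rows, but duplication keeps the idea easy to see.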
Although the current model is trained on a static dataset, Spangher says he expects that to change in the near future. To be useful, he says, a model must rely on continuous active sampling. “We want our realities to reflect [changes in public sentiment], we want our models to be training on dynamic data,” he said, adding that an ever-changing political atmosphere makes a “typical” response difficult to predict.
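Spangher did not spell out what continuous active sampling would look like in practice. A common active learning strategy is uncertainty sampling: each round, send annotators the new articles whose predicted emotional response the model is least sure about. The minimal sketch below assumes that strategy, uses entropy as the uncertainty measure, and treats `predict` as a hypothetical stand-in for whatever model is being retrained.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a predicted emotion distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def select_for_labeling(predictions, budget):
    """Return the `budget` article ids with the most uncertain predictions."""
    ranked = sorted(predictions, key=lambda aid: entropy(predictions[aid]), reverse=True)
    return ranked[:budget]


def sampling_round(predict, new_articles, budget):
    """One round of the loop: score incoming articles, pick the ones to send to annotators.

    `predict` is a hypothetical callable mapping article text to an emotion distribution.
    """
    predictions = {aid: predict(text) for aid, text in new_articles.items()}
    return select_for_labeling(predictions, budget)
```

Repeating this round as new articles are published, and retraining on the fresh labels each time, is what would keep such a model tracking shifts in sentiment rather than a single snapshot.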
By the time Spangher left the Times for a job at Microsoft, he and his team had fed enough data into the model that it could judge typical responses to articles with ninety-nine percent confidence for most of the emotions it tracked.
The rapid pace of technological progress suggests that advances such as dynamic updating will soon be a reality. It is easy to see how companies might misuse such tools, obsequiously catering to readers and producing “clickbait” instead of objective news. From a broader, more optimistic view, however, deep text models like Project Feels give content producers a better understanding of what their readers are interested in and how they might react to certain stories. Deep text analysis may also offer insight into shifting social and political views as reactions to words and names change over time. The journalism industry has long been maligned as out of touch with its prospective audience, and advances in deep learning may give struggling papers much-needed insight into consumers’ wants and needs.