Alisa Miller’s TED Talk brilliantly illustrates what news industry observers have been warning for years: Our news diet is distorted. We get very little news about places outside the United States, and that amount dwindles further when we remove Iraq from the equation. If you look at our supply of news from places outside the United States that the United States is not directly involved in, the effect is even more pronounced.

Miller points out that demand for international news has actually increased in recent years. It’s beyond clear that in this global era, we need to know what’s happening elsewhere. But we’re also living in an age where we’re overwhelmed daily by the amount of information and content seeking our attention. The thought of measuring what we consume probably feels like too heavy a burden to all but the most intrepid graduate student, just as tracking our daily physical activity was little to no fun before personal metric tools like NikePlus and FitBit came to market.

This task cannot be left to the individual. So, the Center for Civic Media, under the leadership of Ethan Zuckerman, is embarking on a project to build the tools to empower the individual, and the news providers themselves, to see at a glance what they’re getting and what they’re missing in their daily consumption. We seek to provide a nutritional label for your news diet.

The idea of monitoring our information diet has been out there in some form for some time. Miller herself created a label to protect against overconsumption of infotainment in her Kindle Single, Media Makeover (page 9). Most recently, Eric Newton of the Knight Foundation published a column in the Miami Herald warning that we are just as likely to indulge in comfort news as we are in comfort food. Clay Johnson named his personal blog InfoVegan, in the hopes that people might begin to monitor their information consumption with the same degree of care and attention as vegans give their actual diet. Clay also mocks up what such a nutrition label might look like, left.

It’s Alive

Of course, we’re not talking about replicating the small, monochrome design itself. We’d like to build nutrition labels for news that are realtime (or close to it) and interactive. They should reflect an individual person’s news consumption, or an individual news provider’s complete offerings. You should be able to compare labels and “nutrients” across many news providers and news formats (print, video, podcast, etc.).

Regulation vs. Information

The purpose of this system, codenamed NewsRDI, is not to attempt to regulate anyone’s information consumption. No, I think everyone working on it is of the “read everything, link everywhere” mind. Rather, the goal is to make information about the news available to individuals who would like to benefit from it. The rollout of FDA nutrition labels on food packaging in 1990 did not force individuals to eat differently, but it did provide critical dietary information for those consumers who sought it.

One can imagine any number of applications for similar data about our information consumption. Individuals could use the information to achieve personal goals. Someone looking to broaden their world view might sign up for a weekly email that alerts them when less than 25% of the stories they read are internationally-focused. Someone with a habit of over-consuming information could monitor total information intake, just as some individuals watch their caloric intake. NewsRDI could let me know when 80% of my news consumption consists of inside-the-beltway gossip, and then suggest some denser articles based on my interests.
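
To make the alert idea concrete, here is a minimal sketch in Python. It assumes each story already carries a topic label, and the label names ("international", "beltway"), the Story class, and the function names are all hypothetical rather than part of NewsRDI. The 25% and 80% thresholds are the ones from the examples above.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    topic: str  # hypothetical label, e.g. "international", "beltway", "local"


def topic_share(stories, topic):
    """Return the fraction of stories carrying the given topic label."""
    if not stories:
        return 0.0
    return sum(1 for s in stories if s.topic == topic) / len(stories)


def diet_alerts(stories, min_international=0.25, max_beltway=0.80):
    """Return alert messages for a reader's stories (assumed thresholds)."""
    alerts = []
    if not stories:
        return alerts  # nothing read, nothing to report
    international = topic_share(stories, "international")
    beltway = topic_share(stories, "beltway")
    if international < min_international:
        alerts.append(f"Only {international:.0%} of your stories were international.")
    if beltway >= max_beltway:
        alerts.append(f"{beltway:.0%} of your stories were inside-the-beltway.")
    return alerts


# Example week: 8 beltway stories, 1 international, 1 local.
week = (
    [Story(f"Beltway item {i}", "beltway") for i in range(8)]
    + [Story("Election abroad", "international"), Story("School board vote", "local")]
)
for message in diet_alerts(week):
    print(message)
```

The hard part, of course, is producing the topic labels in the first place, which this sketch simply takes as given.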

NewsRDI could also help consumers and suppliers of news by analyzing news sources rather than a consumer’s mixed diet. A major national paper could proudly boast its range of coverage on a dynamically-updated news box meter. A niche trade journal could highlight its depth in a given field on a front-page sticker.

But these are all end-user applications. First, we have to design a system that quickly and accurately codes the day’s news.

Our Roadmap

With this dataset, we’re going to attempt to automate classification of the topics of individual stories, and then analyze the aggregate. This will give us a sense of how many stories are published about each topic, as well as their frequency and where they appear. We’ll build on existing research in natural language processing, text classification, and semi-automated machine learning, not to mention media research.
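
As a rough illustration of "classify each story, then analyze the aggregate," here is a deliberately naive sketch in Python. It uses hand-picked keyword lists as a stand-in for the text classification research mentioned above; the topics, keywords, and function names are all made up for the example, and this is not the method we have settled on.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real system would learn these from labeled data.
TOPIC_KEYWORDS = {
    "international": {"embassy", "treaty", "border", "overseas"},
    "politics": {"senate", "congress", "election", "campaign"},
    "health": {"hospital", "vaccine", "flossing", "diet"},
}


def classify(text):
    """Pick the topic whose keywords overlap most with the story's words."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {topic: len(words & keywords) for topic, keywords in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Stories that match no keyword at all fall into a catch-all bucket.
    return best if scores[best] > 0 else "other"


def topic_counts(stories):
    """Aggregate step: count how many stories landed in each topic."""
    return Counter(classify(text) for text in stories)


sample = [
    "The Senate debated the election campaign late into the night.",
    "Dentists say flossing matters as much as diet.",
    "A new treaty was signed at the embassy.",
    "Local team wins the championship.",
]
print(topic_counts(sample))
```

Keyword matching breaks down quickly on real news, which is exactly why we plan to lean on existing research rather than lists like these.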

Once we have this data, we’ll build tools to visualize it and better inform both producers and consumers. Our primary goal is to find out if the makeup of the news has changed recently, and in what way, and to look for patterns in our collective news consumption.

Our secondary goal, working off of stellar existing research, is to see if we can verify that international news coverage has been crowded out in the drive towards hyper-local coverage. If true, this poses a real challenge to our understanding of the world around us, even as global issues like climate change, immigration, and world trade become ever more relevant to our lives.

How We’ll Do It

Many academic communications papers begin like this:
“We locked six graduate students in a room for two weeks and had them count the last six months of stories about flossing.”

Reading variations of this sentence over and over again inspired Ethan Zuckerman and Hal Roberts to build Media Cloud at the Berkman Center for Internet & Society. At its most basic, Media Cloud automatically collects a corpus of news articles and blog posts for media analysis:

Media Cloud collects stories from 30,000 feeds belonging to 17,000 media sources from a combination of mainstream and new media sources. It stores both the full html and the extracted story text from about 50,000 stories per day from those media sources. It converts that story text into per story word counts that it makes publicly available as daily data dumps. We use those word counts to perform a variety of modes of analysis of the media ecosystem, including word clouds, clustering, mapping, and a variety of regular and custom reports written by the media cloud team.

Overview of Media Cloud Methods

We’ll start with this corpus of daily news and build from there.
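
For a feel of what per-story word counts look like, here is a small sketch in Python. The tokenization rule and function names are assumptions made for illustration; Media Cloud's actual processing (stemming, stop words, and so on) may well differ.

```python
import re
from collections import Counter


def word_counts(story_text):
    """Count lowercase word occurrences in one story's extracted text."""
    # Assumed tokenization: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", story_text.lower())
    return Counter(words)


def corpus_counts(stories):
    """Sum the per-story counts across a day's worth of stories."""
    total = Counter()
    for text in stories:
        total.update(word_counts(text))
    return total


day = [
    "News about news is still news.",
    "Trade talks resume as trade deficits grow.",
]
print(word_counts(day[0]).most_common(3))
print(corpus_counts(day)["trade"])
```

Counts like these are the raw material for word clouds and clustering, and they would be the input to any topic classifier we build on top.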

Who’s “we”?

Good question. In addition to the aforementioned Ethan Zuckerman, whose brainchild I’m tinkering with, we round out the A-team of civic technologists with:

Nathan Matias, that rare combination of technical wizard and philosophical observer
Rahul Bhargava, who already has an incredible portfolio of civic tools to his name
Dan Schultz, Knight News Challenge victor and software ninja-before-ninja-was-an-overused-adjective
Myself (Matt Stempeck)

We’ll also be relying heavily on the expertise of our colleagues at the MIT Center for Civic Media as well as the Berkman Center for Internet & Society.

Let us know what you think, or if you come across anything that might be useful!