Dispatches from #NICAR14: Holding algorithms accountable | MIT Center for Civic Media

I'm currently in Baltimore for the 2014 conference for NICAR (National Institute of Computer Assisted Reporting). In this series, I'll be liveblogging the various talks and workshops I attend — keep in mind, this is by no means exhaustive coverage of all the cool stuff going on at the conference. For more, check out Chrys Wu's index of slides, tutorials, links, and tools or follow #nicar14 on Twitter. Read on for my summary of a panel discussion on holding algorithms accountable. Thanks to Nathaniel Lash for his help liveblogging.

Chase Davis is an assistant editor on the Interactive News desk at The New York Times. Previously, he led the data journalism and engineering teams at the Center for Investigative Reporting and was an investigative/data reporter at several newspapers, where he specialized in politics and campaigns. He is a graduate of the Missouri School of Journalism, where he now teaches a course in advanced data journalism.

Nicholas Diakopoulos is a Tow Fellow at the Columbia University School of Journalism where he researches data and computational journalism. He received his Ph.D. in Computer Science from the School of Interactive Computing at Georgia Tech with a focus on human-computer interaction. His specific expertise spans computational media applications relating to data visualization, social computing, and the news.

Jeremy Singer-Vine is a reporter and computer programmer at the Wall Street Journal, where he gathers, analyzes, and visualizes data.

Frank Pasquale has written extensively on access to data in the health care, internet, & finance industries. He has been a Visiting Fellow at Princeton’s CITP & a Visiting Professor at Yale Law School. He has testified before the Judiciary Committee with the General Counsels of Google, Microsoft, and Yahoo. His book "The Black Box Society: The Secret Algorithms Behind Money & Information" will be published by Harvard University Press this fall.

Chase kicks off the panel with an observation he’s made over years of working in data science. “When you’re talking about algorithms used for decision making and predictive modeling, models you build, by definition, have to be imperfect.” He says these models aren’t trying to catch every edge case, but to capture the “general gist of the data you’re working with.” Focusing too much on edge cases leads to overfitting, which can reduce the value of the model as a whole.
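The trade-off Chase describes is easy to demonstrate. In this minimal sketch (my illustration, not the panel's; the data and both "models" are invented), a model that memorizes its training points scores perfectly on the data it has seen but badly on held-out data, while a simple least-squares line generalizes:

```python
# Toy data: y is roughly 2*x, with fixed "noise" baked in so the
# example is deterministic. The last two points are held out for testing.
noise = [0.5, -0.3, 0.8, -0.6, 0.2, -0.4, 0.7, -0.2, 0.3, -0.5]
xs = [float(i) for i in range(10)]
ys = [2 * x + e for x, e in zip(xs, noise)]
train_x, train_y = xs[:8], ys[:8]
test_x, test_y = xs[8:], ys[8:]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b -- the 'simple' model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def memorize(xs, ys):
    """An overfit 'model': parrot the training y of the nearest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model over a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(train_x, train_y)
memo = memorize(train_x, train_y)

# The memorizer is perfect on training data (error 0.0) but falls apart
# on the held-out points; the line errs in training but generalizes.
print("train MSE: line=%.3f memo=%.3f" % (mse(line, train_x, train_y),
                                          mse(memo, train_x, train_y)))
print("test  MSE: line=%.3f memo=%.3f" % (mse(line, test_x, test_y),
                                          mse(memo, test_x, test_y)))
```

The gap between training error and test error is the standard symptom of the overfitting Chase is warning about.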

“When you consider the models that people are using are imperfect and they’re becoming increasingly important,” Chase says, “journalists have an important role in exposing those models and holding them accountable.”

Now, we meet the panelists: first up is Jeremy, who worked on a project for the Wall Street Journal on online pricing algorithms used by Staples and other online vendors. Next is Nicholas, who has been researching how algorithms might be discriminatory, what kind of mistakes they can make, etc — essentially, “algorithms as sources of power.” Frank, who teaches at the Maryland Law School, is interested in helping journalists understand the legal aspects of data and “how the law can be changed to allow more of this work [data journalism] to be done.”

Chase says maybe algorithms aren’t completely to blame. From a reporting perspective, there’s a split responsibility between the algorithm and the institution that chooses to trust the algorithm. He asks the panel: should we focus more on exposing the algorithmic layer or the institutional layer?

Frank brings up an example of S&P failing to update their data set promptly — most of the reporting was focused on the failure of the algorithm. Nicholas throws out a bunch of questions reporters should ask about algorithms. “How are they making mistakes? Who do those mistakes affect? Who are the stakeholders? How might algorithms be censoring or discriminatory?” This covers a wide swath of reporting, from the most abstract features of algorithms down to the nitty gritty. 

“If you’re going to talk about responsibility — and it’s tricky — it’s all about the human level,” says Jeremy.

Nicholas adds a corollary to the “correlation doesn’t equal causation” argument. “Correlation doesn’t equal intent. Just because there’s a correlation, doesn’t mean a designer sat down and intended for that to happen.” He notes that the predictive models used by the Chicago Police could show a correlation with race, but that doesn’t necessarily reflect the intent of the analysts. He says journalists have to be careful in claiming there’s a specific intent behind an observed algorithm. “Really understanding the design process behind algorithms can shed some light into their intents.”

Frank brings up the issue of racial bias in online ad targeting. In many of these cases, the algorithms aren’t intentionally biased, but are merely leveraging past data. “To the extent that all our content is personalized, can we say anything?” He brings up an article by Nathan Newman on the subject. Jeremy brings up ProPublica’s Message Machine, which investigated email microtargeting during the 2012 presidential campaign.

Chase asks the panel what level of technical mastery is necessary to adequately report on algorithms. Jeremy says “algorithms” is the correct word to describe this work, but it can be a daunting one. For the Staples story, his team found that the variation in price could be explained by the distance between your zip code and the nearest competitor store (e.g. an OfficeMax). They had to have the technical expertise to scrape the site and get the data, but it “wasn’t anything you couldn’t learn on the job.” At first, the correlation wasn’t immediately apparent — there were multiple variables that seemed to be correlated with the price variations. Jeremy admits, “The way we ended up homing in on competitor stores was traditional reporting.”
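The kind of analysis Jeremy describes (scrape prices, then check which candidate variable best explains their variation) can be sketched in a few lines of plain Python. The numbers and variable names below are invented for illustration; this is not the Journal's data or code:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented data: two quoted prices for the same item across six zip codes,
# plus two candidate explanatory variables per zip code.
prices          = [14.29, 14.29, 15.79, 15.79, 14.29, 15.79]
miles_to_rival  = [3.0, 8.5, 41.0, 55.0, 12.0, 30.0]  # nearest competitor
median_income_k = [52, 61, 58, 55, 49, 60]            # a red-herring variable

# Distance to a competitor tracks price far more closely than income does.
print("price vs distance:", round(pearson(prices, miles_to_rival), 2))
print("price vs income:  ", round(pearson(prices, median_income_k), 2))
```

In the real story, several variables looked correlated at first; as Jeremy says, it was traditional reporting that narrowed the field down.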

Nicholas says that an understanding of how the Internet works and how cached information is delivered is crucial in being able to make sense of these algorithms. He doesn’t think we’ve reached the point of being able to FOIA source code, but envisions a not-so-distant future in which it’ll be essential for reporters to be able to interpret code.

Frank brings up a report on JP Morgan's London Whale trading. Internally, their risk models were on high alert, but they bluffed past the regulators. He thinks journalists should question the quality of the data, the quality of the inputs, and the quality of the assumptions of the model. As models become more complex, finding the underlying stories within them will demand more of journalists.

Jeremy says algorithm-based pieces are no different from traditional pieces in that they should be newsworthy and be part of a broader story. Nicholas says we should be cognizant of who’s being affected by algorithms and think about social justice angles — this provides a good benchmark for newsworthiness.

Now, there’s a question from the audience about the Netflix recommendation algorithm challenge. Even when the source code was made public, nobody could really make sense of it. How can we translate these algorithms into human-understandable terms?

Jeremy says machine-learning techniques are less embedded with intent than prescriptive rules. With the Staples algorithm, it was a matter of calculating a distance, checking whether it’s less than 20 miles, then doing this or doing that. That rule makes sense in English, but as we report on more complex algorithms, expressing these concepts will get harder and harder.
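A prescriptive rule like the one Jeremy describes is trivial to express in code, which is exactly why it is also easy to express in English. This is a hypothetical reconstruction: the function name, base price, threshold, and discount are all made up, not the actual Staples logic.

```python
def quoted_price(base_price, miles_to_nearest_competitor,
                 threshold_miles=20.0, discount=0.10):
    """Hypothetical rule: discount when a rival store is within the threshold."""
    if miles_to_nearest_competitor < threshold_miles:
        return round(base_price * (1 - discount), 2)
    return base_price

# A shopper near a rival store sees a lower price than one far from any rival.
print(quoted_price(15.79, 4.2))    # competitor nearby: discounted
print(quoted_price(15.79, 63.0))   # no competitor within 20 miles: full price
```

A trained statistical model, by contrast, is a pile of learned weights with no such one-sentence English translation, which is Jeremy's point about why reporting on complex algorithms gets harder.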

If that wasn’t bad enough, Nicholas also brings up how algorithms change over time. “They’re moving targets, and it makes it very difficult to parse what they’re doing.” He says we might want to focus our energies on platforms that are constantly changing.

Chase draws an analogy between penetrating the black box of an algorithm and the black box of an institution. How do we navigate these things? Nicholas brings up the trade secret exemptions to FOIA law that ensure the opacity of the algorithms the government might use through third parties. If these technologies are patented, there’s a chance that we can start to understand them, but “at its core, it’s very opaque.” Reporters can start to connect the dots, but “correlation is a weak form of connection.”

Jeremy says that the inability to explain an algorithm may be a story in itself. Frank agrees, bringing up the example of company personality tests. There’s very little direct relationship between your answers and your productivity as an employee; they’re essentially matching large data sets of what the best employees in the past have said.

Now we move into the Q&A portion of the session. One woman from the audience asks about access to algorithms. “There needs to be a big push towards auditing algorithms that are in use. Do you guys know of any excellent things that the FTC is doing in that direction?”

Frank recommends that journalists closely follow the data broker workshops run by the FTC, with speakers like Edith Ramirez who recognize the dangers of black boxes. Nicholas brings up specific niche industries, such as medical devices, that involve rigorous auditing of algorithms.

Another audience member draws an analogy between algorithms and snow plows. How can we balance between “explaining how these snow plows work and holding people’s feet to the fire for not having enough snow plows?” Frank notes that the New York Times had done a good job with snow plows, in a literal sense.

The next question is on how reporters and editors can be held accountable for stories that are surfaced via social media. Nicholas suggests that, as more and more newsrooms utilize machine learning techniques to find stories, they need to implement transparency policies to explain exactly how these algorithms work. Jeremy says the question we have to ask ourselves is, is there ever a point where we discover something about an algorithm that we shouldn’t reveal, to prevent it from being gamed? Frank says maybe we should expose as many of these as possible, so algorithms will be forced to utilize more robust classifiers.

Finally, an audience member asks a question about artificial intelligence. “Who do we hold responsible when AI reaches the point where the creator really couldn’t have predicted the output?” Nicholas brings up his research on Google and Bing autocompletion, specifically how the algorithm could be used for defamation. These search engines enjoy legal protection in the U.S., but overseas, this isn’t the case — there are many lawsuits where people are going after companies for creating these algorithms.

Frank brings up the example of high-frequency trading and how there’s a push towards implementing some kind of human intervention when these “automated systems go out of whack.” Nicholas tells us not to take algorithms at face value — to get critical — and says that adjudicating the power of systems can be a core function of journalists moving forward. For journalists interested in delving into these topics, Jeremy says a great way to start thinking about them is to learn more about the HTTP spec and how cookies work.