On Friday, November 22nd, the Berkman Center for Internet & Society’s Cooperation group and MIT Center for Civic Media hosted two speakers—Anselm Spoerri and Jisun An—to talk about their research into diversity and contention online. This is a liveblog of those talks authored by Erhardt Graeff, Dalia Othman, Catherine D’Ignazio, Chelsea Barabas, and Nathan Matias.
Anselm Spoerri: Visualizing Controversial and Popular Topics in Wikipedia across Languages
Anselm is a Swiss-born information visualization researcher. He did his PhD at MIT in computational vision, and is now a lecturer and assistant professor at the Rutgers School of Communication and Information. His latest work looks at contention in Wikipedia.
The project he shares with us, on edit wars in Wikipedia, presents a fascinating visualization of a dataset, prepared by Taha Yasseri and Janos Kertesz, of the “most controversial” topics in 10 different language editions of Wikipedia.
Conflicts arise in Wikipedia’s peer-production process and take the form of “edit wars” over specific topics. Controversy, as defined by his colleagues, is based on the number of reverts an article receives, with some weighting to account for pairs of editors who get locked in edit feuds.
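To make the idea of a revert-based controversy score concrete, here is a minimal sketch. It is not Yasseri and Kertesz’s published measure, which is more involved; the input format (one `(reverting_editor, reverted_editor)` tuple per revert) and the flat `mutual_weight` factor for editors who revert each other are assumptions for illustration only.

```python
from collections import Counter

def controversy_score(reverts, mutual_weight=2.0):
    """Toy revert-based controversy score for a single article.

    reverts: list of (reverting_editor, reverted_editor) tuples.
    mutual_weight: hypothetical extra weight for pairs who revert each other.
    """
    pair_counts = Counter(reverts)
    score = 0.0
    for (reverter, reverted), n in pair_counts.items():
        # A pair counts as an edit feud if the reverted editor has also
        # reverted the reverter at least once.
        if (reverted, reverter) in pair_counts:
            score += n * mutual_weight
        else:
            score += n
    return score

# Two editors reverting each other, plus one one-sided revert.
print(controversy_score([("alice", "bob"), ("bob", "alice"), ("carol", "dave")]))  # 5.0
```

Weighting mutual reverts more heavily is what separates a feud from routine cleanup of vandalism, where one editor reverts and the other never pushes back.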
Anselm’s colleagues correlated pages on Wikipedia by creating a list of common biographies and objects. Pages that linked to similar common biographies and objects were also considered to be about the same thing.
Anselm shows us the most controversial topics across four language editions of Wikipedia, based on all edit history up to 2010. The researchers broke the topics down into categories and found that, of the top 100 most controversial topics in English Wikipedia, the largest share (27%) were about Politics, compared with only 15% about Politics in Spanish Wikipedia, where the largest share (26%) was in Sports.
The most controversial single articles:
- English: George W. Bush
- German: Croatia
- French: Ségolène Royal
- Spanish: Chile
Only two topics appear among the top ten most controversial in all four language editions: Jesus and Homeopathy. Anselm has found that the controversial topics common across languages tend to concern religion.
Anselm has developed a visualization technique called searchCrystal, based on a Venn diagram: it breaks the diagram into its constituent parts and simplifies the geometry of those parts to show where and how topics overlap across Wikipedia language editions. Placing the parts in a single space lets him represent and compare multiple sets of data simultaneously, and the technique works with any set of ranked lists.
Using searchCrystal, Anselm shows us again that Jesus and Homeopathy are the only topics among the most controversial in all of the English, French, German, and Spanish Wikipedia editions.
The visualization’s radial mapping ties an article’s distance from the center to its ranking on the list. It also shows which controversial topics are unique to particular languages and which are shared between language groups.
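The searchCrystal idea of placing items by which lists they share, and by how highly they rank, can be approximated in a few lines. This is a hypothetical sketch, not Anselm’s implementation: it groups items by the exact set of ranked lists they appear in (one Venn region per set) and assigns each item a radius from its average rank, so smaller radii mean higher rankings. Normalizing by the longest list is an assumption.

```python
from collections import defaultdict

def crystal_layout(ranked_lists):
    """Group items into Venn regions and give each a rank-based radius.

    ranked_lists: dict of list name -> items ordered best-ranked first.
    Returns: dict of frozenset(list names) -> [(item, radius), ...].
    """
    ranks = defaultdict(dict)  # item -> {list name: 1-based rank}
    for name, items in ranked_lists.items():
        for rank, item in enumerate(items, start=1):
            ranks[item][name] = rank

    longest = max(len(items) for items in ranked_lists.values())
    regions = defaultdict(list)
    for item, per_list in ranks.items():
        avg_rank = sum(per_list.values()) / len(per_list)
        # Hypothetical radial mapping: 0 < radius <= 1, smaller = ranked higher.
        regions[frozenset(per_list)].append((item, avg_rank / longest))
    return dict(regions)

# Toy lists with placeholder items, not real Wikipedia rankings.
layout = crystal_layout({"en": ["a", "b", "c"], "de": ["b", "d", "a"]})
print(layout[frozenset({"en", "de"})])
```

Items in the `{"en", "de"}` region are the ones shared by both lists; items in single-list regions are unique to one edition, matching the shared-versus-unique reading described above.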
Anselm goes on to show the intersection of controversial topics in Farsi, Arabic, and Hebrew, finding: Gaza War, Israel, and Islam. And he finds seven pages shared amongst the combined language grouping of Farsi, Hebrew, Arabic, English, French, Spanish, German, Czech, Hungarian, and Romanian: Israel, Adolf Hitler, the Holocaust, God, Atheism, and further down in levels of controversy—Europe and Evolution.
Nathan Matias notes that this offers a way to visualize similarities across rankings and how individual items fit within that.
An Israeli audience member, Uri, says he is not surprised by the list of controversial topics Anselm provokes us with. He thinks it’s ridiculous that these are the central, most controversial topics across so many different languages.
Tim Davies responds to the visualization saying that it corresponds well with the global patterns of information flows that take their cue from what mass media covers.
Nathan notes that this may just represent perceived controversy: a troll might deliberately edit articles on topics that are likely to offend people.
Anselm thinks we in the audience are punting on the question. He puts forward that the pattern centers on anti-semitism, and is perhaps evidence of it. He wonders whether coordinated editors might be manipulating these articles behind the scenes. Extending this, we might find that the articles reflect the tensions in our own society around these issues. That’s how Anselm reads it.
Anselm is not arguing that a large number of people are editing these articles, but rather that the small number of editors involved are representative of society.
Nathan tells a story about a company that vandalized Wikipedia for marketing purposes. The company paid people to insert marketing messages, and within short order the pages were reverted to their original state. There are lots of different kinds of behavior on Wikipedia, and there is an open question about whether Wikipedia is a proxy for culture at all. Wikipedia editors are unevenly distributed around the world, so the articles reflect who is online, what is a priority for various commercial and political actors, and so on.
Anselm argues that even though reverts are only a proxy for controversy, they have some suggestive power when visualized.
Anselm then asks which pages we think are the most popular on Wikipedia. Using monthly page view data from 2006–2007, he shows us the top 100 most visited pages, which include lots of topics related to sex.
Anselm views these visualizations as imperfect projections of something going on in society. The connection lies in human search behavior and search results: Wikipedia articles appear among the top search results, so page views are an indirect way of seeing what the most popular search topics are.
Anselm half-jokes that Google should donate a lot of money to Wikipedia, because Wikipedia articles’ high search rankings push competing sites covering the same topics to fight for visibility by buying advertising.
Links to relevant research papers:
- “What is popular on Wikipedia and why?” http://firstmonday.org/ojs/index.php/fm/article/viewArticle/1765
- “Visualizing the overlap between the 100 most visited pages on Wikipedia for September 2006 to January 2007,” http://firstmonday.org/ojs/index.php/fm/article/viewArticle/1764
- “The most controversial topics in Wikipedia: A multilingual and geographical analysis,” http://arxiv.org/abs/1305.5566
Jisun An: Analyzing Social Media for Designing Fit-for-purpose Systems
Jisun is a post-doctoral researcher at the University of Cambridge Computer Laboratory. Nathan Matias introduces her by saying that he first saw her work on news sharing and diverse opinions at Web Science last year, work which has since come up in discussions among those working on Media Cloud. Jisun introduces herself as a computer scientist who studies human behavior on social media sites and how her findings bear on the design of these platforms.
Jisun is interested in media bias and starts by saying that mainstream media sources are biased both in selecting what to report and in choosing a slant on each issue. For example, she points out that in the past few months there was some tension between North Korea and South Korea, which her family in South Korea hardly took note of. Yet Business Insider ran an Agence France-Presse article under the headline: “North Korea threatens to wipe out south island near border.”
Media bias can be highly problematic for society, Jisun asserts; it has been shown to:
- increase intolerance of dissent
- result in political segregation
- influence political belief (e.g., vote)
Jisun’s research question is “Can social media mitigate the media bias problem?” For this, she looks at two aspects of media bias:
- News exposure in social media: looking into opinion diversity of news in users’ feeds using a large-scale quantitative analysis on Twitter, and
- News sharing: looking into what news articles users share
She used a nearly complete Twitter dataset up to 2009, and looked at 40 English-language news outlets and the 10 million followers of those outlets. Jisun notes that because this data is from 2009, when Twitter was just starting to grow in popularity, fewer media sources were active on the site.
First, she inferred the political leanings of news outlets and users, starting by manually tagging the political leanings of media sources. She found 21 media sources to be on the Left, 9 in the center, and 4 on the Right. If a user follows media sources in only one group, their political leaning is assumed to match that group’s. Of the 10 million users, 7 million follow only one “side” of media sources, so their political leanings can be inferred this way.
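The inference rule above, where a user inherits a leaning only when every followed outlet falls in the same group, can be sketched as follows. The outlet names and labels are hypothetical placeholders, and ignoring outlets that were never labeled is an assumption.

```python
def infer_leaning(followed, outlet_leaning):
    """Return the single leaning shared by all labeled outlets a user follows,
    or None if the user follows outlets from several groups (or none)."""
    sides = {outlet_leaning[o] for o in followed if o in outlet_leaning}
    return sides.pop() if len(sides) == 1 else None

labels = {"outlet_a": "left", "outlet_b": "left", "outlet_c": "right"}  # hypothetical
print(infer_leaning({"outlet_a", "outlet_b"}, labels))  # left
print(infer_leaning({"outlet_a", "outlet_c"}, labels))  # None
```

Users who come back as `None` are the ones left out of the inferred population, which is why only a subset (7 of 10 million here) get a label.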
Next, Jisun looked at the impact of social interaction, asking: to what extent do users increase the diversity of opinions they see from media through retweets from their friends? She found that around 70% of left-leaning users could potentially be exposed to Right-wing media sources through a friend who follows those sources. This is only potential exposure, though, because those friends might not retweet that content. Looking at actual retweeting and sharing of content from Right-wing sources, only 20% of users were actually exposed to it.
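The distinction between potential and actual exposure can be expressed as two fractions over the same set of users. In this hypothetical sketch, potential exposure means at least one friend follows a cross-leaning source, and actual exposure means at least one friend retweeted content from such a source; the dictionary-of-sets inputs are assumptions, not the structure of Jisun’s dataset.

```python
def exposure_rates(users, friends_of, follows, retweeted, cross_sources):
    """Fractions of users with potential and actual exposure to cross-leaning sources.

    friends_of: user -> set of friends whose tweets the user sees
    follows:    friend -> set of media sources that friend follows
    retweeted:  friend -> set of media sources that friend has retweeted
    """
    if not users:
        return 0.0, 0.0
    potential = actual = 0
    for user in users:
        friends = friends_of.get(user, set())
        # Potential: some friend follows a cross-leaning source.
        if any(follows.get(f, set()) & cross_sources for f in friends):
            potential += 1
        # Actual: some friend retweeted content from a cross-leaning source.
        if any(retweeted.get(f, set()) & cross_sources for f in friends):
            actual += 1
    return potential / len(users), actual / len(users)

rates = exposure_rates(
    ["u1", "u2"],
    {"u1": {"f1"}, "u2": {"f2"}},
    {"f1": {"R"}, "f2": {"R"}},
    {"f1": {"R"}},
    {"R"},
)
print(rates)  # (1.0, 0.5)
```

Because actual exposure requires a retweet on top of a follow, the second fraction can never exceed what the retweet data supports, which mirrors the drop from roughly 70% to 20% reported in the talk.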
Anselm comments that the Left and Right media outlets are not actually balanced. Jisun responds that she accounted for this and that there were more Left-leaning media on Twitter at this time.
Jisun hopes this research can be applied to identify the bias of media sources through the Twitter social graph. Previous approaches to this problem focused on content analysis, but Jisun’s work is based on media subscription patterns (i.e., the users two media sources have in common) and global network positioning.
Looking at the spectrum Jisun shows, Anselm asks whether it says that NPR is the most leftist outlet, since it sits furthest toward the Left, or that the most left-leaning people follow NPR.
Jisun responds that her analysis is not based on content but actually based on who follows these outlets. Fox News and The New York Times are the Right and Left landmarks in her analysis respectively.
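Measuring bias from audiences rather than content can be approximated by comparing follower sets. The sketch below is one possible formulation, assumed rather than taken from Jisun’s paper: it scores an outlet by the Jaccard overlap of its followers with a right landmark minus the overlap with a left landmark, so negative scores sit toward the left landmark. The outlet keys and follower IDs are hypothetical.

```python
def jaccard(a, b):
    """Share of followers two outlets have in common (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def landmark_score(outlet, followers, left_landmark, right_landmark):
    """Positive: audience overlaps more with the right landmark; negative: the left."""
    own = followers[outlet]
    return jaccard(own, followers[right_landmark]) - jaccard(own, followers[left_landmark])

followers = {  # hypothetical follower IDs per outlet
    "left_landmark": {1, 2, 3},
    "right_landmark": {4, 5, 6},
    "outlet_x": {1, 2, 4},
}
print(landmark_score("outlet_x", followers, "left_landmark", "right_landmark"))
```

A score like this answers Anselm’s question in the audience-based sense: an outlet lands far to one side because its followers overlap heavily with that landmark’s followers, not because of anything in its content.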
Tim asks if she has thought about visualizing it in a two-dimensional space to capture more variety in political views, rather than just Left–Right. Jisun responds that the work itself is not focused on politics alone, but also covers other topics such as media and technology.
Jisun believes social media has the potential to diminish media bias. However, that potential will be shaped by different factors. Jisun wants to explore these factors in her work by:
- Identifying media bias
- Looking at exposure to diverse opinions
But do people actually pay attention to opposing positions? We can encourage people to expose themselves to diverse opinions, but not everyone may value such diversity. The theory of selective exposure holds that people tend to seek out political information reinforcing their own political beliefs.
Sharing is one indicator of what people act on after exposure. If more people use social media and are exposed to more news, does what they share become more diverse? To answer this question, Jisun studied the prevalence of “partisan sharing.”
She looked at Facebook data from the myPersonality Project, developed by a psychology lab at the University of Cambridge. The project gave users a personality score through a simple quiz on Facebook. It has millions of users, many of whom agreed to allow their data to be used for research, including 228,064 based in the US alone.
Jisun and her colleagues measured how balanced a user’s news sharing is using the following equation to calculate the “net partisan skew”:
net partisan skew = ln(#conservative news + 1) - ln(#liberal news + 1)
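The formula translates directly into code. The `+ 1` keeps the logarithm defined for users who shared no articles from one side; positive values lean conservative, negative values lean liberal, and 0 means balanced sharing.

```python
import math

def net_partisan_skew(n_conservative, n_liberal):
    """ln(#conservative news + 1) - ln(#liberal news + 1) for one user."""
    return math.log(n_conservative + 1) - math.log(n_liberal + 1)

print(net_partisan_skew(0, 0))      # 0.0
print(net_partisan_skew(3, 3))      # 0.0
print(net_partisan_skew(3, 0) > 0)  # True
```

Because of the logarithm, going from 0 to 3 conservative articles moves the score as much as going from 3 to 15 (ln 4 in both cases, since ln 16 - ln 4 = ln 4), so heavy sharers do not dominate the scale.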
Methodology
- Used a list of 100 US newspapers and matched the domains of links shared on Facebook against it (yielding around 62K news articles from 37 news sites)
- Determined media slant by classifying news outlets as liberal, conservative, or center based on Mondo Times and the Shapiro methodology
- Also did topic model classification of news articles using Alchemy API (breaking it down into 12 topics)
- Measured partisanship using the above data according to net partisan skew
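The domain-matching and scoring steps can be sketched with the standard library’s `urllib.parse`. The outlet-to-slant table here uses placeholder domains, not the actual 37 sites or their Mondo Times/Shapiro labels, and stripping a leading `www.` is an assumed normalization; links to unlisted domains are skipped.

```python
import math
from collections import Counter
from urllib.parse import urlparse

def domain_of(url):
    """Lowercased host name with any leading 'www.' removed."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def user_skew(shared_urls, outlet_slant):
    """Net partisan skew for one user's shared links, ignoring unknown domains."""
    counts = Counter()
    for url in shared_urls:
        slant = outlet_slant.get(domain_of(url))
        if slant is not None:
            counts[slant] += 1
    return math.log(counts["conservative"] + 1) - math.log(counts["liberal"] + 1)

outlet_slant = {  # hypothetical domains and labels
    "cons-news.example": "conservative",
    "lib-news.example": "liberal",
    "center-news.example": "center",
}
urls = [
    "https://www.cons-news.example/story1",
    "http://lib-news.example/story2",
    "https://lib-news.example/story3",
    "https://unlisted.example/story4",
]
print(user_skew(urls, outlet_slant))  # equals ln(2) - ln(3)
```

Center-labeled links are counted but do not enter the skew, which follows the formula’s use of only conservative and liberal counts.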
They also ran the same methodology over a small Twitter sample of users recruited through the “Voting Time” website she built to survey users about their news perceptions. She connected it to the BBC’s “Question Time” program, which attracted 71 UK-based Twitter users who shared 2,000 political news articles.
Looking only at UK-based media sources for domain resolution, she wanted to learn whether the patterns were similar in the US and the UK. She expected to see a bi-modal distribution of partisan sharing, in which Leftist users share Leftist news and vice versa.
She found that users who declared themselves liberal or conservative, on their Facebook profiles or in the Twitter survey, did follow the bi-modal pattern. For comparison, she looked at partisan sharing of entertainment news and found that people do not show the same pattern of selective partisan exposure for these non-political news items. It seems that the more political news people share, the more selective they become.
Jisun then looked at how net partisan skew changes over time. They hypothesized that the patterns would change during major political events like elections. She found that people actually shared in a more balanced way during election season. One hypothesis around this, she suggests, is that people are sharing information from their opposing viewpoint in a mocking way.
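To check whether sharing becomes more balanced during an election, one could compare how far users’ skew sits from zero in each period. The sketch below assumes a flat list of `(user, period, slant)` records and uses the mean absolute net partisan skew per period as the balance measure; both are illustrative choices, not necessarily what Jisun and her colleagues used.

```python
import math
from collections import defaultdict
from statistics import mean

def mean_abs_skew_by_period(shares):
    """Mean |net partisan skew| per period; lower means more balanced sharing.

    shares: iterable of (user, period, slant) records, one per shared article.
    Only 'conservative' and 'liberal' slants enter the skew.
    """
    counts = defaultdict(lambda: {"conservative": 0, "liberal": 0})
    for user, period, slant in shares:
        if slant in ("conservative", "liberal"):
            counts[(period, user)][slant] += 1

    per_period = defaultdict(list)
    for (period, _user), c in counts.items():
        skew = math.log(c["conservative"] + 1) - math.log(c["liberal"] + 1)
        per_period[period].append(abs(skew))
    return {period: mean(values) for period, values in per_period.items()}

shares = [  # hypothetical records for one user
    ("u1", "before", "conservative"),
    ("u1", "before", "conservative"),
    ("u1", "election", "conservative"),
    ("u1", "election", "liberal"),
]
print(mean_abs_skew_by_period(shares))
```

Taking the absolute value keeps conservative and liberal skews from cancelling each other out, so a drop in this number reflects individual users sharing more evenly.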
They proceeded to study the perceived bias of news outlets. Through their Voting Time website, they asked people about how they perceived different news outlets. They found that BBC was neutral, The Daily Telegraph was conservative, The Guardian was liberal, and The Sun was conservative. She compared the rankings among self-identifying liberal and conservative users and found users would usually find news outlets of the opposite political leaning as more biased than those of their own political leaning.
One application that comes from this is predicting people’s political leaning by noting their perceived bias of news outlets. They did some regressions on this data, which show this holds up pretty well.
Lastly, they looked at how sharing behavior related to political knowledge. Users who answered more political knowledge questions correctly had a higher net partisan skew. Jisun suggests that partisanship is not always a bad thing: those who engage in selective exposure tend to be more politically educated and more involved in actual politics, for example.
They found no psychological indicators correlated with users’ political leaning. Sex was correlated with being a liberal user, but that was it. Jisun’s takeaway here is how easy (and unfair) it is to stereotype people of particular leanings.
Jisun offers two possible applications of her work for electoral campaigning:
- Targeting undecided voters who change voting patterns during election time
- Recruiting political campaigners from among partisan social media users who can share their knowledge of a particular political slant
Jisun ends by reflecting on what motivates news sharing in the first place. From the ego’s perspective, gratification and selective exposure motivate political news sharing, while from the alter’s perspective, socialization, trust, and intimacy are the key motivators. Trust and intimacy here refer to news from sources the user finds credible, or news shared by someone they are personally close to.
Tim Davies asks about context, and whether users behave differently when sharing during elections: they might be sharing these sources in order to ‘trash’ them. He also raises the issue of reinforcement bias, whereby people consume multiple pieces of conservative or liberal content. Can users actually stand to consume more news? How can we encourage people to see less repetitive content, so that diversity comes from variety rather than from reading even more news?
Tim follows up by noting that we can do amazing things with quantitative data but if we don’t pair that with qualitative studies, then we won’t be able to understand the microfoundations of the macro things we are observing.
Catherine D’Ignazio notes that we can try to design systems around forces like selective exposure and partisanship, but that it’s worth first figuring out how to define what the ideal situation looks like. Since this work aims at a projected future, we need to consider values and what the right balance is. That probably doesn’t look like the center of your spectrum, Catherine adds; more diversity is not necessarily what we want, and those values are what she would like to look at.
Jisun responds that if she could encourage people to share more diversely, it’s likely the whole network would become more balanced. But she’s not sure how to define the ideal situation of diversity, especially around politics. If someone is at least aware that other viewpoints exist, that’s something, but Jisun’s not sure how to get to that point.
Links to relevant research papers:
- “Traditional media seen from social media,” https://dl.dropboxusercontent.com/u/2166050/%28p%29%20201305%20websci2013_viz.pdf
- “Why individuals seek diverse opinions (or why they don’t),” https://dl.dropboxusercontent.com/u/2166050/%28p%29%20201305%20websci2013_diversity.pdf
- “Media landscape in Twitter: A world of new conventions and political diversity,” https://dl.dropboxusercontent.com/u/2166050/%28p%29%20201107%20icwsm2011.pdf