Data, Experiments, and Social Networks: Social Media and Behavioral Economics
I'm here at the Harvard Law School to blog their conference on "Social Media and Behavioral Economics." This is the opening session, which sets the stage for the rest of the day's conversation.
- Session One: Data, Experiments, and Social Networks
- Session Two: Organ Donation, Power Consumption, and Social Choice
- Session Three: Malleability vs Serendipity in Social Choice
Cass Sunstein sets the context by talking about mechanisms and policy tools. He starts with a basic, simple principle: if you know about something, you are more likely to do it. Secondly, people often miss out on important benefits to their lives if they don't know about them. In many contexts, important features of social situations are invisible. Next, we know that people use heuristics (mental rules of thumb) in making decisions, and this can lead to bias. Social networks can correct the hazards of availability bias by emphasizing statistical probabilities or amplifying people's awareness of things that are unlikely. We also know that people dislike losses from the status quo and will fight to prevent them. As an example, he points to the large impact of charging people a nominal fee for plastic bags.
Cass talks about default rules as an instrument for shaping behaviour. He tells us about a White House conference on disclosure policies. At the conference, attendees were offered bean sprout and soy cheese sandwiches by default, with the opportunity to click a button and change their selection. 80% of those people kept the default option. People didn't want it, but they were defaulted into the choice. Fortunately, this was an experiment, and people got the sandwiches. Recent studies have shown that default rules can have a larger effect than significant economic incentives.
What we see or do first often determines outcomes: the order in which items are displayed on a ballot or list influences results. If people are asked to add their signature at the beginning of a form, they are significantly more likely to be truthful. Information disclosure can also be an important tool for motivating behaviour, Cass tells us, but it needs to be simple and meaningful rather than complicated and hard.
At this point, Cass thanks Dean Minow, the Program on Behavioural Economics and Public Policy, and Facebook. Next, the panel takes the stage:
The Moderator is Yochai Benkler, Professor, Harvard Law School, and Faculty Co-Director, Berkman Center for Internet and Society. Yochai introduces three people doing work in the real world.
Eytan Bakshy is a Data Scientist at Facebook, where he does a mixture of data and social science research. Facebook has 600 million users logging in each day. Eytan is especially interested in expanding our understanding of human behaviour by running randomised controlled trials. His work focuses on social influence and information diffusion.
Eytan starts out with a primer on social networks. Each of us has strong ties and weak ties (people with whom we interact at different frequencies). Eytan recently ran a randomised trial with 250 million users. He found that people with strong ties tend to share the same content, even when they don't see their friends sharing it. People's weak ties, those with whom they interact less frequently (appearing less often in the same photos, sending each other fewer messages), share more diverse information. On social media, we tend to have large networks. The average person has one or two hundred friends, and college-educated people might have around 600. Each of us has a far greater number of weak ties, and those weak ties are very influential.
What are people sharing? Eytan talks about a study by Messing & Westwood which shows how people share news on Facebook. Next, he shows us a recent study on Facebook to see if people would be more likely to vote if prompted by their friends on Facebook. He then talks about a similar, yet-unpublished study he helped conduct on the 2012 election.
Update: Messing & Westwood found me and emailed me further detail. Their study "shows how social signals shape the way people consume content in online environments like Facebook." I can't wait to read it!
Facebook is a great place to learn about online behaviour, concludes Eytan. He thinks that the newsfeed is a great way to learn things and that Facebook can "nudge people towards more pro-social outcomes."
Next up is Sharad Goel, a computational social scientist at Microsoft. "I'm going to talk about what it means for something to go viral." How do ideas and products spread through society, asks Sharad. One theory is that these things spread through connected populations like a biological contagion. Is that true? This question is hard to answer. Instead of individual data, we tend to get aggregated population data: line graphs of how popular something is. Those line charts can't answer the question.
Thanks to services like Facebook and Twitter, we can now investigate the spread of ideas at an individual level. He talks about two ways to measure this. In the first case, they're directly observing how things spread. In the second, they combine popularity data with information about the shape of the graph to _infer_ how something spreads.
So how does something actually spread? 93% of the time, URLs don't propagate. Even when friends do read or share what you shared, it usually stops there. (You can read his research in this blog post and this PDF on the Structure of Information Diffusion.)
Next Sharad asks us, "Did Gangnam Style actually go viral?" What does it mean for something to go viral? Things shared by celebrities and broadcasters don't go viral; they're just broadcast to a large number of people. Something only goes viral, from a structural standpoint, if it spreads from person to person. He tells us about a study of a billion events, covering nearly every news story and video shared on Twitter over a period of time. He then shows us illustrations of three very popular events. In addition to examples of broadcast diffusion and viral diffusion, there are also examples in between: a mix of individuals and broadcasters. What's interesting is that the time-series attention plots for these very different patterns all look similar. Those plots can't tell us about the structure of the spread of information.
Sharad argues that we need to look at large numbers of events. Since virality is so rare (one in a million), a sample size of at least a thousand events is necessary, and a sample size of a billion is much better.
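Sharad's point about scale can be checked with simple arithmetic: if roughly one event in a million goes viral, the expected number of viral events you'll observe is just your sample size times that rate. A quick sketch (the one-in-a-million rate comes from the talk; the rest is illustrative):

```python
# Expected number of viral events in a sample, assuming (as in the talk)
# that roughly one in a million shared items goes viral.
p_viral = 1e-6

def expected_viral(n_events: int, p: float = p_viral) -> float:
    """Expected count of viral events in a sample of n_events."""
    return n_events * p

# With a thousand events we expect essentially none; with a billion,
# about a thousand -- enough to study virality statistically.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} events -> ~{expected_viral(n):g} viral")
```

This is why aggregated popularity charts built from small samples can't say much about virality: the phenomenon almost never appears in the data at all.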
In the press, we often differentiate between popular and viral. We wouldn't typically call the Super Bowl viral. When something gets really large, it's almost certainly viral, because there aren't broadcasters with audiences large enough to reach that many people on their own. There is, however, a middle ground: most very large media events are not viral, but some are. Except at the extremes, size is not a strong indicator of virality.
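One way to make Sharad's structural distinction concrete is to represent each diffusion event as a tree of reshares and measure its depth rather than its size: a pure broadcast is a shallow star, while person-to-person spread produces long chains. This is a minimal sketch, not Sharad's actual method, and the example cascades are invented:

```python
# Sketch: telling "broadcast" from "viral" cascades by shape, not size.
# Each cascade is a dict mapping node -> parent (the root maps to None).

def depth(node, parents):
    """Number of reshare hops from this node back to the root."""
    d = 0
    while parents[node] is not None:
        node = parents[node]
        d += 1
    return d

def mean_depth(parents):
    """Average reshare depth: ~1 for a pure broadcast (a star),
    larger when content passes from person to person."""
    non_roots = [n for n in parents if parents[n] is not None]
    return sum(depth(n, parents) for n in non_roots) / len(non_roots)

# A broadcaster reaching five followers directly:
broadcast = {"root": None, "a": "root", "b": "root", "c": "root",
             "d": "root", "e": "root"}
# The same five adopters reached through a person-to-person chain:
viral = {"root": None, "a": "root", "b": "a", "c": "b", "d": "c", "e": "d"}

print(mean_depth(broadcast))  # 1.0
print(mean_depth(viral))      # 3.0
```

Both cascades reach the same five people, so a popularity time series would look identical; only the tree structure separates them.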
Next up is Gilad Lotan, VP of Research and Development, Socialflow. Gilad starts by telling us a story. He recently moved to a new apartment in New York City, and he finally has space for a piano. He googled pianos, but it's incredibly hard to find a good piano. It didn't take long before the pianos started following him: he's now inundated with piano ads wherever he goes on the Internet. "Why do you have to keep following me, pianos?" Gilad asks. "I'm not ready yet."
(Gilad sets the table with issues raised by social media. We can now measure more than ever before. This shift toward a networked infrastructure is leading to a shift in power, as illustrated by the Arab Spring and many other events.)
Gilad shows us a chart of the use of the words "power" and "superbowl" earlier this year. It shows a shift from discussions of the Super Bowl to a huge discussion of the power outage. These kinds of attention shifts happen all the time. The big talk this week has been about how people use this data. Gilad talks about Oreo, which put out an ad about the power outage based on its attentiveness to what people were talking about at the time.
Gilad shows us that different kinds of events often have very different kinds of curves: a social movement, a TV show, an awards ceremony. Just by looking at the shape, Gilad tells us, it's possible to guess what kind of event it is. He then shows us a graph of people who follow Pepsi, the AP, and Al Jazeera English (many of these slides can be found in this slideshare). By measuring the shape of an event, it's possible to guess its effect on an audience.
Next Gilad shows us his analysis of the KONY 2012 event, which, before Gangnam Style, was the most viral content ever seen. The big story in the media was that this video was put on YouTube, passed on through Facebook and Twitter, and went viral. But when you look at the structure, you see very distinct clusters, segmented by geography: Oklahoma City, Pittsburgh, Birmingham Alabama. Many of the sharers' bios reference biblical Psalms and talk about God and Jesus. Many of them are students with religious affiliations, whose networks were already in place at the beginning of this viral event. The group Invisible Children had been working for years with schools and universities, developing ties. A message was sent out asking everyone to share the video at the same time. They lit up the network translocally, across regions (read Gilad's post about this here).
Gilad next introduces the idea of "audience volatility": the amount of focus throughout the network on any single event.
Yochai Benkler speaks next. The speakers, he tells us, illustrate some of the key practices in computational social science. What is the promise of computational social science? The first is the use of large-scale observational data to understand how people behave beyond their self-interest. He talks about a study by Benjamin Mako Hill and Aaron Shaw on Wikipedia editing history. An earlier study had given editors stars and concluded that the stars were effective. Looking at data from the entire Wikipedia site, Mako and Shaw found that people responded differently to stars: some responded positively, while others simply received stars at the peak of their activity.
Benkler next talks about the problems of data cleanliness. One of the challenges is to calibrate between the online signals and what happens with "real world individual behaviour." Making that connection is incredibly important. We can do both field experiments and mechanical turk experiments, and we need to understand their different validities. Benkler talks about work by Mako and Andres Monroy Hernandez (which I wrote about here), looking at how young people were remixing and sharing creations with each other. In these studies, scale is important. If you have a one in a million event, you need a billion instances to learn about it.
Benkler next talks about concerns about acting "upon people" and predicting their behaviour. We are getting better at understanding how general mechanisms change from person to person. As we get better at this, we may eventually be able to create personalised situations of behaviour, based on the millions of transactions we carry out. The better governments and firms get at shaping our behaviour, preferences, policies, principles, and beliefs, the more we shift power toward whoever holds the data and knows how to connect it. That's a huge moral, normative, and legal challenge.
Many of the talks discussed social-network and regional clustering. Do online social networks influence our in-person social networks? Eytan responds that we're seeing more and more long-range ties on Facebook. At the same time, media like Twitter offer opportunities to form very weak, broad ties around interests. Gilad responds that SocialFlow often maps out the network of people who respond to an event. Sometimes network clusters are grouped around location or interest, but those clusters always emerge. If users are younger, clusters tend to be geographic; clusters of adults tend to be topical. Sharad talks about two possibilities: either we meet and discover people who are different from us, or we talk only to people who are like us.
An audience member asks, "What's the takeaway for marketers?" Does social media unlock a world of new possibilities? Sharad points out that there are two ways people talk about being great on social media. In one model, people try to reach new audiences. In the other, people hope they can spend a small amount of money to make something go viral. That second hope is hype; virality is very rare. Gilad talks about the difference between an "earned" and a "paid" audience. The earned audience is your existing constituents: you can use social media to get to know them better and make sure they're happy and continue to grow. By studying the "paid" audience, marketers try to learn about people who aren't yet part of your constituency and then decide where to put ads. Most marketers take a mixed approach, cultivating their earned audience while paying to expand it.
Brian Keegan asks about the assumptions of homogeneity and heterogeneity around us. Should we be focusing on the long tail of people, or on the short tail of people who are very like each other? Sharad responds that this will definitely influence your statistical analysis: in many domains, statisticians aren't used to dealing with these heavy-tailed distributions, and an ordinary analysis would conclude that viral events, like black swans, don't exist. Gilad talks about automated systems, like "top story" or "trending topic" algorithms. The decision of which algorithm to use is usually made by data scientists who are trying to add a feature to the product. We don't have as many processes in place to figure out the implications of the types of content people do or do not see. For example, while Occupy Wall St trended around the world, it never trended in New York City, due to the nature of Twitter's algorithm. How can we optimise for an informed public rather than just traffic?
Eytan responds that most of his findings show heterogeneity of treatment effects. That's one reason randomisation is good: it makes it possible to look at sub-populations and the influence of the treatment on each of them.
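Eytan's point can be sketched with simulated data: because assignment is random, the treated-minus-control difference in mean outcomes is an unbiased effect estimate within any pre-defined subgroup, which is what lets heterogeneous effects show up. The groups, effect sizes, and data below are all made up for illustration, not from Facebook's experiments:

```python
import random

random.seed(0)

# Simulated experiment in which the treatment helps one subgroup
# much more than another. All numbers are invented for illustration.
def simulate_user(group):
    treated = random.random() < 0.5                    # random assignment
    effect = 0.3 if group == "weak_ties" else 0.05     # true subgroup effects
    outcome = random.gauss(0.0, 1.0) + (effect if treated else 0.0)
    return group, treated, outcome

users = [simulate_user(random.choice(["weak_ties", "strong_ties"]))
         for _ in range(100_000)]

def avg_effect(group):
    """Treated-minus-control difference in mean outcomes within a subgroup."""
    t = [o for g, tr, o in users if g == group and tr]
    c = [o for g, tr, o in users if g == group and not tr]
    return sum(t) / len(t) - sum(c) / len(c)

# A single pooled estimate would average these together and hide
# the heterogeneity; subgroup estimates recover it.
print("weak ties:  ", round(avg_effect("weak_ties"), 2))
print("strong ties:", round(avg_effect("strong_ties"), 2))
```

With random assignment, the same subtraction is valid in every subgroup, so the estimates land near the true effects of 0.3 and 0.05 up to sampling noise.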
Yochai ends the session by proclaiming, "the individual resists aggregation, but resistance is futile."