Creating Technology for Social Change

The Ukraine-Latin America Connection: Clustering Countries by Video Trends

Why would some online videos trend in both Ukraine and Latin America? I don’t know, but it looks like they do. Continuing our work on the What We Watch project, which Ethan previously blogged about, I noticed that Ukraine has a surprising (to me) number of trending videos in common with countries like Mexico and Argentina. Here’s one example:

I came across videos like this one while digging deeper into the YouTube Trends data we’ve collected, which contains the top 10 trending videos for each day of the past 6 months, for about 60 countries. Specifically, I’ve been analyzing clusters of countries that often share trending videos. To find these clusters I’ve been using a technique called topic modeling, which was designed to find groups of words that frequently occur together in texts. Instead of groups of words, I’m looking for groups of countries. For the tech geeks out there, our analysis code is built on top of an LDA (latent Dirichlet allocation) implementation I wrote as a final project for MIT’s graduate machine learning course.
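For readers who want to see the idea in code, here is a minimal sketch of LDA applied to countries instead of words. It is not the project's analysis code. It assumes one plausible mapping: each video is a "document" and each country where it trended is a "word". The sampler (`lda_gibbs`), its parameters (`alpha`, `beta`, `iters`), and the tiny data set are all made up for illustration, and the country lists are invented toy input, not YouTube Trends data.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, vocab, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, unoptimized).

    docs: list of token lists; here each list holds the countries
          in which one video trended (an assumed mapping).
    Returns one dict per topic mapping token -> probability.
    """
    rng = random.Random(seed)
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # n(doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(topic, token)
    topic_total = [0] * num_topics                              # n(topic)
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-topic distribution over countries.
    return [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
         for w in vocab}
        for t in range(num_topics)
    ]

# Invented toy input: each inner list is one hypothetical video.
videos = [
    ["MX", "AR", "ES", "UA"],
    ["MX", "AR", "ES"],
    ["AR", "UA", "MX"],
    ["FR", "MA", "DZ", "TN"],
    ["MA", "DZ", "TN"],
    ["FR", "MA", "TN"],
]
vocab = sorted({country for video in videos for country in video})
topics = lda_gibbs(videos, num_topics=2, vocab=vocab)

for t, dist in enumerate(topics):
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    print(t, [(country, round(p, 2)) for country, p in ranked])
```

Each topic is a probability distribution over all countries, so a cluster is read off as the countries with the highest weight in a topic. Nothing stops a country from carrying noticeable weight in more than one topic.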

Some of the clusters we’ve found are to be expected, like one clearly language-based group of 8 Arab countries. Three of those (Morocco, Algeria, and Tunisia) also appear in another cluster with France and a few other European countries, which might not come as a surprise if you’re aware of France’s colonial history in North Africa. One advantage of LDA-based clustering is that it allows countries to appear in multiple clusters, so we’d hoped to find these types of connections.

Other clusters reveal some unusual connections. The cluster mentioned at the top of this post consists of Latin American countries, Spain, and… Ukraine. Among the top trending videos shared by Mexico and Ukraine, several have a strong presence across Latin America. Interestingly enough, those videos are not in Spanish but in a Slavic language (since I don’t speak Russian or Ukrainian, I can’t say whether it’s one or the other, or a combination). This pattern stands in contrast to, for example, the trending videos shared by Ukraine and France, which don’t concentrate in any particular set of countries and mostly use English as a bridge language.
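The pairwise comparison described above can be sketched in a few lines. This is a hedged illustration only: the function name `shared_video_spread`, the data layout (a dict from video id to the set of countries where it trended), and the sample entries are all assumptions invented for the example, not the project's data or code.

```python
from collections import Counter

def shared_video_spread(trending, a, b):
    """Find videos that trended in both a and b, and count where else they trended.

    trending: dict mapping video id -> set of country codes (assumed layout).
    Returns (list of shared video ids, Counter of other countries).
    """
    shared = [vid for vid, countries in trending.items()
              if a in countries and b in countries]
    spread = Counter()
    for vid in shared:
        # Count every other country in which this shared video also trended.
        for country in trending[vid] - {a, b}:
            spread[country] += 1
    return shared, spread

# Invented toy input, for illustration only.
trending = {
    "vid1": {"MX", "UA", "AR", "CL"},
    "vid2": {"MX", "UA", "AR"},
    "vid3": {"UA", "FR", "US"},
    "vid4": {"MX", "ES"},
}

shared, spread = shared_video_spread(trending, "MX", "UA")
print(shared)
print(spread.most_common())
```

If the counts pile up on a handful of countries, the shared videos concentrate in a region. If they are spread thinly over many countries, they do not, which is the contrast drawn between the Mexico and Ukraine pair and the Ukraine and France pair.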

So why might these videos find a shared audience between Ukraine and Latin America? One guess might be migration. Wikipedia suggests that Argentina has been a popular destination for Ukrainian emigrants. But the connection could also arise from any number of other factors. I’d be interested to hear alternative suggestions, or thoughts on why these particular videos form a cluster.

Our LDA-based approach is approximate and subject to some fine-tuning, so as we collect more data and apply other methods, we’ll be keeping an eye on how these clusters hold up. Where this approach really shines is in capturing some of the nuance of international cultural flows by placing countries in multiple clusters. It’s particularly exciting to me that, even without speaking Spanish, Russian, or Ukrainian, someone like me can use this method to find an unexpected connection and use it as a jumping-off point for further research.