More Data Than You'll Know What To Do With
Matt's a Research Assistant at the Center. He has spent his career at the intersection of technology and social change. He graduated with high honors from the University of Maryland College Park, where he wrote a thesis on the disruptive role of political blogs in journalism. He went on to join the strategy team at EchoDitto, a boutique consulting firm building cool technology for nonprofits, startups, and socially responsible businesses.
Then Matt attempted to save democracy by directing new media at Americans for Campaign Reform, a bi-partisan grassroots effort to enact voluntary public financing of federal campaigns. Right before Citizens United v. FEC hit, he joined the New Organizing Institute, where he helped to train the next generation of organizers. For most of this time, he also ran one of the most popular NetSquared groups in the world.
Matt's interested in pretty much everything, particularly the everything taking place at the Media Lab.
Updated with Data.gov and the MIT Libraries Guide to Data
I've truly drunk from the MIT firehose this week. They say it's not possible, but I think I actually managed to consume an unhealthy amount of information. Luckily I had a strong Clover food truck coffee Friday morning, because the Introduction to Numeric Data Resources session at Harvard was an incredible introduction to the seemingly endless amount of data available on the internet. Fortunately, Data Reference Librarian Diane Sredi was there to explain what makes each collection interesting and/or useful.
She provided an overview of data resources and where to start looking. A common challenge is that data is spread out everywhere, so you can't look in just one place. The flip side is that there's more data available than ever, and Harvard and other schools and libraries subscribe to a lot of these resources.
- Data Tutorials
- How to Start
- International Data Sources
- Freely Available Resources
- Massachusetts-Specific Data Resources
- MIT-Specific Data Resources
- Harvard-Specific Data Resources
- The Data Research Process
- Locating Data
- Understanding Data Files
- Working with Data Files
- Using Statistical Software Packages
Define your goals - do you just need summary statistics, or raw data to do analysis? What geographic region (world, national, etc.) and time period? Single point or longitudinal?
Who collected the data? A government agency may have tracked the data and published it in the form of a report.
Another way to find data is to do a literature search and find what data other people are citing.
Assess your Resources
Find out if they're reliable and apply to the correct population. Read any definitions they have and understand their sources.
Don't rely on search engines on data sites - look for a subject index instead.
US Census Bureau
Has demographic information as well as business and industry surveys, mapping information, and genealogy. Click the subject list to see an index.
The annual Statistical Abstract (the National Data Book), published since 1878, is a great starting point that aggregates government data, except that the government has decided to stop funding it. Annual PDFs of the Abstract are available going back to the first edition.
If you open government data Excel files, you'll be able to see a link to the source of the data.
They list state data here as well.
American Fact Finder
People are constantly changing data interfaces to make them better; sometimes they succeed, sometimes they don't.
This site has economic surveys and population data (including a population clock counting the US population, currently at over 312,000,000).
Wide range of topics linked to the agencies that cover each subject area. A dropdown on the right lists agencies by subject, with summaries of what each agency covers and links to contact information, which is helpful because the agencies are generally very willing to help you.
My friend Carroll pointed out that Data.gov and its 390,000 datasets didn't make the list, somehow. From their homepage:
- 390,136 raw and geospatial datasets
- 1,119 government apps
- 236 citizen-developed apps
- 85 mobile apps
- 172 agencies and subagencies
- Suggest a dataset or app!
- 2011 Next Generation Data.gov is interactive, explorable, and social.
The United Nations
Wide range of topics in the 'Databases' list on the left, listed by topic area so you can see, for example, the various databases available for global health stats.
You can filter by country and year and export to Excel and CSV.
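Once you have a CSV export on disk, the same country-and-year filtering is easy to repeat locally. Here is a minimal sketch in Python. The column names (`Country`, `Year`, `Value`) and the sample rows are made-up placeholders, not the UN's actual export layout, so adjust them to match whatever file you download.

```python
import csv
import io

# Placeholder export: the column names and values are invented for
# illustration and will differ from a real download.
SAMPLE_EXPORT = """Country,Year,Value
Country A,2009,1.0
Country A,2010,2.0
Country B,2009,3.0
Country B,2010,4.0
"""


def filter_rows(csv_text, country, start_year, end_year):
    """Return rows for one country within an inclusive year range."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["Country"] == country
        and start_year <= int(row["Year"]) <= end_year
    ]


rows = filter_rows(SAMPLE_EXPORT, "Country B", 2010, 2010)
for row in rows:
    print(row["Country"], row["Year"], row["Value"])
```

For a real file, you would read the text with `open(path, newline="")` instead of the inline sample.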
The great thing about the UN interface is that you can always see the source and contact information at the bottom of each page.
It's also really nice to be able to see the specific countries' data services (on the right on the homepage).
See also the Monthly Statistics Bulletin.
The Lamont Library at Harvard also has access to a lot of foreign statistical services.
The World Bank is a great example of a previously paid resource that is now free. It's arranged by topic, country, indicator, and data catalog. There's also a new link to Microdata, which lists surveys from different countries. In the Data Catalog, World Development Indicators is their main database covering over 200 countries. You can look either alphabetically or in groupings, like region and income level.
They have over 1,200 variables covering a wide range of topics from education to environment to poverty and the public sector.
Some of the information goes back to 1960, but not for all of the variables or all of the countries.
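Because coverage is uneven, it's worth checking which years actually have values before you compare countries. A rough sketch of that check in Python, using invented records in a hypothetical (country, year, value) layout where `None` marks a missing observation:

```python
from collections import defaultdict

# Invented records for illustration: (country, year, value).
# None stands in for a missing observation.
records = [
    ("Country A", 1960, 1.0),
    ("Country A", 1961, 1.1),
    ("Country B", 1960, None),
    ("Country B", 1961, 2.0),
]


def coverage(records):
    """Map each country to the (first, last) year that has a value."""
    years = defaultdict(list)
    for country, year, value in records:
        if value is not None:
            years[country].append(year)
    return {country: (min(ys), max(ys)) for country, ys in years.items()}


spans = coverage(records)
for country, (first, last) in sorted(spans.items()):
    print(country, first, last)
```

Note that this only reports the first and last year with data. Gaps in the middle of a series would need a separate check.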
You can export, view, or format the layout of the report to your liking. Click the little 'i' icon to get a quick look at what the source includes without having to download the complete dataset.
They have lots of economic and business data, a Knowledge Economy Index, and lots more. If you're doing any sort of international research you should start at the World Bank.
(These are from Matt Carroll of the Boston Globe, who spoke in my Systems Visualization class Friday afternoon)
Boston Globe's Government Center
City of Boston's Data Dashboard - "Surprisingly useful"
This Data Management and Publishing Guide is a practical self-help guide to the management and curation of research data throughout its life cycle. It provides guidance on a range of topics, including: planning for data management, documentation/metadata, file formats, data organization, data security and backup, citing data, data integration, funder requirements, ethical and legal issues, and sharing and archiving data.
Social Science Data Services - great for "finding, understanding, and managing statistics or numeric or tabular data in the social sciences, and management."
MIT affiliates also have access to many of the resources here:
Start at the library portal, click Find E-Resources in the Articles & More Section. It defaults to the Title page, but if you're not sure what you're looking for, click the Keyword or Subject tabs. Under the Subject tab there's a list of categories, because librarians love to categorize things. If you click a topic you're interested in you'll see a breakout of the resources listed by type. You're looking for Statistics and Data, Indexes to journal articles, and Research guides.
Click on one and click Go. It's not an exhaustive list, but a great place to start. Click on the 'i' icon for an expanded description including the dates it covers, how often it's updated, and any access restrictions.
Economist Intelligence Unit
A great place to start for economic data about a country. Type in your country or search by specific reports. If you just want data, click the link for Data Tool underneath the Reports dropdown on the right of the homepage.
Includes CityData, with varying but interesting data from cities around the world.
CountryData is the broad-based option and covers a lot of economic information (key indicators, exchange rates, etc.) back to 1980 and even forecasts up to 2030.
ProQuest Statistical Insight
AKA Lexis Nexis Statistical
Has US and international information. You can search for something broad like 'employment' and then use the interface on the left to drill down by source, file format, region, date published, date covered, and targeted subject area (like Women's Employment).
Their tool lets you forecast up to 2089, with a significant number of results projecting that far into the future. How accurate these end up being is up for debate. More resources are starting to use drilldown tree graph interfaces to help you begin to target what you're looking for.
Access books, papers, and a dedicated Statistics section. Lots of energy and economic databases, etc. You can search by theme. OECD.Stat tool allows you to compare data across multiple datasets. Their Country tables include key statistics. You can do pivot tables on this data. Click the 'i' button as usual for some nice metadata.
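If you'd rather pivot downloaded data yourself, the idea is simple: turn long rows (one per country and year) into a table with one row per country and one column per year. A minimal sketch in Python, using invented rows in a hypothetical (country, year, value) layout rather than any real OECD export:

```python
from collections import defaultdict

# Invented long-format rows for illustration: (country, year, value).
rows = [
    ("Country A", 2009, 1.0),
    ("Country A", 2010, 2.0),
    ("Country B", 2009, 3.0),
    ("Country B", 2010, 4.0),
]


def pivot(rows):
    """Pivot long rows into a nested dict: {country: {year: value}}."""
    table = defaultdict(dict)
    for country, year, value in rows:
        table[country][year] = value
    return dict(table)


table = pivot(rows)
years = sorted({year for _, year, _ in rows})

# Print one line per country, with one column per year.
print("country", *years)
for country in sorted(table):
    print(country, *(table[country].get(year, "") for year in years))
```

A blank cell is printed wherever a country has no value for a given year.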
Great archive housed at Michigan since 1962. One of the oldest and largest social science archives in the world. Government surveys, researchers' data, polls, panel studies. Broad range of topics, from anthropology and communications to health and political science.
Click the Find & Analyze Data tab. If you know what you want, the Search is great. If you'd like to browse, you can search for keywords or scroll down to the Browse By Topic section. You can also browse by geography, either by world map or a country index with the number of studies for each nation listed.
The ICPSR Thesaurus is a nice tool if you aren't getting the results you want with the search terms you're using. Try their terms listed here and you'll get much better results, and be able to control for how broad or narrow you'd like to go.
Bibliography of Data Related Literature is nice if you have no idea what you're looking for - come browse the articles here and they'll give you a sense of what to look for.
They also have info on using the data.
Largest collection of social science research data, housed at Harvard. You and other researchers can create your own dataverse, backed up by Harvard but owned by you. If you want access to someone's dataverse and they're not allowing it, it's sometimes helpful to get in touch with that researcher.
You can filter dataverses by type of organization, like large institution or educational institution. If you're not on campus, go to the Login screen, leave the username and password blank, and select the university affiliate you're with (MIT).
We look at an example dataverse. "Unit of information: children." Creepy.
Dataverses include documentation. You can run data analysis and download as text, R data, S plus, and Stata. You can recode and case-subset right on the website as well as run advanced statistical models (Diane recommends you understand the model before analyzing it).
Unfortunately there's no one-stop shopping, but clearly lots of good places to go look.
Public Opinion Data
The Roper Center for Public Opinion Research
Half a million public opinion survey questions. It has international data but is heavily focused on the US. You can search by keyword and country but also by specific polling sources. The search engine will give you nice graphic summary statistics, and if there's an 'X' icon you can download the data. You need to register, which is free if you go through Harvard libraries. They have specific Latin American and Japanese opinion archives (more on Japanese data).
Government Documents Department in Lamont Library at Harvard
Lots of US and international census data and other statistical publications. Foreign document specialist is very open to purchasing what you need for your research.