Creating Technology for Social Change

Bringing a Nation’s Archives Online

(a Civic lunch liveblogged with Nathan Matias and Rahul Bhargava)

Today, we’re hearing from the National Archives and Records Administration about the archives they maintain, how they’re making those archives available online, and approaches to sharing the archives with broader audiences.

Pamela Wright is the Chief Innovation Officer at the National Archives and Records Administration. Bill Mayer is the Executive for Research Services at NARA. Michael Moore is the Access Coordinator for Research Services East (right here in Waltham, MA).

About the US National Archives
The US National Archives keeps over 4.5 million cubic feet of traditional records: paper records, audio, video, maps, and over 500 terabytes of electronic data. Their records span US Federal agencies, courts, records from Congress, and 13 presidential administrations. Only about 5% of Federal government records are preserved in the National Archives; the sheer volume requires them to be selective. There’s a lag of at least 15 years before the records are sent over to the Archives from the agencies that created them.

The archives cover any issue the federal agency touches, from the environment to housing to health. There are studies, reports, case files, hearings files, aerial photos, and more.

Environmental scientists from Maine have studied fishing records to learn about changes in the region’s stock over the years. The archives keep project files for the US space program, traffic data from America’s cities, and much more. Now that the US space shuttle program has been shut down, records are already being “palletized” for archival.

How Can You Find Records from NARA? In addition to checking the Archive website, researchers can talk to archive curators directly. To help, NARA also offers residential research fellowships for anyone who wants to explore the archives more closely.

The Archive would like to expand access to their records, but archives are often held hostage by their format, their location, and the challenge of indexing, says Bill Mayer. There are 15 locations nationwide, but ideally, researchers have the same experience accessing records no matter where in the US they are located. So far, this remains an ideal.

Setting free these records is at the heart of the Archive’s mission. It is an awesome responsibility that supports the democratic process in this country by allowing citizens to hold the government accountable for its decisions. Archives can be personal. Bill tells us of the time he met a Vietnam vet who spent 43 years gathering the emotional courage to visit the Archives and look up his unit commander. It took the curator 30 seconds to find the commander’s name. These records change people’s lives.

How do we set the records free? Records on paper fill miles of storage space. The Archives’ current footprint consists of miles and miles of shelving, “from the limestone caves of Kansas to compact shelves in downtown DC.”

The records come in from agencies with varying degrees of metadata. Bill gives us “a snapshot of the pain” of the archives, showing us the process of obtaining, processing, and sharing data with the public. For example, the Archive also deals with Freedom of Information Act requests. They have sealed records that are subject to FOIA requests, and if the request is approved, they must find the record, redact it, and share it.

Pamela Wright, Chief Innovation Officer for the Archives, tells us that the Archives set up their Office of Innovation just this past October. It covers social media, the web, the online public catalog, the standards program, and presidential libraries. They’re also responsible for coordinating NARA’s Open Government program and Digital Government strategy. Pamela comes to the Archives from a research career focusing on water issues for a Native American tribe in Montana.

Pamela tells us about recent culture changes in the archives. They started to dabble with social media for the first time in 2009. At the time, the organisation faced fears rooted in a desire to stay in control. People in the organisation weren’t interested because they thought that social media wouldn’t serve the mission. NARA tried short, three-month pilot projects, which convinced people.

A new Archivist, David Ferriero, came on in 2009 with a new energy and created a culture where employees could assume the answer to new ideas was “yes until no”, drawing inspiration from Joshua Greenberg’s work at the New York Public Library. The social media team adopted this motto.

The Obama Administration initiated an Open Government Directive, in response to which Federal agencies were required to develop plans. President Obama established the expectation that the government doesn’t have all the answers, and needs to be more open. Some staff were thrilled by this shift, while others were upset.

The Archive launched a document-of-the-day series on Tumblr, with wild success. But the relative scale of the following there wasn’t enough for the National Archives. Recognizing the role of Wikipedia in Americans’ research habits, they hired a Wikipedian in residence. In just the month of April, 2012, National Archive articles received hundreds of millions of views. The Archive has also uploaded public domain images and started a Wikipedia document transcript project. Pamela credits the strategy of ‘skating to the puck’ online with helping them reach many more Americans.

After these successful experiments, the team developed a coherent social media strategy to support the Open Government Initiative. They shifted away from a broadcast model.

In 2012, over 135 external projects were published on 13 platforms, generating tens of millions of views. Pamela’s big, hairy, audacious goal is to get all of NARA’s records online. They have 30,000 linear feet of records a year.

Government organisations often pressure their employees to speak with only one voice. There’s a fear of staff or the public saying something wrong and hurting the brand. Pamela says that single voices strangle and paralyze institutions, preventing them from having an authentic conversation with the public.

Pamela tells us about the Citizen Archivist Dashboard, a platform that enables users to tag, transcribe, and edit online records, adding their own uploads and sharing them with their friends. They’ve treated these as pilots, to see what gets traction with the public (and to fly under management’s radar). They list some of these projects on a government public engagement platform. One real-life outreach effort was the “History Happens Here” initiative, which challenges people to find a picture in the catalog and take a current photo at the same location. It was inspired by the Museum of London’s augmented reality historical photo app. The top twenty photos were chosen for a postcard book. The transcription pilot was a success for them, with the 1,000 trial documents transcribed in just two weeks.

Internally, NARA has been using the Jive platform to encourage more horizontal, social information sharing among people working at different archives across the country.

They want to engage software developers, in addition to the historians, archivists, librarians, etc., by making more datasets available.

Electronic records are the next great challenge for NARA. Agencies are under a directive from Obama that by 2019 they will manage all of their electronic records electronically. “That seems so delightfully rational.” But agencies have legacy, rigid, monolithic technologies stuck in contracts. Once the data comes in, NARA needs to find ways to keep it safe and provide access. Things have changed from 2007, when George W. Bush tried to claim that presidential email didn’t count as records. Now that attitudes have changed, NARA faces a huge challenge to process that data and make it available.

What kind of data does NARA currently have? Bill’s work includes classified, private data. Pamela focuses on much more open information. Structured data is often tough to process, and NARA will often take in flat files that can be redacted before sharing. Fifty years from now, how will it be possible to share data? NARA also needs to navigate its relationship with other agencies that are opening up their data more directly and engaging the public with hackathons.

Pamela tells us that constant access and interest is one of the best ways to ensure that something is preserved. That’s why NARA is trying to find interfaces to share that data with the public.

Chris asks about the challenge of digitization. Pamela explains that NARA has an internal set of labs that work on this, and they also have external partners. Chris wonders how the transcription pilot fits into this. Pamela explains that they have millions of digitized documents in their catalog. Some clearly important ones are done manually; others are just digitized en masse.

An audience member asks about controversial material (like Wikileaks). It’s not easy, Bill says. The mission is access, but there are laws that govern how an agency releases that material. They work with the general counsel and the National Declassification Center.

Following 9/11, the federal government became concerned about “records of concern”, or potentially damaging information that had been made public. The result was that a number of records groups were shut down, including Vietnam War records that were previously available. Bill expressed frustration with a recent Fresh Air piece that wasn’t able to go into details about why that happened and how they might use a FOIA request to access those records now.

An audience member asks which open datasets have been flagged as “interesting”. Pamela says the open ones have been added to the catalog (for download). They have 85% of all the records they hold described in the catalog. The electronic records operate on more of a pipeline: NARA takes them in as they are being created, but the agency still owns the access (legally, NARA can’t provide access yet). This is one of the more complicated issues they face. Matt adds that you can file a FOIA request asking what other FOIA requests are being made, so that can be a source for what might be interesting.

George W. Bush emails will be available for FOIA in Jan 2014 – NARA needs to be ready for this.

Andrew mentions some examples with Flickr, and asks where NARA draws inspiration from. Pamela mentions some neat NASA + Angry Birds and Smithsonian 3D printing examples.

Sites and datasources managed by the National Archives: