Data Rescue Philly Builds Data Refuge
January 15, 2017
Over the course of the two-day DataRescue Philly event, 250+ people attended. We are very grateful for so many motivated, determined, and--above all--generous volunteers and collaborators. Thanks to you all.
The seeders and sorters (explained below)--led by Data Refuge Guides Maya Anjur-Dietrich, Andrew Bergman, and Toly Rinberg--got through 3,692 NOAA websites on Saturday. The "baggers," led by Justin Schell, captured a lot of NOAA data--in the words tweeted out by event participant Robert Cheetham (CEO Azavea), "jillions of bytes of data bagged and tagged today." Or in the words of Data Refuge's Co-ordinator, Laurie Allen, Assistant Director for Digital Scholarship at Penn Libraries:
The folks who were downloading got 17 bags (bags = all of the various files made available through a page that a web harvester can’t access – they are often really hard to get). Of those 17, the first 8 are up in datarefuge.org with light metadata. The next 9 will be up in the next couple of days. Those 17 bags combined are about 24 gigs, and another person got ~1.5 terabytes on her own (she’s very awesome). That one will need some special attention.
Participants with a wide range of backgrounds and skills came together over the two days to contribute to the project. We had a full house for the kick-off Teach-in, double the number of Guides we expected (we ran out of our Guides tee-shirts!), and a panel discussion to close day one that drew a storm of questions. For six portraits of participants on day 2, check these blog posts (Part 1 and Part 2) by Program Fellow Kaushik Ramu with photography by Faculty Working Group Member Naomi Waltham-Smith, Guide for the documentation group. Program Coordinator Patricia Kim aggregated the many tweets posted throughout the events in a series of four "Field Notes" (1, 2, 3, 4 on the Fellows Blog).
For the second day of DataRescue Philly, we asked participants to choose one of six paths into the Refuge, each led by one or more Guides trained on the first day. (We trained almost 50 Guides!) These paths were:
- the Seeders/Sorters: who nominated URLs to seed the End of Term Harvest project so that these sites would be machine crawled and put into the Internet Archive, and who sorted out the data (pages, datasets, query tools, etc.) that can't be machine crawled and must be captured by other means.
- the "Baggers:" who captured data that couldn't go easily or at all into IA, figured out a workflow (check it out here), and devised ways to get these ornery and often very large materials on a case-by-case basis. Once they had the data, they "bagged" it so it could then be moved into our CKAN instance, the data refuge. Check out the datasets--from NOAA, the Department of Energy, EPA--already in the refuge.
- the "Tool Builders:" who helped the baggers with especially tricky captures
- the "Metadata" team of archivists and librarians: who worked with the baggers and tool builders to describe those data sets
- the Documentation and Storytellers: who (as noted above) wrote and visually documented stories from the Data Refuge. Several longer-term individual and collaborative storytelling projects were devised, including one with former White House Presidential Fellow Denice Ross, a leader in the open data initiative. It will trace storylines between data and the many and diverse people and communities who use them.
- the Long Trail: who thought together, also in conversation with the other groups, about how to grow Data Refuge after the "rescue" events: regular meet-ups to continue the work of seeding and sorting into the spring, and projects with our collaborators in EDGI (Environmental Data Governance Initiative) to track changes in the websites of several federal agencies and prepare 100-day reports.
NOAA was indeed the focus, especially of the Seeders and Sorters, and we tackled it using the results of the survey we circulated via the Union of Concerned Scientists and the agency primers and sub-primers developed by our project partners from EDGI: Rinberg, Bergman, and Anjur-Dietrich (Guides of the Seeders and Sorters). We are eager to carry on this work with the team who worked so quickly.
From Toronto, we hosted Michelle Murphy, lead organizer of the "guerrilla archiving" event there in mid-December 2016, and a co-founder of EDGI. She spoke on the Data Value and Vulnerability roundtable on day one. The panel also included Jefferson Bailey (Director, Internet Archive), Robert Cheetham (President/CEO, Azavea), Michael Halpern (Deputy Director, Union of Concerned Scientists), and Sarah Wu (Deputy Director for Planning, Office of Sustainability, City of Philadelphia). The event was video recorded and will be publicly distributed once it has been lightly edited with some titling.
We were very happy to host organizers or others assisting with several upcoming #datarescue events: Rebecca Lave, who will help out with #DataRescueIndy, organized by Jason Kelly; Mike Hucka (Los Angeles); Jerome Whitington (New York); and others.
January 17, 2017 Chicago: #DataRescueChicago
January 19, 2017 Indianapolis: #DataRescueIndy
January 20, 2017 Los Angeles: #ProtectClimateData
Keep checking back for more information on other #DataRescue events and about how to expand Data Refuge.
This coming Tuesday morning (January 17), at the weekly PPEH Fellows colloquium, a group will reconvene in Penn Libraries to continue on the Long Trail in consultation with our many partners and collaborators.