Historical Page – FAQs | Penn Program in Environmental Humanities

-HISTORICAL PAGE-

FAQS

HOW CAN WE KNOW WHICH DATA TO TARGET SO WE DON'T REPLICATE THE WORK AT OTHER DATA RESCUE EVENTS?

We are currently gathering information from experts, scientists, and community members about particularly valuable and vulnerable data. Fill out our form to help us! For those events that would like help identifying areas to focus on, we aim to provide lists of the most important datasets and sources that we've identified so that each event can tackle a piece of the larger set without too much duplication. However, understanding the data that is most valuable and vulnerable within your own community can be a really important aspect of your Data Rescue event.

CAN YOU USE MY STORAGE SPACE?

We truly appreciate all the offers of storage space that we've received; however, we have storage space covered (see below).

WHERE WILL DOWNLOADED DATA GO?

If your institution can’t host the data your DataRescue event downloads, you can use the DataRefuge repository. We built it on Amazon Web Services, integrated with CKAN (an open-source data catalog), and it will be available to DataRescue events for storing copies of data and making them accessible.

WILL YOU BE PROVIDING BEST PRACTICES FOR CREATING RELIABLE COPIES?

Yes! And we welcome your collaboration. Generally, we recommend that materials that can be captured through web crawling and the End of Term Harvest be captured that way. We will rely, in part, on the toolkit developed after the event at the University of Toronto, as well as locally developed code to seed the harvester.
For data that don’t make sense in the Internet Archive, we’ll be adding them to the open data catalog mentioned above. We've developed a workflow for this to occur at DataRescue events and are working on establishing a remote workflow so that those unable to attend events can help out.
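As an illustration of what a "reliable copy" can mean in practice, here is a minimal sketch in Python. It is not the DataRefuge workflow or app; it only assumes that a copy is documented by recording a SHA-256 checksum alongside basic provenance details. The function names and record fields (`custody_record`, `source_url`, `downloaded_by`, and so on) are hypothetical, chosen for this example.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, source_url, downloaded_by):
    """Build a small provenance record for one downloaded file.

    The field names are hypothetical; adapt them to whatever
    metadata your event actually collects.
    """
    path = Path(path)
    return {
        "file": path.name,
        "bytes": path.stat().st_size,
        "sha256": sha256_of_file(path),
        "source_url": source_url,
        "downloaded_by": downloaded_by,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_copy(path, record):
    """Return True if the file at `path` still matches the recorded checksum."""
    return sha256_of_file(path) == record["sha256"]
```

Anyone who later receives the file together with its record can call `verify_copy` to confirm that the copy has not changed since it was documented.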

HOW DO I GET ACCESS TO THE SPREADSHEETS MENTIONED IN THE WORKFLOW?

We recently retired the spreadsheet of "uncrawlable" data. An app now replaces the spreadsheets and streamlines the workflow. Contact us at datarefuge@ppehlab.org to find out about getting access to the app for your event.

DOES THIS EVENT NEED TO HAPPEN BEFORE THE INAUGURATION? THAT IS, ARE WE WORRIED ABOUT THIS INFORMATION GOING OFFLINE IMMEDIATELY?

No! Our understanding, based on conversations with various experts and on the experience of the End of Term Harvest project, which has been active through several changes in administration, is that things won't change immediately. The task at hand is a large one, and there will absolutely still be important work to be done through February.

CAN OUR EVENT BE ON THE SAME DAY AS ANOTHER EVENT?

Yes! There's really no reason two (or more!) DataRescue events can't happen on the same day. If a city very close to you is having an event on the same day, you may want to consider coordinating your efforts, but you know best how likely your community is to travel for an event like this. The only other thing to consider is which agency or datasets your event will focus on. To lessen the likelihood that your combined efforts will crash a server, same-day events should choose distinct agencies, and each should know what the others are working on. We can help you coordinate this!

CAN I DOWNLOAD DATA FOR DATAREFUGE WITHOUT ATTENDING AN EVENT?

Soon! We're developing a workflow to allow people to work independently while maintaining the quality assurances that exist for our DataRescue Event workflow. We're very close to completing this workflow but it is a little complicated. Check back for details!

If you want to start downloading immediately, we recommend getting in touch with Climate Mirror. Climate Mirror's copies will not have the same documented chain of custody that DataRefuge copies will have.

WHY ARE YOU FOCUSING ON CLIMATE AND ENVIRONMENTAL DATA WHEN OTHER DATA IS ALSO INCREASINGLY AT RISK?

DataRefuge as an idea comes from the Penn Program in Environmental Humanities; most of our own research relates to climate and the environment, so our first concerns were naturally in these areas. We also felt that, given the climate change denial of the incoming administration, these data were particularly vulnerable. We care deeply about the other data that have recently started to be removed or targeted, but environmental and climate data alone is a huge undertaking. We're collaborating with the Association of Research Libraries to ensure the long-term safety of other data - see http://www.librariesnetwork.org/

DOESN'T INTERNET ARCHIVE ALREADY CAPTURE THIS DATA?

No - not all information on the internet is captured by the Internet Archive and not all information *can* be captured by the Internet Archive. See EDGI's How a Webcrawler Works or our poster, What is Web-Crawlable? for more details.

DOESN'T DATA.GOV ALREADY CAPTURE THIS DATA?

No - data.gov is an excellent catalog of government data, but it is not comprehensive. In most cases data.gov is also a catalog rather than a backup, and it is not immune to being shut down or made unavailable. For these reasons, it's still useful to create more copies of federal datasets, even if they're also represented in data.gov.

HOW CAN I GET IN TOUCH WITH YOU?

Email us at datarefuge@ppehlab.org, find us on Twitter, or use the links below to contact us through the PPEH Lab social media accounts.