Helping researchers and scholars access and analyze petabytes of historical internet content.
The Archives Unleashed Cloud is an open-source, cloud-based tool that helps researchers and scholars analyze web archives.
The Cloud is a component of the Archives Unleashed Project, which aims to give researchers, scholars, librarians, and archivists the ability to access and investigate their web archive collections. Accessibility is a main priority of the project, and the Cloud supports this goal by providing a web-based front end to the Archives Unleashed Toolkit.
What is Web Archiving, what are WARCs, and why should I care?
Web archiving is the process of preserving born-digital content on the World Wide Web. Thinking about how curated content is organized, there are two main components:
- WARC (Web ARChive) file: the ISO standard file format (ISO 28500:2017) that houses one or more WARC records.
- WARC record: holds all of the data of a webpage and consists of a record header and a record content block.
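To make the header/content-block split concrete, here is a minimal sketch in Python that parses a single uncompressed WARC record. The names `WarcRecord` and `parse_warc_record` are hypothetical and are not part of the Cloud or the Toolkit. The sample record is simplified: a real record carries more header fields, and WARC files are usually compressed, so a dedicated WARC library would normally do this work.

```python
from dataclasses import dataclass


@dataclass
class WarcRecord:
    version: str    # first line of the record, e.g. "WARC/1.0"
    headers: dict   # named header fields
    content: bytes  # the record content block


def parse_warc_record(raw: bytes) -> WarcRecord:
    """Split one uncompressed WARC record into header and content block."""
    # The record header ends at the first blank line (CRLF CRLF).
    header_bytes, _, rest = raw.partition(b"\r\n\r\n")
    lines = header_bytes.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the size of the content block in bytes.
    length = int(headers["Content-Length"])
    return WarcRecord(version, headers, rest[:length])


# A simplified, hand-written sample record (illustrative only).
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

record = parse_warc_record(sample)
print(record.version, record.headers["WARC-Type"], record.content)
```

A WARC file is a sequence of such records, one after another.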
The exponential rise of born-digital data offers a unique opportunity for scholarly inquiry. However, the sheer scale of web archives can be overwhelming for librarians, archivists, and other curators, who face several challenges when working with this material. One of the largest challenges is accessibility: tools are in short supply, and those that exist are largely beyond the skill base of most researchers.
The Archives Unleashed Project was born out of the need for accessible, user-friendly tools for working with web archives. As a web-based interface, the Archives Unleashed Cloud gives researchers an opportunity to handle and analyze web archives without having to spend time delving into the technical details.
How can I use the Cloud?
Who can use the Cloud?
Currently, anyone who maintains an Archive-It account can ingest, download, and analyze their web archive collections in the Cloud. The Archives Unleashed team is looking at ways to expand functionality for Cloud users in the future.
What can I do in the Cloud?
Our goal is to get users to a point where they can really dig down into WARC files to explore their research questions. The core features of the Cloud include:
- Ingest Archive-It collections via WASAPI (the Web Archiving Systems API).
- Download collection derivatives: network files, domain lists, full text, and full text by domain.
- View an in-browser network diagram showing the major nodes and connections within your collection.
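As a rough illustration of the information a domain list and a network file summarize, the Python sketch below counts captured pages per domain and hyperlinks between domains. It is not the Cloud's code and does not reproduce its file formats. The `pages` and `links` data and the `domain_of` helper are made up for the example.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host of a URL, dropping a leading 'www.' (a simplification)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


# Hypothetical captured pages in a collection.
pages = [
    "http://www.example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.org/x",
]

# Hypothetical hyperlinks as (source URL, target URL) pairs.
links = [
    ("http://www.example.com/a", "http://example.org/x"),
    ("http://example.com/b", "http://example.org/y"),
    ("http://example.com/b", "http://example.net/"),
]

# Domain list: how many captured pages belong to each domain.
domain_counts = Counter(domain_of(url) for url in pages)

# Network: how often one domain links to another (weighted edges).
edge_weights = Counter((domain_of(src), domain_of(dst)) for src, dst in links)

print(domain_counts.most_common())
print(edge_weights.most_common())
```

In this sketch the domains are the nodes of the network and the link counts are the edge weights, which is the kind of structure a network diagram displays.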
Who Created the Archives Unleashed Cloud?
A digital librarian, a historian, and a computer science professor met for coffee... well, there is a little more to it than that.
We are very lucky to have some very talented and passionate individuals to usher the Archives Unleashed Project and Cloud into existence.
- Nick Ruest, Co-Investigator and Cloud Lead
- Ian Milligan, Primary Investigator and Cloud Developer
- Jimmy Lin, Co-Investigator and Toolkit Lead
- Samantha Fritz, Project Manager
- Ryan Deschamps, Cloud Tester and GraphPass Developer
A special thank you to Sarah McTavish, a PhD Candidate at the University of Waterloo, who developed the Archives Unleashed Cloud Learning Guides.
We would also like to acknowledge the groups and individuals who have inspired and informed our work:
- Web Archives for Longitudinal Knowledge (WALK) Portal
- Project Blacklight
- Documenting the Now
- Internet Archive
How much does the Cloud cost to use?
The Archives Unleashed Cloud is currently free. Yep, you heard right! The generous support from the Andrew W. Mellon Foundation and contributions from our team, developers, and community have made this possible. We will keep our users and community updated as we move towards a sustainable future.
Who funds the Archives Unleashed Cloud?
In addition to the Andrew W. Mellon Foundation, financial and in-kind support comes from the Social Sciences and Humanities Research Council, Compute Canada, the Ontario Ministry of Research, Innovation, and Science, and Start Smart Labs.
How can I contact the Archives Unleashed team?
Join our Slack team if you want to see how things are developing, to discuss suggestions or other parts of our project, or to just shoot the breeze about all things web archiving. You can also follow us on Twitter!
Alternatively, please just drop us a line via e-mail at email@example.com.