Helping researchers and scholars access and analyze petabytes of historical internet content.

Project Overview

The Archives Unleashed Cloud is an open source cloud-based analysis tool that helps researchers and scholars conduct web archive analysis.

The Cloud is part of the Archives Unleashed Project, which gives researchers, scholars, librarians, and archivists the ability to access and investigate their web archives. The Cloud supports this goal by providing a web-based front end to the most recent version of the Archives Unleashed Toolkit.

We are exploring integration with other services, but for now the Cloud requires Archive-It credentials.

What is Web Archiving, what are WARCs, and why should I care?

Web archiving is the process of preserving born-digital content on the World Wide Web. Archived content is organized into two main components:

  • WARC (Web ARChive) file: the ISO standard file format (ISO 28500:2017) for web archives. Each file holds one or more WARC records.
  • WARC record: contains all of the data for a webpage, made up of a record header followed by a record content block.
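To make the header and content block structure concrete, here is a minimal sketch in Python that splits one uncompressed WARC record into its parts. The function name parse_warc_record and the sample record are hypothetical and hand-written for illustration. The sketch assumes the standard layout: a version line, named header fields, a blank line, then a content block whose size is given by the Content-Length field.

```python
def parse_warc_record(raw: bytes):
    """Split one uncompressed WARC record into version, header fields, and content block.

    Hypothetical helper for illustration only; it handles a single record.
    """
    # The record header ends at the first blank line (CRLF CRLF).
    header_bytes, _, rest = raw.partition(b"\r\n\r\n")
    lines = header_bytes.decode("utf-8").split("\r\n")
    version = lines[0]  # the first line names the format version
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the size of the content block in bytes.
    length = int(headers["Content-Length"])
    block = rest[:length]
    return version, headers, block


# A hand-written sample record (illustrative, not taken from a real crawl).
payload = b"Hello, web archive!"
sample = (
    b"WARC/1.1\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-Date: 2019-01-01T00:00:00Z\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
    b"\r\n" + payload + b"\r\n\r\n"
)

version, headers, block = parse_warc_record(sample)
print(version, headers["WARC-Type"], len(block))
```

Real WARC files are usually compressed and hold many records, so in practice a dedicated WARC library is a better choice than hand-rolled parsing like this.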

The exponential rise of born-digital data offers a unique opportunity for scholarly inquiry. However, the sheer scale of web archives can be overwhelming for librarians, archivists, and other curators, who face several challenges when working with this material. One of the largest challenges is accessibility: tools are either in short supply or largely beyond the skill base of researchers.

The Archives Unleashed Project was born out of the need to create accessible and user-friendly tools to work with web archives. As a web-based interface, the Archives Unleashed Cloud gives researchers an opportunity to handle and analyze web archives without having to spend time delving into the technical details.

How can I use the Cloud?

Start by reading our detailed documentation! For any technical support questions, please join our Slack team and add the #auk-support channel.

Who can use the Cloud?

Currently, those who maintain an Archive-It account can ingest, download, and analyze their web archive collections in the Cloud. As noted above, we are exploring further service integrations.

What can I do in the Cloud?

Our goal is to get users to a point where they can really dig down into WARC files to explore their research questions. The core features of the Cloud include:

  • Ingesting Archive-It collections via WASAPI.
  • Downloading collection derivatives: network files, domain lists, full text, and full text by domain.
  • Viewing an in-browser network diagram to see major nodes and connections within your collection.
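As a rough illustration of what a domain list derivative might look like, the sketch below tallies how often each domain appears in a list of URLs. It is not the Toolkit's implementation: the function name domain_counts, the sample URLs, and the choice to strip a leading "www." are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_counts(urls):
    """Tally how many URLs fall under each domain (hypothetical helper)."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # lowercased host name, or None if absent
        if not host:
            continue  # skip entries that have no host part
        # Assumption for this sketch: treat "www.example.com" and "example.com" alike.
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    # Most frequent domains first.
    return counts.most_common()


# Made-up sample URLs, not taken from a real collection.
urls = [
    "http://www.example.com/a",
    "https://example.com/b",
    "http://example.org/",
]

for domain, count in domain_counts(urls):
    print(domain, count)
```

A tally like this gives a quick sense of which sites dominate a collection before digging into the full text.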

Who Created the Archives Unleashed Cloud?

A digital librarian, a historian, and a computer science professor met for coffee... well, there is a little more to it than that.

We are very lucky to have had some talented and passionate individuals usher the Archives Unleashed Project and Cloud into existence.

We would like to acknowledge our research assistants from the University of Waterloo. Sarah McTavish developed the Archives Unleashed Cloud Learning Guides, and Rebecca MacAlpine has provided quality assurance support as well as pedagogical development.

Finally, sincere thanks to Ryan Deschamps, who developed GraphPass and served as a postdoctoral fellow on the project between 2017 and 2019.

We would also like to acknowledge the groups and individuals who have inspired and informed our work:

How much does the Cloud cost to use?

The Archives Unleashed Cloud is currently free. Yep, you heard right! The generous support from the Andrew W. Mellon Foundation and contributions from our team, developers, and community have made this possible. We will keep our users and community updated as we move towards a sustainable future.

Who funds the Archives Unleashed Cloud?

This work is primarily supported by the Andrew W. Mellon Foundation, the University of Waterloo, and York University Libraries.

Other financial and in-kind support comes from the Social Sciences and Humanities Research Council, Compute Canada, the Ontario Ministry of Research, Innovation, and Science, and Start Smart Labs.

How can I contact the Archives Unleashed team?

Join our Slack team if you want to see how things are developing, to discuss suggestions or other parts of our project, or to just shoot the breeze about all things web archiving. You can also follow us on Twitter!

Alternatively, please just drop us a line via e-mail at sam.fritz@archivesunleashed.org.