Frequently Asked Questions
Have questions about the Archives Unleashed Cloud? Check out some of our most frequently asked questions.
Is your question not listed here, or would you prefer to talk to a human? Then join us on Slack and add the #auk-support channel!
How do I use the Archives Unleashed Cloud?
If you're new to the Cloud, our documentation page gives a full walkthrough on how to set up and use your account and dashboard.
I don't have an Archive-It account! Can I still use the Archives Unleashed Cloud?
We currently only support Archive-It accounts, as our system is based on the Web Archiving Systems API, or WASAPI. In the future, we are hoping to expand the services that we support. If you have a web archiving service and would like to integrate with the Cloud, drop us a line.
Can I sync my Archive-It account with the Archives Unleashed Cloud without using Twitter or GitHub?
Right now authentication is only offered via Twitter and GitHub - we hope to expand this in the future. We've had some of our colleagues come to the Cloud without an account for either Twitter or GitHub, and if you're in a similar situation we recommend creating a research-based account in order to use the Cloud. Don't worry – you don't need to code or tweet!
Why are some collections listed as public and others not?
There are a variety of metadata fields that are imported from the Archive-It system that describe collections you can work with in the Cloud. One of those fields indicates whether a collection is public or not, which is determined by users within the Archive-It system.
Why is my collections dashboard empty?
This usually means you haven't yet entered your Archive-It credentials and updated your account. Once you do, the Cloud can import your collections from Archive-It, and when that process is complete you will see the dashboard populated with some basic information about all of your collections.
I've asked the system to process my collection, but the screen is still empty. How long does it take to analyze the collection?
Our Cloud works on a queue system, which means that as soon as you click "Analyze Collection," your job enters a queue of all the jobs users are running. We operate on a first-in-first-out basis, so somebody may have started processing their collection before you. You will only see derivatives once the analysis is done.
How long this takes depends on a few factors, such as how many other users are in the queue and the size of your collection. You only need to click the “Analyze” button once. Don't worry: once your collection is done analyzing, we will send you an e-mail.
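The first-in-first-out ordering described above can be pictured with a minimal Python sketch. The user names, collection names, and the `submit` and `process_next` helpers are hypothetical and only illustrate the ordering; this is not the Cloud's actual job system.

```python
from collections import deque

# Hypothetical job queue: each entry is (user, collection), in submission order.
queue = deque()

def submit(user, collection):
    """Add an analysis job to the back of the queue and return its position (1 = next up)."""
    queue.append((user, collection))
    return len(queue)

def process_next():
    """Remove and return the oldest job (first in, first out), or None if the queue is empty."""
    return queue.popleft() if queue else None

submit("alice", "collection-a")           # someone clicked "Analyze" before you
position = submit("you", "collection-b")  # your job lands behind theirs
first = process_next()
second = process_next()
print(position, first, second)
```

In this sketch your job is only picked up after the earlier one has been taken off the queue, which is why a large collection ahead of yours can lengthen the wait.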
Can I bring my own WARC files to the Cloud?
Right now the Archives Unleashed Cloud is only able to handle data by way of Archive-It subscriptions. As noted above, we are looking at ways of expanding this in the future.
You are always welcome to use the Archives Unleashed Toolkit to analyze your own WARC files, though this will require a bit of knowledge and patience with the command line. We have documentation and walkthroughs on how to use the Toolkit locally.
What does the hyperlink diagram represent?
The interactive hyperlink diagram was designed to help users see their web archive collections at a glance, and visualizes the connections between the domains present within the collection.
The diagram is based on social network theory, which visualizes the communication or connection relationships between entities (individuals, groups, organizations, institutions).
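To make the idea of "connections between domains" concrete, here is a small Python sketch that builds a link graph from a handful of made-up rows and counts how many distinct domains link to each target. The sample data, the three-field row layout, and the function names are assumptions for illustration, not the format of any Cloud derivative.

```python
from collections import defaultdict

# Hypothetical (source domain, target domain, number of links) rows.
links = [
    ("example.org", "example.com", 12),
    ("example.org", "example.net", 3),
    ("example.com", "example.net", 5),
]

def build_graph(rows):
    """Map each source domain to a dict of {target domain: total link count}."""
    graph = defaultdict(dict)
    for source, target, count in rows:
        graph[source][target] = graph[source].get(target, 0) + count
    return graph

def in_degree(graph):
    """Count how many distinct source domains link to each target domain."""
    counts = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)

graph = build_graph(links)
incoming = in_degree(graph)
print(incoming)
```

Domains that many others link to end up with a higher count, which is roughly what makes a node stand out in a network diagram.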
I have the derivatives, but now what do I do?
Now comes the fun part: digging into your web archive collections. The derivatives give you a starting point for further analysis of your research questions. How you approach them will largely depend on the content of your collections, your topic, and the research questions you are asking.
First, we have prototype Jupyter Notebooks available for you to work with the derivatives generated by the Archives Unleashed Cloud. You can read more about this in our blog post "Exploring Web Archival Data through Archives Unleashed Cloud Jupyter Notebooks". They allow you to interactively explore and filter the domain count information, extracted full text, and network visualization data generated by the Archives Unleashed Cloud.
We are currently exploring greater integration between the notebooks and the Archives Unleashed Cloud. To use them now, please visit the GitHub repository here and follow the instructions.
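As a rough illustration of the kind of filtering described above, the following Python sketch keeps only the domains that appear at least a given number of times and sorts them by frequency. The sample data, the `domain` and `count` column names, and the `top_domains` helper are assumptions for illustration; they do not come from the notebooks or from the actual derivative files.

```python
import csv
import io

# Hypothetical sample data; a real derivative file may use different columns.
sample = io.StringIO(
    "domain,count\n"
    "example.org,120\n"
    "example.com,45\n"
    "example.net,7\n"
)

def top_domains(handle, minimum=10):
    """Return (domain, count) pairs with count >= minimum, most frequent first."""
    rows = [(row["domain"], int(row["count"])) for row in csv.DictReader(handle)]
    kept = [row for row in rows if row[1] >= minimum]
    return sorted(kept, key=lambda row: row[1], reverse=True)

result = top_domains(sample)
print(result)
```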
Secondly, here are a few more ideas and resources to help you get started:
- Cloud Learning Guides: thanks in part to Sarah McTavish and Ian Milligan, there are a number of guides that walk through using each derivative file with additional tools.
- Cloud Hackathon team projects: at each #hackarchives event, teams have the opportunity to work with the Archives Unleashed Toolkit and other resources to uncover the hidden gems in their collections. Have a look at some of the final projects to see what types of visualizations and processes were used for analysis (Toronto, Vancouver).
- External resources: there are so many other tools and resources available for users to do further analysis with their Cloud derivatives. Here are a few suggestions our team, as well as #hackarchives participants, have used in the past.
|Type||AUK Derivative||File Type||Additional Tools|
Full text (all)
Text by Domain
I ran analysis on my collection but the text file is empty. What happened?
I'm seeing some weird error messages: 404, 422, or 500. What do they mean?
If you see an error message come up, something's gone a little wonky.
- A 404 message will appear when the server cannot find the page a user has requested. Usually this is because of a broken or dead link.
- A 422 comes up when the server understood the request but was unable to process it, even if everything else seems to be running smoothly.
- A 500 error message indicates an internal server error: something went wrong on the server and it wasn't able to process the request.
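If you want to look up what a status code means in general, Python's standard library knows the standard HTTP codes. The sketch below is only a convenience lookup; the `describe` helper is a hypothetical name, and the output says nothing about what specifically went wrong in the Cloud.

```python
from http import HTTPStatus

def describe(code):
    """Return the code, its standard reason phrase, and whether it is a client or server error."""
    status = HTTPStatus(code)
    if 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {status.phrase} ({kind})"

for code in (404, 422, 500):
    print(describe(code))
```

Codes in the 400 range generally point at the request (for example a bad link), while codes in the 500 range point at the server.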
Still need a bit of extra support?
We know that there are times when a little extra help is needed. If you're having any issues with using the Cloud you can connect with our Project Manager, Samantha Fritz (email@example.com).
Can I share the derivative files publicly?
Absolutely! The collections you're using do not belong to the Cloud; they are yours or your institution's, so feel free to share and use these derivatives however you see fit. We would advise that you check with the appropriate contact at your institution to ensure that sharing datasets complies with any policies that may be in place.
When you do share or use derivatives, we ask that you consider using the citation below in your publications. We appreciate your assistance in furthering the recognition of open-source tools for scientific inquiry, helping to grow the web archiving community, and acknowledging the efforts of contributors to this project.
Archives Unleashed Project. (2019). Archives Unleashed Toolkit (Version 0.17.0). Apache License, Version 2.0.
I'm worried about reproducibility! Can I see the code that was used to generate my data?
Absolutely! The Archives Unleashed Cloud always runs on the latest released version of the Archives Unleashed Toolkit. You can see the latest release by visiting this page. If you want to dig deeper, you can visit the open-source repository for the Archives Unleashed Cloud here and the Archives Unleashed Toolkit here.