Do not attempt this at home

We all know what it’s like: there is SO MUCH literature out there on the theory of Digital Preservation that actually having to do the practical stuff can be quite unnerving, and sometimes there seems to be a bit of a gap between the theory and how you might actually do the thing.

My case in point was quarantine. Every Digital Preservation handbook, how-to, policy and procedure said some variation on “digital files will be kept in a controlled environment for 30 days to protect against viruses”. This makes perfect sense to me, and anyone used to dealing with physical archives (I refuse to call them analogue!) will be familiar with the concept of quarantine: nobody likes pests or mold, and we like them even less if they get anywhere near our strong rooms and archive stores. So potentially contaminated documents are put into isolation, treated with the appropriate chemicals and left until we can be sure that the mold or pests are dead. And so it is, sort of, with digital archives: if we leave them isolated for a period of time (30 days allows emerging threats to be identified and added to virus checkers), hopefully the nasties will be mopped up (or wiped out) by our more up-to-date antivirus software.

[Image: a mouse. Caption: threats to the archives need removing before processing takes place]

I also know that – partly for this reason – it’s important to have an isolated workstation and there’s some great advice out there for setting something up which needn’t be enormously costly or complicated (see this blog post by Porter Ohlsen). But until I sat there with a USB stick in one hand and a write blocker in the other I hadn’t considered how this would actually work.

WARNING – do not attempt this at home! I attached the write-protected USB drive to the workstation for initial examination: establishing checksums and taking a quick look at the content using FTK Imager (I am still testing and comparing this with BitCurator, but that’s a story for another blog post). And then I thought: what do I do now? Do I sit and wait for 30 days? How do I set a reminder for when the 30 days are up? And what if I get another deposit next week and I want to start processing that? I realised that, despite all the literature on the subject, everyone was surprisingly silent on the practicalities of how this was supposed to work. So I turned to Twitter for advice.

And I got some great advice and discussion from this. Ross Spencer (@beet_keeper) suggested asking questions like:

  • How old is the deposit?
  • How long has it been since the checksums changed?
  • How long has the antivirus at the depositor’s site been running on the material?
  • Can you process the material in an isolated environment, and how long will the processing take?

Somaya Langley (@criticalsenses) suggested that most older material (i.e. material that has been lying around for a while before being processed) will already have passed its de facto quarantine period. She also suggested using more than one antivirus tool as a belt-and-braces approach. She commented that a lot of unseen labour generally goes into the management of curation workstations, and it tends not to get documented at all… there is certainly planning to do around managing a non-networked machine and ensuring it gets updated regularly, but on a schedule that suits the work of the archives.

David Underdown (@DavidUnderdown9) suggested that the delay between deposit and final ingest is long enough that this should mop up any viruses, but admitted that working in a lab environment with a network and storage helps with the management of different deposits. Here at the Modern Records Centre it’s just me and my digital curation workstation. And I now realise (as I stare at my out-of-bounds machine) that I’m going to need a more nuanced workflow for this work, based on the points Ross made about the background to the material I am processing and on a closer look at my own institution’s policies on antivirus checks. Some of this is risk assessment, and I would love to see current work on risk assessments in digital preservation looking at this in more detail.
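To make Ross's and Somaya's point about "de facto quarantine" a bit more concrete, here is a minimal sketch in Python of how the age of the material might shorten the wait. Everything in it is an assumption for illustration rather than an established rule: the `Deposit` fields, the function name, and above all the idea that the 30-day clock can be started from the date the content last changed rather than the date it arrived.

```python
from dataclasses import dataclass
from datetime import date

QUARANTINE_DAYS = 30  # the period quoted in the handbooks


@dataclass
class Deposit:
    """Hypothetical record of what we know about a deposit's background."""
    name: str
    last_changed: date  # last date the content (and so its checksums) changed
    received: date      # date the deposit arrived at the archive


def remaining_quarantine(deposit: Deposit, today: date) -> int:
    """Days still to wait before the final virus scan.

    Assumption: time the material has already spent unchanged counts
    towards quarantine, so the clock starts at `last_changed`.
    """
    days_unchanged = (today - deposit.last_changed).days
    return max(0, QUARANTINE_DAYS - days_unchanged)


# Example: material untouched for months has nothing left to wait,
# while material edited two days before deposit still has most of it.
old = Deposit("old hard drive", date(2019, 1, 1), date(2019, 6, 1))
fresh = Deposit("fresh USB stick", date(2019, 6, 1), date(2019, 6, 3))
print(remaining_quarantine(old, date(2019, 6, 2)))     # 0
print(remaining_quarantine(fresh, date(2019, 6, 11)))  # 20
```

A real workflow would need to weigh the other questions too (whether the depositor ran antivirus over the material, and whether processing can happen in isolation), and those are judgement calls that a single number does not capture.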

What I really want to see is more discussion about what people out there are doing in terms of quarantine and how much of a risk they deem it to be. In the NDSA Levels of Digital Preservation (a hugely influential and valuable set of recommendations for beginning digital preservation work) virus checking appears at Level 2, “Virus check high risk content”, and at Level 3, “Virus check all content”. It’s not on Level 4 at all, I presume because you are doing it already at Level 3. A little bird tells me that in the upcoming NDSA Levels reboot there will be a more nuanced version of this, but it still won’t spell out exactly how quarantining and virus checking fit into the workflow.

In the meantime I’m going to work on creating a more in-depth workflow which tries to balance the risks and the practicalities of managing born-digital stuff so that capture and identification are timely, safe and consistent.
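As a starting point for the bookkeeping side of that workflow (remembering when the 30 days are up, and juggling more than one deposit at a time), here is a rough Python sketch. The register structure, the deposit names and the function names are all made up for illustration; it assumes nothing more than a note of when each deposit was received, plus a checksum recorded at first examination so it can be verified again at release.

```python
import hashlib
from datetime import date, timedelta
from pathlib import Path

QUARANTINE_DAYS = 30  # assumed policy period


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def release_date(received: date, days: int = QUARANTINE_DAYS) -> date:
    """Date on which a deposit can be rescanned and released."""
    return received + timedelta(days=days)


def due_for_rescan(register: dict[str, date], today: date) -> list[str]:
    """Names of deposits whose quarantine period has elapsed."""
    return sorted(
        name for name, received in register.items()
        if release_date(received) <= today
    )


# Hypothetical register: deposit name -> date received.
register = {
    "deposit-A": date(2019, 5, 1),
    "deposit-B": date(2019, 5, 20),
}
print(due_for_rescan(register, date(2019, 6, 1)))  # ['deposit-A']
```

Run at the start of each working day, something like this would answer "what can I process today?" without relying on calendar reminders, and the stored checksums give a way to confirm nothing changed while the material sat in isolation.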

