Email preservation: it ain’t easy. I knew that before I even started looking at it, but a number of factors prompted me to begin looking at what we could do here at MRC. I have been spending quite a bit of time getting to know the collections here and getting a feel for what sorts of digital material we have. This is normally the very first step in undertaking digital preservation work. Understanding the collections means we can then prioritise and target particularly vulnerable formats, and make plans to tackle formats which will cause particular issues (e.g. 5.25 inch disks, 3.5 inch disks and so on). I have not completed this process yet (there are several thousand accessions to pick through) but I have turned up some emails included in the collections in a variety of forms: printed out, copied and pasted into word-processed documents and so on. We need to be preparing ourselves to deal with a new deposit of material which we might be offered by a trade union, an activist or one of the many other people or organisations that come under our collecting policy, as it is almost certain to include email. The fact that it hasn’t yet become a huge issue is most likely because we haven’t yet asked the question.
I am also starting to think about how we collect archives from the university itself more effectively, and inevitably this includes email correspondence in a number of different settings. I probably wouldn’t be able to even consider this were it not for our Records Manager, who is working on the front line of records creation. I don’t think I can emphasise enough that it doesn’t matter how many all-singing, all-dancing technological solutions we put in place to “preserve” the digital stuff: if we don’t actually have people and resources in place with record creators (in whatever capacity this might be), we can’t hope to capture the things that are really important (whether for cultural, legal, evidential or other reasons). Preservation: it’s all about resourcing.
Whilst we may be a little way away from tackling some of the complexities of emails as archive collections, a more pressing use case for us is to preserve the correspondence which accompanies our collections and include it in the submission information of our SIPs ingested into Archivematica. So I have been looking at some tools which would be useful in this process and thinking about what they can do and how they might fit into a transfer/deposit/appraisal/ingest workflow.
ePADD, which is developed and maintained by Stanford University, describes itself as “the all-in-one email appraisal, processing, discovery, and delivery solution for donors, archival repositories, and researchers.”
It’s worth noting here that this description does not mention preservation. That isn’t actually what ePADD does, although it’s often mentioned in the same breath as various preservation tools. ePADD is designed to help with the acquisition, appraisal and management of email collections, so its focus is on capture and content management. I would argue of course that this is all part of the preservation process, but it’s fair to point out that other tools will be needed for the processing and preparation of emails. I have also been looking at Emailchemy and Aid4Mail, both of which help with converting email export packages into preservation formats.
ePADD comes with excellent and very clear instructions on downloading and using the software and there is a detailed and active community forum. ePADD can link directly to a mailbox where you can select folders to capture emails from or you can upload emails which you have exported from a system in an MBOX format. This is the standard export file format used by many email clients but NOT (inevitably) Outlook, which uses the proprietary .pst format. This is where programs such as Emailchemy or Aid4Mail come in – they both enable the user to convert .pst files to MBOX format.
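Before handing an MBOX export to any tool, it can be useful to check that the file opens and roughly what it contains. The following is a minimal sketch using Python's standard `mailbox` module. It is not part of ePADD, Emailchemy or Aid4Mail, and the function name `summarise_mbox` and the file name in the usage note are my own assumptions for illustration.

```python
import mailbox
from collections import Counter


def summarise_mbox(path):
    """Return (message_count, Counter of From headers) for an MBOX file.

    Hypothetical helper for a quick sanity check of an export;
    it reads the file and does not modify it.
    """
    # create=False makes a missing file raise an error instead of
    # silently creating an empty mailbox at that path.
    box = mailbox.mbox(path, create=False)
    try:
        senders = Counter()
        count = 0
        for message in box:
            count += 1
            # Messages without a From header are grouped together.
            senders[message.get("From", "(unknown)")] += 1
        return count, senders
    finally:
        box.close()
```

Calling `summarise_mbox("export.mbox")` (an assumed file name) returns the number of messages and a tally of senders, which is enough to confirm that a conversion from .pst has produced something readable before going any further.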
I got ePADD installed on my PC but immediately ran into problems…
The good news is that Josh Schneider at Stanford was extremely quick off the mark with a diagnosis: my PC did not have enough RAM. He suggested running the Java version from the command line, where you can specify how much RAM to allocate to the program. I’m not a very techy person, so this sounded a bit daunting, but again the excellent instructions for ePADD meant I could do it.
Oh well. At least it confirmed Josh’s diagnosis. And gave me a next step, which is to get more RAM behind me and hopefully get testing ePADD properly.
I was hoping this post would be documenting my adventures with email acquisition and appraisal but I’ll have to leave that for another day. It came at a good time however as ePADD has been nominated for a Digital Preservation Coalition SSI Award for Research and Innovation. Based just on my experiences so far my vote is definitely going to ePADD as the documentation and support have been excellent and it looks as if it’s going to be a product with a lot of possibilities for us.
In the meantime – as with so much in digital preservation – I’m just going to have to look for more resources.