The preservation jigsaw puzzle

University of York Central Hall. Philip Pankhurst, via Wikimedia Commons

I had a great day meeting Jen Mitcham, who blogs here on her work at the Borthwick Institute, and Laura from University of Sheffield Library to talk about and share experiences of digital preservation. The needs and set-up in our various institutions are different but we share many areas of concern, such as the need for advocacy and the challenges of integrating traditional archival theory and practice with the management of digital data.


We talked a bit about terminology and about how confusing this can get. “File” and “archive” mean very different things to different people. It might not just be a question of avoiding confusing terminology but also of having to learn a new language to discuss familiar territory. For archiving read sustainability, for curate read preserve, for file read item and so on.

Old vs new

There are many areas where the management of traditional archives and born digital overlap.  All those involved in preservation – digital or otherwise – need to tackle some basic questions. What have we got? What should we be collecting and preserving? What are we trying to achieve? It is as important for the traditional archivist to have a sense of what to collect, why collect and who is the audience as it is for the digital archivist.

This could be formalised in a policy document or plan but need not be; either way, there needs to be a clear sense of direction and structure for the work being undertaken.

The need in the first instance is to have intellectual control of the collections – whatever they are. The first question any archivist should ask themselves is “what have I got?” This can be a particularly difficult question to answer if it is obscured by the physical medium of the items themselves – whether they are documents written in a language or hand which is hard to read (it could be Latin, secretary hand or an early version of WordPerfect) or in a format which is inaccessible (water-damaged document, reel-to-reel tape, 5 1/4″ floppy disk). But even though it might be a technically difficult question to answer, it’s one which most people can begin to tackle – even if the answer (the description of the data) is “medieval document” or “word processed document” or even “research data in a study of termites”. Before making any further progress we have to know what the scale of the problem is.

Pieces in the preservation puzzle

Here at Lancaster University we are focussing on the management and preservation of research data – that is the raw data from scholarly outputs that the university staff and students produce.  This data is undeniably valuable and useful but only if it can be accessed and reused for the long term and be trusted, just like any other evidence (or archival document).  We are considering ways in which we can map research data outputs* which I think will have really big benefits for preservation planning and go some way towards tackling the questions about what we have, what might we be receiving, how long do we need to keep it for.  Because of the huge volume of data we are talking about it makes sense to try and automate as much of the process as possible – something my colleagues at York have been giving some thought to.
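As a minimal sketch of what automating the "what have we got?" question might look like, the snippet below walks a folder of deposited files and counts them by file extension. It is an illustration only, not the approach used at Lancaster or York: the function name `inventory_by_extension` is hypothetical, and a file extension is only a crude proxy for the real format, so a dedicated format identification tool would be needed for anything more than a first look.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files under `root` by lower-cased extension and total their sizes.

    Hypothetical helper for illustration: the extension is only a rough
    hint at the format, not a reliable identification.
    """
    counts = Counter()  # number of files per extension
    sizes = Counter()   # total bytes per extension
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes
```

Calling `inventory_by_extension("some/deposit/folder")` (the path is a placeholder) returns two counters which can be printed or saved, giving a rough first answer about the scale of the problem before any detailed description begins.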

And while some of this might seem a little removed from those grappling with legacy files on outdated systems, managing emails, corporate archives or whatever else, they are all pieces in the preservation jigsaw puzzle.

What we should be collecting is a decision for each individual repository but the important thing is that this is clearly (although not necessarily rigidly) defined and that stakeholders (depositors, users, researchers) are consulted and involved.

Crossing the digital divide

What we are trying to achieve is the long-term preservation of archives/data and making them available and discoverable. This is something which presents a big challenge but one for which solutions are continually being developed. There is not going to be one software tool which will adequately do all of this, but then there isn’t one single solution for arranging, describing, indexing, storing, labelling and retrieving traditional archives either, and there are likely to be a range of solutions which are suitable for one, the other or both. We are currently inhabiting a world of hybrid digital/non-digital archives and we need to be thinking about solutions which cross this perceived divide. These might be old or new but we need to bring together preservation and digital preservation to look at how we manage archives whatever their format.

There’s more about what we mean when we talk about archives here from Kate Theimer.

I have also been following the various Archive conferences in the US, Australia and Ireland.  A nice blog post here from SAA2015 and all the ARA UK and Ireland Conference tweets can be found on Twitter at #ARA2015 including (I think for the first time) a digital preservation strand.  There was a lot in all of these conferences on the subject of advocacy.

*If you are interested in Research Data Management you might want to read more about our JISC-funded project here


One small step

I was excited to read today that the National Archives have made their first born-digital records archives available via the Discovery tool. This marks the beginning of a long process of integrating the traditional and born digital elements of the government’s records into one portal in what (hopefully) will be a seamless point of discovery. This might seem like a small step for many but having spent my first few weeks grappling with the issues around managing, preserving and making available digital data I can only be extremely grateful that there are pioneers there who are paving the way so that all archives, whether purely digital (as here at Lancaster) or hybrid (as surely all others will eventually become), can look to making their holdings available for anyone who wants to access them.

The focus on discoverability has been shaping the way I have been thinking over the last week or so and it was good to read Professor Lorna Hughes of the School of Advanced Studies write about the necessity of public access to our heritage as part of the democratic process. This includes the digitisation of archival and historical resources, the publication of Big Data and the preservation of the born-digital, all activities which, as I mentioned in my previous post, underpin the democratic process.

For an archivist from a traditional paper-and-parchment (yeah and the rest) background, getting to grips with the way digital data can and should be described so that it can be made available has been a steep learning curve.  I shall be looking at how the National Archives have used Discovery to weave together their hybrid catalogue but I have also been taking a lot of inspiration from the work of Jane Stevenson at Archives Hub and in particular her blogs on the AHRC funded Exploring British Design project. There is much to be gained from thinking above and beyond traditional archival description to help make the digital discoverable for as many people as possible. This fits in well with the emphasis on sustainability and access to research data which is one of the key drivers behind my role here at Lancaster University.

In the meantime my first small steps are going to be around drafting digital preservation policy and reading about what others have already done; when people share their work and make it available to the wider digital preservation and archives community, everyone reaps the benefit.

I’ve been following #LIBER2015 tweets for lots of stuff about copyright, research data management and more….