I had a great day meeting with Jen Mitcham, who blogs here on her work at the Borthwick Institute, and Laura from the University of Sheffield Library, to talk about and share experiences of digital preservation. The needs and set-up of our various institutions are different but we share many areas of concern, such as the need for advocacy and the challenges of integrating traditional archival theory and practice with the management of digital data.
We talked a bit about terminology and about how confusing this can get. “File” and “archive” mean very different things to different people. It might not just be a question of avoiding confusing terminology but also of learning a new language to discuss familiar territory. For archiving read sustainability, for curate read preserve, for file read item and so on.
Old vs new
There are many areas where the management of traditional archives and born digital overlap. All those involved in preservation – digital or otherwise – need to tackle some basic questions. What have we got? What should we be collecting and preserving? What are we trying to achieve? It is as important for the traditional archivist to have a sense of what to collect, why collect and who is the audience as it is for the digital archivist.
This could be formalised in a policy document or plan but need not be; either way, there needs to be a clear sense of direction and structure for the work being undertaken.
In the first instance we need to have intellectual control of the collections – whatever they are. The first question any archivist should ask themselves is “what have I got?” This can be a particularly difficult question to answer if it is obscured by the physical medium of the items themselves – whether they are documents written in a language or hand which is hard to read (it could be Latin, secretary hand or an early version of WordPerfect) or in a format which is inaccessible (water damaged document, reel-to-reel tape, 5 1/4″ floppy disk). But even though it might be a technically difficult question to answer, it’s one which most people can begin to tackle – even if the answer (a description of the data) is “medieval document” or “word processed document” or even “research data in a study of termites”. Before making any further progress we have to know what the scale of the problem is.
Pieces in the preservation puzzle
Here at Lancaster University we are focussing on the management and preservation of research data – that is, the raw data behind the scholarly outputs that the university's staff and students produce. This data is undeniably valuable and useful, but only if it can be accessed and reused for the long term and be trusted, just like any other evidence (or archival document). We are considering ways in which we can map research data outputs*, which I think will have really big benefits for preservation planning and go some way towards tackling the questions about what we have, what we might be receiving and how long we need to keep it. Because of the huge volume of data we are talking about, it makes sense to try to automate as much of the process as possible – something my colleagues at York have been giving some thought to.
And while some of this might seem a little removed from those grappling with legacy files on outdated systems, managing emails, corporate archives or whatever else, they are all pieces in the preservation jigsaw puzzle.
What we should be collecting is a decision for each individual repository, but the important thing is that this is clearly (although not necessarily rigidly) defined and that stakeholders (depositors, users, researchers) are consulted and involved.
Crossing the digital divide
What we are trying to achieve is the long-term preservation of archives/data and making them available and discoverable. This presents a big challenge, but one for which solutions are continually being developed. There is not going to be one software tool which will adequately do all of this, but then there isn’t one single solution for arranging, describing, indexing, storing, labelling and retrieving traditional archives either. There are likely to be a range of solutions which are suitable for one, the other or both. We are currently inhabiting a world of hybrid digital and non-digital archives and we need to be thinking about solutions which cross this perceived divide. These might be old or new, but we need to bring together preservation and digital preservation to look at how we manage archives whatever their format.
There’s more about what we mean when we talk about archives here from Kate Theimer.
I have also been following the various archive conferences in the US, Australia and Ireland. There is a nice blog post here from SAA2015, and all the ARA UK and Ireland Conference tweets can be found on Twitter at #ARA2015, including (I think for the first time) a digital preservation strand. There was a lot in all of these conferences on the subject of advocacy.
*If you are interested in Research Data Management you might want to read more about our JISC-funded project here.