When people ask me what I spend my time doing I generally say “digital stuff” or (when I’m being more frivolous) “staring at spreadsheets”. Sometimes I explain a bit further and say – “I try and ensure that old files and digital content will still be usable long into the future” and actually most people think about this and say – gosh – how do you solve that problem?
Well the answer is, of course, that it isn’t something you “solve” any more than you would “solve” the problem of looking after archives, manuscripts or physical artefacts.
So on World Digital Preservation Day, when we are all thinking about Digital Preservation – what are the challenges which a Digital Archivist faces and what approaches might she consider taking?
I actually spend a good deal more of my time than you might imagine talking to people, emailing them and creating guidance and advice. If you are lucky – as I am – you have people in your organisation who are working with you who can spread the word about the importance of records and data management (and how this hugely enables the preservation process) and try and get the message out to the people who are creating the digital stuff in the first place. This part of the work is extremely time consuming; capturing the right data or documents in the best formats with the appropriate metadata and contextual information can involve a great deal of to-ing and fro-ing. It’s really useful to be able to find out how people work and manage their files. There is no sense writing guidelines for people to follow if they don’t reflect their own real life situation. It just means that people won’t follow them.
Having said that – especially for World Digital Preservation Day – I have updated and promoted some general guidelines for our potential and actual depositors which you can see here. Have a look and tell me what you think! I’m going to be promoting them to internal and external depositors in the coming year.
The knotty problems
One of the biggest challenges of Digital Preservation – without doubt – is capturing the “stuff”* in the first place. This sounds so obvious but is worth repeating: you can’t save what you don’t have.
I’ve been working with our Records Management team looking at the records which the University produces – minutes of the Council and Senate and other committees; files which are vital for the running of the institution and need to be retained for both legal and historical reasons.
The files contain sensitive information (for business and personal data reasons) so we needed a secure method of transfer, which ruled out email (too easy to mis-send – in fact this is the largest cause of inadvertent data breaches). Similarly, using a carrier format like a USB drive or a hard drive was potentially insecure and also very clunky, adding an extra step to the deposit process. The harder it is to transfer the data the less likely people are to do it, as any systems designer knows… We turned to our institution’s own file sharing platform, which satisfied our information security team and was already familiar to the people creating the files. Perfect – I thought – especially as we get a notification email once a file has been deposited. However – and there’s always a however – it appears that uploading the files to the file share changes the last modified date. Even worse – it appears that it sometimes changes the last modified date. And sometimes it doesn’t. So what I thought was going to be a straightforward solution turned out to be more of a knotty problem. Added into the mix is also the question of “how much does it matter if the last modified date is not correct”? We all know these dates are very unreliable, and in this context I anticipate that most of the material will be deposited very soon after creation, so the dates will not be wildly inaccurate. Something for me to work on…
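One way I could pin down when the date changes is to record each file’s checksum and last modified date before it goes up to the file share, and compare them with what comes back down. The sketch below is only an illustration in plain Python: the function names are made up for this post, it knows nothing about the file sharing platform itself, and it assumes I have a copy of the files both before and after transfer.

```python
import hashlib
import os
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Return the SHA-256 checksum and last-modified time (UTC, ISO 8601) of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return {"sha256": digest.hexdigest(), "modified": modified.isoformat()}


def build_manifest(folder):
    """Describe every file under `folder`, keyed by its path relative to `folder`."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): describe_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(before, after):
    """Report files that are missing, whose content changed, or whose date changed."""
    report = {"content_changed": [], "date_changed": [], "missing": []}
    for name, original in before.items():
        received = after.get(name)
        if received is None:
            report["missing"].append(name)
        elif received["sha256"] != original["sha256"]:
            report["content_changed"].append(name)
        elif received["modified"] != original["modified"]:
            report["date_changed"].append(name)
    return report
```

Running `build_manifest` on the folder before upload and again on the downloaded copy, then passing both results to `compare_manifests`, would show which files kept their content but lost their original date. Keeping the “before” manifest alongside the deposit would also preserve the original dates even if the platform overwrites them.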
Back to the future
Much of my work is with internal depositors, IT services, Records Management and increasingly with external depositors. However, we still have plenty of work to do stitching together the work I do around digital curation and the traditional archives elements of the service. We have made a start examining how fit our cataloguing practices are for the future. There has been a lot of good work done already (for example by the University of California) on guidelines for cataloguing born digital and hybrid collections but it’s still very much a developing field. The impact and uptake of Records in Contexts is still unknown and we are not starting from scratch, so the approach we are taking is to build on the cataloguing practices which have already been developed over many years here. At present we are only ingesting small quantities of digital material so we can craft our cataloguing practices to meet this need, but there is going to have to be some radical rethinking to meet large-scale deposits which will start to come soon enough. This will affect how we use our collections management tools and how we present the catalogue to the public. I can see some data modelling work taking place to help map relationships between versions and iterations (let me add that to the to do list below). All this is very much based in traditional archival principles and is going to involve getting everyone in the team onboard.
The Digital Preservation Community is relatively small and disparate but I could not do my job without you (yes that’s all of you out there with an interest in digital preservation!). Whether it is sparking ideas, offering advice and guidance, sharing best and worst practice (Digital Preservationists Anonymous – yes it’s a thing) or just being a sounding board, the help I get from countless people out there is a fantastic thing. And a big shout out to the Digital Preservation Coalition, whose invaluable work in enabling and bringing people together does a huge amount to support the community worldwide. I hope I do my best by sharing my joys and frustrations via Tweets
I made #archivescake #archivecake (which is it?) in honour of a colleague’s birthday and am eating it whilst testing Bagger and #Archivematica #FridayFun #DigitalPreservation pic.twitter.com/ZUkISpU46j
— Rachel MacGregor (@An_Old_Hand) Septem
and my blog.
So much to do
I’ve got an ever increasing list of things I want to do which includes:
- set up and test ePADD. You can read about my false start here but I’ve now got more RAM so there’s nothing holding me back apart from finding the time
- undertake a survey of the datasets in our institutional repository WRAP to see what file formats we are dealing with. I’ll be interested to see how the results compare to what I found when I was at Lancaster University.
- get a forensic workstation up and running. It’s on order but I need it in front of me to be any use!
- revisit my digital asset register – I created this in my first couple of months but I need to return to it to look more carefully and match it to our catalogue and also to some storage management work that I’d like to undertake
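For the file format survey in the list above, a very rough first pass could simply count file extensions. This is a sketch only: `count_extensions` is a name invented for this post, it assumes the dataset files have already been copied to a local folder (it does not talk to WRAP), and extensions are only a hint at the real format, so a proper identification tool would still be needed for anything conclusive.

```python
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count the files under `folder` by lower-cased extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Files with no extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling `count_extensions("datasets").most_common()` on a folder (here a hypothetical one called `datasets`) would list the extensions from most to least frequent, which is enough to see where the surprises are likely to be.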
So today is World Digital Preservation Day and I will be in Amsterdam celebrating the achievements of many in the community at the Digital Preservation Awards Ceremony. There are some brilliant nominations for some of the great work done in the last year or so and you can watch it streamed live here. Meanwhile I need to get on with my to do list and maybe start on a project worthy of a future nomination?
*stuff: technical term for the stuff made of bits and bytes… If you come up with a better term please let me know….