Archives Access and AI

I was really looking forward to this conference, organised by Dr Lise Jaillant of Loughborough University, and it did not disappoint. I took what I think must be a record fifteen pages of notes (yes, I do use a real notebook), not to mention the countless tweets (see #AcArAi), so it will be impossible for me to do much more than summarise some of the highlights (for me at least).

The conference brought together digital humanities scholars, archivists, digital preservation practitioners and others to discuss and share ideas about making archives accessible – either with or without the aid of machine learning/AI.

The conference was near Hackney Wick – an interesting part of London

Helen Mavin from the Imperial War Museum gave a fascinating insight into the very complex work being undertaken at the museum to manage and preserve over a million digital files deposited by the Ministry of Defence. This is material specifically covered by UK government legislation (the Public Records Act), so it is not everything which the Museum collects or preserves (this was important when discussing their appraisal and retention strategies). Collections management and transfer procedures were out of date and key contacts were unknown or unclear. Mavin needed to establish robust criteria for retention and selection of digital materials to facilitate fast(er) and more efficient transfer. Once the material was transferred, there needed to be standardised, media-agnostic workflows. Key challenges were around staff turnover and both a skills and resource gap – something which will be familiar to many.

I was really interested to hear Jonathan Manton and Alice Prael from Yale University Libraries talk about their work on trying to centralise and standardise the workflows at their institution, which comprises a number of libraries and a museum. They began by centralising the processing of imaging and file extraction from physical media, which has helped standardise the process and assisted with tackling processing backlogs. They have also looked at email archiving, born digital catalogue description and network transfer. I was particularly keen to hear about the work they have been doing on describing born digital archives. This is something which really needs more discussion and action from practitioners – I’m finding it presents a very real headache for my own practice. Prael and Manton commented that the area which caused most difficulty was describing hybrid archives. I wasn’t surprised by this, and given this is the landscape we are going to be working in for some time to come, it’s one we should all be giving a lot of thought to.

The cataloguing of archives is closely linked to (but not coterminous with) access, so I was also keen to hear Anthea Seles talk about the use of Machine Learning and Artificial Intelligence in archival processing, and the extent to which archivists and information professionals should be (but are not necessarily) involved in the creation and use of algorithms and the ways in which data is explored, exposed and exploited. Seles’ talk gave a great deal of food for thought and my homework will be to read Cathy O’Neil’s Weapons of Math Destruction.

I also very much enjoyed hearing from Leontien Talboom, who is undertaking a PhD jointly with the UK National Archives and University College London looking at the barriers and opportunities which exist in making digital archives available. So far she has uncovered what a huge amount of work still needs to be done in this area, but I’m looking forward to the work coming out in full so we are better informed to overcome the barriers and exploit the opportunities. In the same panel another PhD student, Rebecca Oliva from the University of Glasgow, is looking at sensitivity reviewing and the extent to which it can be automated. Manual sensitivity reviewing is on the whole an opaque process, therefore lending itself well to automation and machine processing. However, sensitivity is also very context specific, i.e. data can be sensitive in one context and not in another, so there are many challenges to be met. Again, I am really looking forward to the outputs of this work.

On the final day it was good to hear from Jenny Bunn, who invited us to ask “What can archivists bring to the (AI) party?”. The answer is (hopefully) quite a lot. AI has been around for a while now and it is starting to grow up. More people are asking for accountability in AI and this is where the archivist comes in – we are really, really good at documenting and organising things!

We had an entertaining presentation from Caylin Smith and Andy Irving from the British Library on their struggles with making non-print legal deposit material available. They are extremely hampered by legislation which has not yet caught up with the technology (a point which Paul Gooding from the University of Glasgow also addressed in his presentation) but have made great strides to improve what they can, certainly in terms of the user experience of accessing material on site. We also heard a fantastic call to arms from William Kilbride of the Digital Preservation Coalition, which he has helpfully published as a blog post so I don’t have to report on it!

Many of the conference presentations are available here and the abstracts here. I thought this was a fascinating and thought-provoking conference; I am still thinking about many of the themes which came up in it and hope to draw on them in my practice. Many thanks to Dr Lise Jaillant for all her hard work in putting this together.

Conference cake which, like everything else, was excellent!