AURA Network Workshop: Open Data vs Privacy

National Library of Ireland (By YvonneM – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15120535)

The last non-virtual conference I attended was back in January in London: the Archives, Access and AI conference, which I blogged about here. That work has since been built upon to form the AURA Network (Archives in the UK and Republic of Ireland and AI), which focuses on trying to unlock digital assets held by cultural institutions. So I was very excited to have the opportunity to travel to virtual Dublin (for the second time this year!) for this workshop, which had a fabulous programme – and many thanks to Dr Lise Jaillant of Loughborough University and Dr Annalina Caputo of Dublin City University for putting together and co-ordinating a great event. Day One focussed on Open Data, Privacy and AI and Day Two on issues of access to Born Digital archives – very much interlinked topics. You can see the full programme here and all I will do is cover a few of my highlights.

The conference opened with a virtual tour of the National Library of Ireland from an architectural perspective, led by Brid O’Sullivan. It was wonderful and really made me determined to visit, not least because she mentioned that they have the “second* best toilets in Dublin” (according to one newspaper anyway!). I’m adding it to my to-visit list for my next Dublin trip!

The opening session began with Rob Brennan from DCU exploring the difficulties of dealing with GDPR and personal data in the age of the digital deluge. The volume of information being created and sloshing around the place means that traditional or even more recent methods of tracking data (using spreadsheets, for example) just don’t scale up – “please feel sorry for the Data Protection Office” said Brennan! The answer to this problem might lie in using machine learning to create computational methods of identifying personal data, and Brennan highlighted the work of the Data Privacy Vocabularies and Controls Community Group, part of W3C, who have been developing a taxonomy of privacy and data protection terms to assist with just this sort of issue. Hopefully this work will start to remove the headache for all DPOs and anyone else dealing with sensitive data – this is potentially valuable for all archivists and records managers.

This was followed by a wide-ranging and thoughtful presentation from Rachel Hosker from the University of Edinburgh, with her wonderfully titled “Beautiful Messy Data: Archival Access and Data Protection”. It explored issues familiar to archivists everywhere: digitised and digital archives can be investigated using data science methods, but they are unstructured and it can be very difficult to identify and manage issues relating to privacy when you don’t know what the records might contain. And whilst machine learning and natural language processing give hope that some of these difficulties may be overcome using automated processes, there is still much to be learnt about biases in these processes. Hosker highlighted new work by Lucy Havens, Melissa Terras, Benjamin Bach and Beatrice Alex which aims to unpick some of this.

Finally Frederic Saunderson from the National Library of Scotland invited us to consider the differences between data protection and privacy, which overlap but are by no means the same. The issues over access can be further complicated when rights management is exercised and practitioners need to be very careful to apply the appropriate access framework to different collections.

The round table session which followed covered topics as varied as semantic web technologies and data models to manage access, the FAIR principles, how AI failures are human failures and more on text mining. I really enjoyed how the different aspects of “access” were explored by those working in very different disciplines but all with a common goal of ethical access for the greater good and not to the detriment of the individual. It is very heartening to be at events where these challenges can be explored so that both researchers who want access and information managers who want to protect privacy can understand the challenges on each side.

Day two opened with another round table with more semantic web research, automated sensitivity reviewing, recreating serendipity in digital searching and the problems of disambiguation in large-scale digital searches. I was particularly taken with the work of Lucy McKenna (Trinity College Dublin) on authoritative interlinking for semantic web cataloguing, which is shared here. The panel reflected on the need to share skills and improve communication between disciplines, something we are probably aware of on one level but don’t spend enough time on.

Dublin – can’t wait to get back there for real! (Image by Claire Tardy from Pixabay)

Eilidh MacGlone from the UK web archive then opened up the next session, talking about the work of the UK’s official web archive, which works hard to capture the UK’s legacy web content in parallel to the way the network of copyright libraries captures other published outputs. However, they are necessarily restricted in what they can collect – no access to material behind logins and so forth. A parallel presentation came from Joanna Finnegan and Della Keating from the National Library of Ireland, who have responsibilities for the Irish web archive. They talked about successful use of crowdsourcing (such as Flickr) to help identify people and places featured in some of their collections.

Next came Paul Gooding from the University of Glasgow discussing his work looking at how researchers actually use digital collections – it is not easy to collect data on this in an ethical way and there is much that is “hidden” in terms of the way user analytics are exposed. Ciaran Wallace of Trinity College Dublin then continued the theme of user approaches by talking about how the historian approaches digital sources, although the focus was primarily on digitised sources and how a “definitive” history is made using accessible and discoverable resources. This is obviously always the case with whatever kind of archival collection we consider, and both archivist and researcher have to be as explicit and transparent as possible about curatorial methods, decisions and dissemination. The final presentation in this session came from Gareth Jones from Dublin City University talking about search and access in broadcast media archives, and was the reminder that I didn’t need that audiovisual archives are extremely complex!

The afternoon concluded the workshop with another wide-ranging round table discussing metadata and rights issues, interoperability and more on the semantic web, Covid collecting and social media archiving, and the teaching of digital curation. Again there was lots of very current and interesting debate, and by the end my head was bursting with thoughts and questions and things to follow up.

It might have been the end of the workshop but there are two more planned in 2021, and there is also a call for papers for a special issue of AI and Society entitled Shedding Light into the Darkness of Digital Culture. Abstracts are due by 11th January, so if you feel you have something to say on any of these wide-ranging topics then take a look at the call. I can’t wait to see these outputs and in the meantime am going to be looking at data modelling and ontologies with renewed enthusiasm!

Many thanks to Andrew Janes (UK National Archives) whose tweets I relied upon heavily for this summary!

*the best toilets in Dublin are in Brown Thomas apparently
