MidiPres May 2021: Cataloguing Born Digital

Image credit: Image by DreamQuest from Pixabay

It’s now over twelve months since Laura and I launched our “digital preservation support network” (at a real live event – imagine that!). It’s heartening to see how much we’ve developed during this time. For the Spring MidiPres meeting we had taken a bit of a poll on what topics people would like to focus on and “Cataloguing Born Digital Collections” was quite high up on the list.

We invited a few people to share their experiences of cataloguing born digital materials to get things started but in the spirit of the network this was as much to invite further debate and comment from others and get us all thinking. I know I find this useful as it gives me ideas to borrow and develop. It’s also helpful to be able to get a sense check from others – does my theory (or practice) make sense? Might there be better ways of doing it?

Our first presenter reflected on their considerable experience of cataloguing analogue collections and the challenges they were anticipating in tackling born digital collections. There were some technical challenges such as managing the integration between the CALM cataloguing software and the Preservica preservation system but these (as is so often the case) were smaller challenges than dealing with the size and complexity of the material coming in. Archivists spend their time making sense of the archive and presenting it to the outside world in a way that is intelligible and digestible, so there were inevitably going to be challenges with very large deposits which either had no structure or an extremely complex one – how is this best presented in a catalogue to facilitate discovery? Sometimes file names are user friendly and sometimes they are not, and there is always a tension between capturing the essence of what was created and helping users to discover it. There was some discussion around managing personal data – identifying it amongst a huge quantity of material, and what resources exist for automating this.

Our second presenter focussed more on the practicalities of cataloguing standards and software. Many MidiPres members use CALM as their cataloguing software but by no means all; we have a strong feeling that the network should aim to be vendor neutral as far as possible, which helps us be more inclusive. Several members do not have specific cataloguing software at all – usually using a spreadsheet to catalogue and manage their collections information. The CALM cataloguing system (like others) is standards based and, as is common in the UK, most people adhere to ISAD(G) for their cataloguing with local variations in practice. Increasingly, as ISAD(G) was not developed for digital collections, practice has been forced to diversify and develop, and whilst alternatives are emerging (notably RiC) I’ve yet to get a feel for people embracing them. A lack of standardisation in the way we describe our collections is likely to have a negative effect on their discoverability – especially across different institutions – so the more we share experiences and practice the better.

There was some discussion of that perennial problem of dates – do we record the creation date, the last modified date or the presumed actual date of the document, and if so where and how do we record it and then represent it to the researcher? This led us on to the capture and use of the metadata which we create around digital collections – none of us were aware of it being made publicly available routinely but it is definitely a consideration. There was quite some discussion more broadly about how we represent born digital collections via our catalogues and the only consensus is that it depends on the collection! We also mooted how we manage digital content within our collections, especially hybrid ones, and there were various practices shared for indicating the presence of digital content to give the archivist an easy way of gaining an overview of what is in the system, such as assigning an accession specific code or creating a drop down or Y/N field in the collections management system. This allows for greater reporting functionality although it probably doesn’t address the need for a more granular approach.
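To make the dates question a little more concrete, here is a minimal sketch (not anyone's actual workflow) of reading the dates a file system holds for a file, using only the Python standard library. The function name `filesystem_dates` is made up for illustration. One caveat worth knowing: a true creation date is only exposed on some platforms, so the sketch records it only when it is available, and copying files between systems can change these dates anyway.

```python
import os
from datetime import datetime, timezone


def filesystem_dates(path):
    """Return the dates the file system records for one file (hypothetical helper)."""
    info = os.stat(path)
    dates = {
        # Last time the file content was modified.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
    # st_birthtime (creation time) is not available on every platform,
    # so it is only included when the operating system provides it.
    birth = getattr(info, "st_birthtime", None)
    if birth is not None:
        dates["created"] = datetime.fromtimestamp(birth, tz=timezone.utc)
    return dates
```

Neither value says anything about the "presumed actual date" of a document, which still has to come from the content itself or from the depositor.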

Our third presenter talked about the move from traditional paper oriented ways of cataloguing towards incorporating digital. I think this is something which many struggle with because standards and software are so deeply rooted in the paper world that even if you want to move on from that (both in terms of the types of collections you are working with and in the way in which collections are made available online) it can be difficult. There was some good discussion of no “one size fits all” approach working for born digital (as indeed is the case for analogue archives) and Trevor Owens’ book “The Theory and Craft of Digital Preservation” was referenced – an excellent read for anyone interested in getting into more depth with the subject.

SharePoint feels like the Wild West to many of us…
(Image credit: Image by Brigitte from Pixabay)

There was lots of discussion around what the records creators/depositors can do for us. Some archive offices asked for lists to accompany any deposits which inevitably varied in usefulness, but in some cases were seen as an explicit statement of balance in the depositor/archivist relationship, so the archives were not seen as a dumping ground for material which was no longer considered immediately useful! There was some discussion about how or even if it might be possible to acquire other sorts of creator generated metadata (checksums being the most obvious one) and despite the wealth of digital preservation literature recommending this as good practice, most if not all of us felt this was at present completely unachievable (and some had attempted it). It was even observed that in some situations putting demands for lists (for example) on depositors would just lead to nothing being deposited at all. The real world, it turns out, is quite a long way away from the textbooks.
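For anyone wondering what creator generated checksums would actually involve, the following is a minimal sketch, assuming Python is available and the deposit is an ordinary folder of files. The names `sha256_of` and `build_manifest` are invented for illustration, and SHA-256 is simply one common choice of algorithm, not something the meeting settled on.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that very large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(deposit_dir):
    """Map each file's path (relative to the deposit folder) to its checksum."""
    root = Path(deposit_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = sha256_of(path)
    return manifest
```

If a depositor ran something like this before transfer and the archive ran it again on receipt, comparing the two manifests would show whether anything had changed or gone missing on the way. As the discussion made clear, the hard part is persuading anyone to do it, not the code.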

There was loads more covered in the meeting including a SharePoint Anonymous discussion (“it’s like the Wild West”) and some thoughts on licensing of digital materials, both of which felt like they could form the core of future meetings. We certainly aren’t going to run out of things to say any time soon and I really feel we are helping everyone to build confidence in the sector, which is what we set out to achieve.
