Radical Collections

Senate House
Image: Steve Cadman, Flickr (https://www.flickr.com/photos/stevecadman/496743569)      CC BY-SA 2.0

I attended the inspirational Radical Collections conference held at Senate House on 3rd March which was part of their Radical Voices season. The main themes of the conference were collections development, the politics of cataloguing and widening participation and representation. Many of the papers focused on more than one of these themes and the papers and audience were a good (healthy?) mix of archives and library professionals and others. My role as digital archivist is not just about preservation but also access to digital collections and their on-going management. Archive collections (and other library special collections) do not sit in isolation and have to be considered as part of a wider cultural and political background. Decisions made by library and archive professionals have consequences for their donors and users. The context in which collecting and managing take place is key to meeting equality and diversity agendas.

The first session looked at some collections with “radical” contents: Ken Loach’s archive at the BFI, the Underground and Alternative Press Collection at the University of Brighton and the archives of Radical Psychiatry (or anti-psychiatry). The collections referred to were very varied in subject matter but were united in the way in which they cast light on “alternative” narratives. The interviews in Ken Loach’s archive reveal the dissenting voices of union activists of the 1970s and 80s which are not otherwise represented in the official archives. Brighton’s underground and alternative press collection documents the hugely influential narratives of alternative community activity in Brighton – much of which, such as environmental activism, has since become mainstream but had its origins in alternative activism. Likewise the history of the development of psychiatry has a counter-narrative of alternative practice.

The second panel looked at some of the more political aspects of library collections and tackled questions as varied as discriminatory library cataloguing systems (and practice) and the predominance of whiteness in librarianship. The papers were a useful reminder – if that were needed – of the constant need to address inbuilt discriminatory practice. Inclusion and presence are not enough and sometimes the structures themselves need to be challenged to lessen discrimination.

Zine Library
Image: Cory Doctorow, Flickr (https://www.flickr.com/photos/doctorow/100318253)             CC BY-SA 2.0

In the afternoon there was more on radical histories drawn from collections, from the children’s literature in Cork reflecting the emergence of the Irish Free State at the beginning of the twentieth century, through the archive of a women’s organisation of the 1990s, to the issues around the preservation of and access to zines. This session had a lot of focus on the personal relationships which develop between the creator of a collection and the collecting institution. This exposes tensions where there are ideological differences between the two – something which was returned to later.

The final panel looked at issues directly relating to the workforce. I was asked to step in to chair this session at the last minute – which I was very happy to do as it was a fascinating range of papers, only marred by a fire alarm which interrupted the first speaker. Tamsin Bookey from Tower Hamlets revisited the issue of whiteness in libraries and archives in terms of users, collections and staff. She also looked at the Social Model of Disability in relation to archives provision. Katherine Quinn in the second paper looked at the challenge of radical librarianship in the HE Sector and finally Kirsty Fife and Hannah Louise Henthorn discussed approaches to diversifying the archives sector and launched a survey which you can take part in: Marginalised in the UK archive sector.

The conference was extremely thought provoking and there are a number of issues that I have been reflecting on with respect to my practice.  Libraries and archives are not neutral spaces nor are they “a static auxiliary” to education, as defined by some sociologists.  Any collecting or engagement activity in libraries or archives needs very careful assessment and critique to support equality in service provision and maintain transparency (rather than neutrality which is not achievable).

Looking forward


A picture of people doing a lot more exercise than I do! (image: http://skitterphoto.com/?portfolio=4-mile-run-groningen)

Like a lot of people I have spent January setting priorities for the year ahead. I haven’t given up chocolate or done any more exercise but I have been giving some thought both to where I would like to focus in my work and to some of the areas I would like to develop in my practice. One of the first things I would like to do is sign up for an XML course – I’m keen to improve my technical skills and this looks like a good place to start. I will always be an archivist not a developer but I want to have more confidence to be able to:

  • talk to developers and IT colleagues
  • develop a more critical approach to choosing tools to work with
  • try out more technical tasks such as file format id-ing
  • explore more possibilities of using data in digital humanities contexts

Preservation workflows

(image: startupstockphotos.com/post/123128198211)

Another thing I’m focusing on at the moment is conducting an in-depth analysis of my digital preservation workflow. We’ve been playing around with automating elements of our workflow which ingests and processes research data and then prepares it for long term preservation. What I have planned out at the moment is very piecemeal and I know from experience that piecemeal solutions hide weaknesses and dependencies that have not been fully thought through. Our test instance of Archivematica fell over because of an upgrade elsewhere on the system – lack of communication and insufficient planning led to a problem. This is of course why we’re not yet in the production stage but it did bring home to me how important solid planning and the identification of dependencies are (if that wasn’t already apparent!).

Getting the information out there

You won’t need a Mac to access our catalogue… (image: startupstockphotos.com/post/123128547586/at-barrel-soho-nyc)

I’m also exploring cataloguing systems and am currently playing with AtoM – an open source, standards-based cataloguing system from Artefactual (who also develop Archivematica) which looks to offer many of the things which we will be requiring. I have some existing catalogues to import (which is proving rather more tricky than I thought it would be) but I like what I see of what the system offers in terms of standards conformance, ease of use and interoperability. I am looking for a system that will nicely expose digital and non-digital descriptions side by side and an integration with Archivematica is important for this. I am also keen for it to work alongside our current Onesearch library catalogue to allow users to navigate across collections and find their way around everything the university has to offer.

Blogging

I want to get into the habit of regular blogging and have been inspired by Jen Mitcham’s regular Digital Archiving updates as well as Kirsty Lee’s Bits and Pieces. A longer read which is worth a look, and which I will be coming back to, is Bentley Historical Library’s Appraising Digital Archives with Archivematica paper, which was written from elements already appearing in their blog.

So – here’s to a busy year of digital archives!

 

Piece by piece

The Old College, Edinburgh (Image by Kim Traynor – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=18939432)

Breaking the boundaries

Recently I had the chance to take a trip up to Edinburgh University to take part in an event called Research Data, Records and Archives: Breaking the Boundaries, which was organised by the university to address “the challenge of managing research data in relation to records management and archives”. This was especially interesting to me having recently spoken about this subject at a Digital Futures conference in Cambridge (you can see my slides here).

Building blocks

The venue was the beautiful Playfair Library Hall, begun in Neo-Classical style in 1789 and finally completed by William Playfair and put into use as the University library in the 1820s. It took quite a long time to finish building the library that the university wanted, which has made me feel a bit better about the progress I’ve been making in digital preservation here in Lancaster! The Playfair Library now serves as a fantastic venue for a range of events such as this workshop, where we were drawn together from a range of different disciplines to talk about research data and how to build for the long term. With so many reminders of the influence of the past around us it was good to focus on how we are going to continue to preserve and maintain academic endeavour.

 

Playfair Library (Image: Rachel MacGregor, 2016)

We were archivists, librarians, data managers and others from a wide variety of institutions and situations brought together with a common purpose and to compare and share approaches and experiences. Digital preservation is a slow and iterative process which needs a range of tools, processes and skills bolted together to work towards the long term goal. Every situation needs a slightly different approach according to the needs and resources available but we can all learn from each other and contribute towards making progress.

To keep or not to keep

The morning session focussed on a variety of presentations from information professionals and also a couple of case studies. It was refreshing to hear from a real life researcher talking about the importance of the re-use of data, in this case Professor Ian Deary whose research was based on a large scale dataset from a population study of the 1920s and 30s. This data was in paper format of course but became the basis for invaluable research into the effect of aging on the brain. Deary made the valuable point that the research he has undertaken was only possible because the dataset had not been sampled – data from the entire cohort had been kept. This sat a little awkwardly alongside the earlier call from our introductory speaker Kevin Ashley (Director of the Digital Curation Centre) who exhorted us to get “better at managing and better at throwing things away”. In fact this is not a digital vs non-digital issue – the tension of managing data with finite space and resources has always been there and appraisal techniques have been developed to help with this problem and work towards a solution.

The records continuum

The need to be involved in all stages of the lifespan of data was highlighted by a number of speakers, including Rachel Hosker of the University of Edinburgh who called for greater communication and collaboration with data creators and depositors. I think most of us would agree that this is the best approach, but I am not sure how practical or sustainable it is, particularly when dealing with a deluge of research data from a multiplicity of sources. What I do think is that we should be seeing data as part of the records continuum model – one which has been around a long time but which in the UK at least has not always had the prominence it should. In research data terms the model is almost always that of a life cycle and a move towards seeing it as a continuum would leave those managing and preserving the data in a much stronger position to plan for and develop strategies to ensure both long-term survival and access to data (or archives or records or whatever you like to think of the “stuff” as. I think there’s another blog post in there).

Identifying what we have

The afternoon brought us together in small groups for discussion of some of the key problems – and solutions – as we saw them of managing research data. My group – which was a mixture of archivists, research data managers and software developers – spent time discussing the issue of obscure file formats and scientific research data. There is the initial problem of identifying the file formats and then the further problem of sustaining the software which supports the data. There are plenty of tools available for file format identification but most rely on the PRONOM file registry, which is invaluable but inevitably limited when working with research data file formats. PRONOM supports the work of the UK’s National Archives and whilst it has become the de facto international file format registry standard, its principal raison d’etre is to support UK government departmental record keeping practices. As a community supporting digital preservation we should be seeking ways to enhance and contribute towards file format id-ing which will enable work above and beyond this. The team at the University of York Borthwick Institute have made great strides in developing and supporting this initiative but it is high time a much greater number of us took part in this work. Here at Lancaster University we have over 70 datasets (and counting!) which we are working to preserve and make available for the long term. A number of these are in file formats which we have little or no information about. One of my action points arising from the workshop is to work on file format identification and documentation – if anyone has any good suggestions of how to start work on this I would be very interested to hear from them!
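To give a flavour of what signature-based identification involves, here is a minimal sketch in Python. It is only an illustration: the SIGNATURES table is a small hand-picked list of well-known leading bytes, not PRONOM data, and the names identify_bytes, identify_file and survey are made up for this example. Real identification tools use much richer signature definitions than this.

```python
from collections import Counter
from pathlib import Path

# Hypothetical, hand-picked table of well-known leading byte signatures.
# This is NOT PRONOM data; a real registry holds far richer definitions.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(header: bytes) -> str:
    """Return a label for the first signature the header starts with."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unidentified"


def identify_file(path: Path) -> str:
    """Read the first few bytes of a file and try to identify them."""
    with path.open("rb") as handle:
        return identify_bytes(handle.read(16))


def survey(root: Path) -> Counter:
    """Tally (label, extension) pairs for every file under root."""
    tally = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            tally[(identify_file(path), path.suffix.lower())] += 1
    return tally
```

Running survey over a folder of datasets would give a rough count of what can and cannot be recognised, and the "unidentified" entries (listed with their file extensions) are the ones that would need researching and documenting.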

Sustainability and good practice

We were equally concerned with the long term sustainability of software. I expect both migration and emulation to play a role in our digital preservation strategies but having robust software development in the first place is a good starting point. The Software Sustainability Institute does a great deal of unsung work to improve the quality of software development and again we should all be engaged actively in promoting good practice. There is a great deal of useful information and guidance available on their website.
All in all it was a very thought-provoking day, one which raised a lot of questions but which also gave me some things to put on my “to do” list. Digital preservation is an iterative process and it’s time to bolt another piece onto the digital preservation structure.

 


The preservation jigsaw puzzle

University of York Central Hall. Philip Pankhurst, via Wikimedia Commons

I had a great day meeting with Jen Mitcham, who blogs here on her work at the Borthwick Institute and also Laura from University of Sheffield Library to talk about and share experiences of digital preservation.  The needs and set up in our various institutions are different but we share many areas of concern, such as the need for advocacy and the challenges of integrating traditional archival theory and practice with the management of digital data.

Advocacy

We talked a bit about terminology and about how confusing this can get. “File” and “archive” mean very different things to different people. It might not just be a question of avoiding confusing terminology but also of learning a new language to discuss familiar territory. For archiving read sustainability, for curate read preserve, for file read item and so on.

Old vs new

There are many areas where the management of traditional archives and born digital overlap.  All those involved in preservation – digital or otherwise – need to tackle some basic questions. What have we got? What should we be collecting and preserving? What are we trying to achieve? It is as important for the traditional archivist to have a sense of what to collect, why collect and who is the audience as it is for the digital archivist.

This could be formalised in a policy document or plan but need not be; there still needs to be a clear sense of direction and structure for the work being undertaken.

The need in the first instance is to have intellectual control of the collections – whatever they are. The first question any archivist should ask themselves is “what have I got?” This can be a particularly difficult question to answer if it is obscured by the physical medium of the items themselves – whether they are documents written in a language or hand which is hard to read (it could be Latin, secretary hand or an early version of WordPerfect) or in a format which is inaccessible (water damaged document, reel-to-reel tape, 5 1/4″ floppy disk). But even though it might be a technically difficult question to answer it’s one which most people can begin to tackle – even if the answer (description) of the data is “medieval document” or “word processed document” or even “research data in a study of termites”. Before making any further progress we have to know what the scale of the problem is.

Pieces in the preservation puzzle

Here at Lancaster University we are focussing on the management and preservation of research data – that is the raw data from scholarly outputs that the university staff and students produce.  This data is undeniably valuable and useful but only if it can be accessed and reused for the long term and be trusted, just like any other evidence (or archival document).  We are considering ways in which we can map research data outputs* which I think will have really big benefits for preservation planning and go some way towards tackling the questions about what we have, what might we be receiving, how long do we need to keep it for.  Because of the huge volume of data we are talking about it makes sense to try and automate as much of the process as possible – something my colleagues at York have been giving some thought to.

And while some of this might seem a little removed from those grappling with legacy files on outdated systems, managing emails, corporate archives or whatever else, they are all pieces in the preservation jigsaw puzzle.

What we should be collecting is a decision for each individual repository but the important thing is that this is clearly (although not necessarily rigidly) defined and that stakeholders (depositors, users, researchers) are consulted and involved.

Crossing the digital divide

What we are trying to achieve is the long-term preservation of archives/data and making them available and discoverable. This presents a big challenge but one for which solutions are continually being developed. There is not going to be one software tool which will adequately do all of this, but then there isn’t one single solution for arranging, describing, indexing, storing, labelling and retrieving traditional archives either, and there are likely to be a range of solutions which are suitable for one, the other or both. We are currently inhabiting a world of hybrid digital and non-digital archives and we need to be thinking about solutions which cross this perceived divide. These might be old or new but we need to bring together preservation and digital preservation to look at how we manage archives whatever their format.

There’s more about what we mean when we talk about archives here from Kate Theimer.

I have also been following the various Archive conferences in the US, Australia and Ireland.  A nice blog post here from SAA2015 and all the ARA UK and Ireland Conference tweets can be found on Twitter at #ARA2015 including (I think for the first time) a digital preservation strand.  There was a lot in all of these conferences on the subject of advocacy.

*if you are interested in Research Data Management you might want to read more about our JISC funded project here

United we stand 

I attended a Digital Preservation Coalition training event recently in Liverpool called “Making Progress with Digital Preservation”. This came at a good time for me after having been in post as Digital Archivist at Lancaster University for a couple of months and finding myself trying to do just that.  It was a great opportunity to meet some of my fellow professionals in the region and also to meet a wide range of practitioners from different disciplines who had come together to try and get their heads around some of the challenges faced by the emerging and changing discipline of Digital Preservation.

One of the big themes of the day – and something I’ve been giving quite a bit of thought to recently – is the need for advocacy. As William Kilbride, chief executive of the DPC, said, “a huge part of digital preservation is relentless advocacy” and certainly the relentless nature of it can seem daunting. I often think that very few people really grasp what it is I am trying to achieve in my job – it can be quite hard to explain – and without having the record creators on board the task of preserving is impossible. Digital preservation does not take place in isolation – it is a combination of tasks undertaken by a wide range of people taking on the challenges posed by the technologies, information, curation, selection and so on and so on.

As was discussed at the event, digital preservation is an activity undertaken by people from many different disciplines, each of whom brings a different angle or perspective to many of the issues which are being grappled with. This includes librarians, records managers, archivists, data managers, IT systems people, researchers… the list is endless. It’s a collaborative effort and one which, if it is to succeed, needs to be taken up and be taken seriously by anyone who is engaged in data creation. And by that of course I mean everybody.

Funding models for projects mean that there are a multiplicity of time-limited projects, the results of which are scattered and difficult to navigate even for someone who knows a little about the subject. On the plus side there are lots of people who are keen to share their knowledge, experience and expertise, and only by strong collaborative working will we really achieve results.

I’m preparing to introduce my colleagues to the principles of Digital Preservation because that advocacy work starts at home, and I can’t save the world’s digital data on my own.

This week I’ve been reading this article by Anthony Cocciolo, Professor of Information and Library Science at the Pratt Institute, New York, which looks at the archivist in a data manager’s world. I’ve also looked at this article from the International Journal of Digital Curation on how we should be taking a holistic approach to data curation.

One small step

I was excited to read today that the National Archives have made their first born-digital records available via the Discovery tool. This marks the beginning of a long process of integrating the traditional and born digital elements of the government’s records into one portal in what (hopefully) will be a seamless point of discovery. This might seem like a small step for many but having spent my first few weeks grappling with the issues around managing, preserving and making available digital data I can only be extremely grateful that there are pioneers out there who are paving the way so that all archives, whether purely digital (as here at Lancaster) or hybrid (as surely all others will eventually become), can look to making their holdings available to anyone who wants to access them.

The focus on discoverability has been shaping the way I have been thinking over the last week or so and it was good to read Professor Lorna Hughes of the School of Advanced Studies write about the necessity of public access to our heritage as part of the democratic process.  This includes the digitisation of archival and historical resources, the publication of Big Data and the preservation of the born-digital, all activities which as I mentioned in my previous post underpin the democratic process.

For an archivist from a traditional paper-and-parchment (yeah and the rest) background, getting to grips with the way digital data can and should be described so that it can be made available has been a steep learning curve.  I shall be looking at how the National Archives have used Discovery to weave together their hybrid catalogue but I have also been taking a lot of inspiration from the work of Jane Stevenson at Archives Hub and in particular her blogs on the AHRC funded Exploring British Design project. There is much to be gained from thinking above and beyond traditional archival description to help make the digital discoverable for as many people as possible. This fits in well with the emphasis on sustainability and access to research data which is one of the key drivers behind my role here at Lancaster University.

In the meantime my first small steps are going to be around drafting digital preservation policy and reading about what others have already done; when people share their work and make it available to the wider digital preservation and archives community, everyone reaps the benefit.

I’ve been following #LIBER2015 tweets for lots of stuff about copyright, research data management and more….

Happy International Archives Day

I’ve been motivated to write my blog to coincide with International Archives Day which is being celebrated on 9th June with the theme the year of democracy. The blog is intended to chart my progress in digital preservation which is a new(-ish) direction for me. However as an archivist committed to ensuring authenticity, transparency and access to information it’s one which I see as the logical way of taking this work on into the future and ensuring current and future archives continue to maintain these principles. In fact it underpins the whole democratic process, and the whole business of democracy cannot exist without archivists and information managers supporting its regulation.

“Secrecy, being an instrument of conspiracy, ought never to be the system of regular government.” Jeremy Bentham, On Publicity from The Works of Jeremy Bentham volume 2, part 2 (1839).
However before these weightier matters can be tackled I need to take my first steps in mapping out a digital preservation strategy and my first task has been to survey what other institutions are doing, what kind of policies they have and any interesting or innovative ways in which digital collections are preserved and presented. It’s given me a great opportunity to spend some time looking at a variety of collections, some of my favourites being YODAL – the University of York’s Digital Library and New York Public Library’s digital collections. Whilst I was on the “York” theme (there must be something in the name which promotes good digital projects) I found a wonderful set of digitised images relating to the Spanish Civil War held at New York University and made available via their Digital Library Projects from originals held in the International Brigade Archives in Moscow. Lots of other fascinating stuff here as well including the Guantánamo Lawyers Archive.

In the meantime I’ll be following #IAD15 on Twitter for all the best archives and democracy stories from around the world.