Memory Makers 2018

Beautiful Amsterdam

I was extremely lucky to be at the Amsterdam Museum for both the Memory Makers: Digital Skills and How to Get Them conference and the Digital Preservation Awards 2018, where excellent practice across the sector was recognised and rewarded.

I missed the ePADD workshop in the morning, but I did get to meet Josh Schneider later, which was great, so now I need to follow up on my email preservation work so I can bother him with more questions in the future. That’s one of the really great things about conferences: you get to meet people whose work you have followed and admired, which helps create connections, establish areas of interest and build communities.

The conference was kicked off with an inspiring but impossible-to-summarise keynote from Eppo van Nissen tot Sevenaer, Director of the Netherlands Institute for Sound and Vision (Beeld en Geluid), in which he encouraged us all to be archivist activists and quoted William Ernest Henley’s poem “Invictus”. It fired us all up for a conference which explored how digital preservation knowledge is taught, acquired and disseminated.

The first session focussed on teaching “Digital Preservation” (there was quite a bit of discussion about what this constitutes and how it is best described in curricula). Eef Masson of the University of Amsterdam, who teaches on a Masters Programme in Preservation and Presentation of the Moving Image, discussed how the disciplines of film and media studies intersected with, and led to collaboration with, the traditional archives programmes, to everyone’s benefit. Sarah Higgins from Aberystwyth University talked frankly about the difficulties of engaging students from humanities backgrounds with digital skills. Many (although by no means all) people choosing a career in archives do so because they like “old things”. This struck a chord with me, as I am a medievalist by training and I have learned to “love the bits”. How did I get there, and how can I take others there with me? It seems there is a need to engage and inspire people with our less tangible digital heritage. Later that evening, on receiving her DPC Fellowship award, Barbara Sierman said:

One of my big takeaways from the conference was how to engage people with digital preservation and encourage them to get as excited about it as I am! After Sarah Higgins, Simon Tanner rounded off the session by talking us through King’s College London’s Digital Asset and Media Management MA, which boomed in numbers once they added the word “Media” to the title. The list of MA dissertation topics sounded absolutely fascinating and very varied. Tanner explained that they don’t teach Digital Preservation as a separate module; rather, it is woven into the fabric of the degree.

Sharon McMeekin of the Digital Preservation Coalition began the second session of the afternoon by talking through the survey of what kind of training members said they wanted (which might not necessarily be the same as what they ought to be focussing on…). She encouraged sharing best and worst (!) practice and emphasised that Digital Preservation is a career of continuous learning, something to be aware of when employing someone in that role. Next was Maureen Pennock of the British Library, who illustrated an enviable internal advocacy strategy. She explained:

The final speaker of the day – Chantal Keijsper of the Utrecht Archives – described the “Twenty First Century” skills and competencies needed to realise our digital ambitions.

The evening was taken up with the Digital Preservation Awards 2018, which you can read about here. They were all worthy winners and there were many extremely unlucky losers. Almost all of my nominees won their categories. I’m saying nothing beyond reiterating my love for ePADD: very worthy winners in their category!

Jen Mitcham of the DPC and me at the awards ceremony

Day two of the conference was a chance for some of the Award finalists to showcase their work. First up was Amber Cushing from University College Dublin, discussing the research done to try and build the digital information management course at the institution. In a targeted questionnaire aimed at those with responsibility for digital curation, there was a surprising lack of awareness of what digital preservation/curation was, and confusion between digital preservation and digitisation. Next up was Rosemary Lynch, who was part of the Universities of Liverpool and Ibadan (Nigeria) project to review their Digital Curation curriculum. Both institutions learnt a lot from the process, which enabled them to make changes to their student offer. With support from the International Council on Archives, this project has helped make standards and other resources available in countries where this can be difficult. Next was Frans Neggers from the Dutch Digital Heritage Network (Netwerk Digitaal Erfgoed), talking about the Leren Preserveren course launched in October 2017, which enables Dutch students to learn practical digital preservation skills. They have had excellent feedback from the course:

“I expected that I would learn about digital preservation, but I learned a lot about my own organization, too.”

Student on the Leren Preserveren project

and Neggers added that another benefit was raising the profile of the Dutch Digital Heritage Network: often this course was how people found out about the organisation. The final speaker in this session was Dorothy Waugh from Emory University, one of a group of archivists who have developed the Archivist’s Guide to Kryoflux. I can testify that this is an invaluable piece of work for anyone planning to use a Kryoflux device (as I am) or already using one (it is designed to read obsolete digital media carriers). The Kryoflux was developed by audio visual specialists and does not come with archivist-friendly instructions:

In the final session we heard some great examples of training and advocacy. Jasper Snoeren from the Netherlands Institute for Sound and Vision (Beeld en Geluid) talked about their “Knowledge Cafes”, where they invite staff to share a drink and learn about curation and preservation. He discussed how to turn a sector into a community: run very focussed training programmes and keep people engaged in between. Puck Huitsing from the Dutch Network of War Collections (Netwerk Oorlogsbronnen) followed and had a great deal of useful advice which would constitute a blog post in itself, although my favourite quote was probably:

Rounding off an extremely useful and successful event, Valerie Jones from the UK National Archives presented the Archives Unlocked Strategic Vision for the sector, tempering this by saying:

If you’re going to innovate, just do it. Don’t write reports. Just go.

Valerie Jones, UK National Archives

I learnt a great deal at this conference and as usual I have added more to my “to do” list, especially around tackling internal advocacy and I can’t wait to start putting this into practice.

Stroopwafels and coffee kept me going!

World Digital Preservation Day 2018


When people ask me what I spend my time doing, I generally say “digital stuff” or (when I’m being more frivolous) “staring at spreadsheets”. Sometimes I explain a bit further and say, “I try and ensure that old files and digital content will still be usable long into the future”, and actually most people think about this and say: gosh, how do you solve that problem?

Well the answer is, of course, that it isn’t something you “solve” any more than you would “solve” the problem of looking after archives, manuscripts or physical artefacts.

So on World Digital Preservation Day, when we are all thinking about Digital Preservation, what are the challenges a Digital Archivist faces, and what approaches might she consider taking?

The creators

I actually spend a good deal more of my time than you might imagine talking to people, emailing them and creating guidance and advice. If you are lucky, as I am, you have people in your organisation working with you who can spread the word about the importance of records and data management (and how this hugely enables the preservation process) and try to get the message out to the people who are creating the digital stuff in the first place. This part of the work is extremely time consuming; capturing the right data or documents in the best formats with the appropriate metadata and contextual information can involve a great deal of to-ing and fro-ing. It’s really useful to be able to find out how people work and manage their files. There is no sense writing guidelines for people to follow if they don’t reflect people’s real-life situations. It just means that people won’t follow them.

Having said that, especially for World Digital Preservation Day I have updated and promoted some general guidelines for our potential and actual depositors, which you can see here. Have a look and tell me what you think! I’m going to be using them with internal and external depositors in the coming year.

The knotty problems

Actually not necessarily one of my knotty problems…

One of the biggest challenges of Digital Preservation – without doubt – is capturing the “stuff”* in the first place. This sounds so obvious but is worth repeating: you can’t save what you don’t have.

I’ve been working with our Records Management team looking at the records which the University produces – minutes of the Council and Senate and other committees; files which are vital for the running of the institution and need to be retained for both legal and historical reasons.

The files contain sensitive information (for business and personal data reasons), so we needed a secure method of transfer. That ruled out email (too easy to mis-send; in fact this is the largest cause of inadvertent data breaches). Similarly, using a carrier such as a USB drive or a hard drive was potentially insecure, very clunky, and added an extra step to the deposit process. The harder it is to transfer the data, the less likely people are to do it, as any systems designer knows… We turned to our institution’s own file sharing platform, which satisfied our information security team and was already familiar to the people creating the files. Perfect, I thought, especially as we get a notification email once a file has been deposited. However (and there’s always a however) it appears that uploading the files to the file share changes the last modified date. Even worse, it appears that it sometimes changes the last modified date, and sometimes it doesn’t. So what I thought was going to be a straightforward solution turned out to be more of a knotty problem. Added into the mix is the question of how much it matters if the last modified date is not correct. We all know these dates are very unreliable, and in this context I anticipate that most of the material will be deposited very soon after creation, so the dates will not be wildly inaccurate. Something for me to work on…
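One way to keep track of what the file share does to the dates would be to record a small manifest before upload and check it again after download. This is a minimal sketch in Python, assuming the depositor (or someone working with them) can run a script over the folder before it goes onto the file share; the function names are hypothetical and nothing here is specific to any particular sharing platform. It records a SHA-256 checksum and last modified time for each file, then sorts files into those whose content changed, those where only the date changed, and those missing after transfer.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file (path relative to root) to its checksum and last modified time."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = {
                "sha256": sha256_of(path),
                "mtime": path.stat().st_mtime,
            }
    return manifest


def compare_manifests(before: dict, after: dict) -> dict:
    """Sort files into: content changed, only the date changed, or missing after transfer."""
    report = {"content_changed": [], "date_changed": [], "missing": []}
    for name, original in before.items():
        received = after.get(name)
        if received is None:
            report["missing"].append(name)
        elif received["sha256"] != original["sha256"]:
            report["content_changed"].append(name)
        elif received["mtime"] != original["mtime"]:
            report["date_changed"].append(name)
    return report
```

Saving the "before" manifest alongside the deposit (for example as JSON) would preserve the original dates as metadata even when the file share rewrites them, while the checksums show whether the content itself arrived intact.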

Back to the future

Much of my work involves internal depositors, IT services, Records Management and, increasingly, external depositors. However, we still have plenty of work to do stitching together my digital curation work and the traditional archives elements of the service. We have made a start by examining how fit for the future our cataloguing practices are. There has been a lot of good work done already (for example by the University of California) on guidelines for cataloguing born digital and hybrid collections, but it’s still very much a developing field. The impact and uptake of Records in Context is still unknown, and we are not starting from scratch, so our approach is to build on the cataloguing practices which have been developed here over many years. At present we are only ingesting small quantities of digital material, so we can craft our cataloguing practices to meet this need, but there is going to have to be some radical rethinking to handle the large-scale deposits which will start to arrive soon enough. This will affect how we use our collections management tools and how we present the catalogue to the public. I can see some data modelling work taking place to help map relationships between versions and iterations (let me add that to the to do list below). All this is very much based in traditional archival principles and is going to involve getting everyone in the team on board.

The community

The Digital Preservation Community is relatively small and disparate, but I could not do my job without you (yes, that’s all of you out there with an interest in digital preservation!). Whether it is sparking ideas, offering advice and guidance, sharing best and worst practice (Digital Preservationists Anonymous: yes, it’s a thing) or just being a sounding board, the contribution of countless people out there in helping me is a fantastic thing. And a big shout out to the Digital Preservation Coalition, whose invaluable work in enabling and bringing people together does a huge amount to support the community worldwide. I hope I do my bit by sharing my joys and frustrations via tweets and my blog.

So much to do

I’ve got an ever increasing list of things I want to do which includes:

  • set up and test ePADD. You can read about my false start here but I’ve now got more RAM so there’s nothing holding me back apart from finding the time
  • undertake a survey of the datasets in our institutional repository WRAP to see what file formats we are dealing with. I’ll be interested to see how the results compare to what I found when I was at Lancaster University.
  • get a forensic workstation up and running. It’s on order but I need it in front of me to be any use!
  • revisit my digital asset register – I created this in my first couple of months but I need to return to it to look more carefully and match it to our catalogue and also to some storage management work that I’d like to undertake
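For the WRAP survey, a quick first pass could simply count files by extension. This is only a rough proxy: extensions can be wrong or missing, and a proper survey would use a signature-based identification tool such as DROID or Siegfried. A minimal Python sketch, assuming a local copy of the datasets in a folder (the function name is hypothetical):

```python
from collections import Counter
from pathlib import Path


def extension_profile(root: Path) -> Counter:
    """Count files under root by lower-cased extension (a rough proxy for format)."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

Calling `extension_profile(Path("wrap_datasets")).most_common(10)` (the folder name is just a placeholder) would list the ten most frequent extensions, which is enough to see where the long tail of unusual formats begins.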


So today is World Digital Preservation Day and I will be in Amsterdam celebrating the achievements of many in the community at the Digital Preservation Awards Ceremony.  There are some brilliant nominations for some of the great work done in the last year or so and you can watch it streamed live here.  Meanwhile I need to get on with my to do list and maybe start on a project worthy of a future nomination?

*stuff: technical term for the stuff made of bits and bytes… If you come up with a better term please let me know…

Archivematica UK User Group Meeting November 2018

Modern Records Centre, University of Warwick (Image: Modern Records Centre)

Yesterday it was a privilege for us to host the autumn Archivematica UK User Group meeting here at the University of Warwick, the 9th User Group meeting since its inception in 2015. I wasn’t at the first meeting but I have been to all of the rest, and they are a great opportunity for people who are interested in, experimenting with or using Archivematica in full production mode to get together and discuss their experiences.

I used host’s privilege to kick off proceedings with a brief introduction to where we are at the University of Warwick, which I illustrated like this:

Ain’t no mountain high enough (Image: Pixabay)

It does sometimes feel like we have a mountain to climb. We have various issues with our installation of Archivematica to sort out, and once those are sorted it’s on to the really tough stuff! We know that the future of digital archive processes is going to be about dealing with large quantities of material, so we need to automate as many of our processes as possible. A good place to start is automating the capture of descriptive metadata and as many of the ingest processes as possible. There are so many questions and I hope to be able to report on our progress at future meetings.
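As an illustration of what automating descriptive metadata might look like: Archivematica can pick up Dublin Core metadata from a metadata.csv file placed in a metadata folder inside the transfer, with a filename column pointing at paths under objects/. The Python sketch below writes such a file from a dictionary. Where the values come from (for example an export from the catalogue) is an assumption, the choice of columns is illustrative, the function name is hypothetical, and the exact CSV conventions should be checked against the Archivematica documentation for the version in use.

```python
import csv
from pathlib import Path

# Dublin Core columns to populate (illustrative; extend as the catalogue data allows).
DC_FIELDS = ["dc.title", "dc.creator", "dc.date"]


def write_metadata_csv(transfer_dir: Path, records: dict) -> Path:
    """Write metadata/metadata.csv for a transfer.

    records maps a file path relative to objects/ to a dict of Dublin Core
    values, e.g. {"minutes.pdf": {"dc.title": "Senate minutes"}}.
    Missing values are written as empty cells.
    """
    metadata_dir = transfer_dir / "metadata"
    metadata_dir.mkdir(parents=True, exist_ok=True)
    out_path = metadata_dir / "metadata.csv"
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename"] + DC_FIELDS)
        writer.writeheader()
        for relative_name, values in sorted(records.items()):
            row = {"filename": f"objects/{relative_name}"}
            row.update({field: values.get(field, "") for field in DC_FIELDS})
            writer.writerow(row)
    return out_path
```

Generating the file from existing catalogue data, rather than typing it in per transfer, is the kind of step that would let descriptive metadata scale with the volume of deposits.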

Next up we heard from Jenny Mitcham of the University of York at her final Archivematica UK User Group meeting before she moves on to pastures new at the Digital Preservation Coalition. Jenny reported back on her work on some old WordStar files which form part of the Marks and Gran archive. She has already blogged about her adventures with these files, and she came to the meeting to report on her most recent work using the manual normalization function in Archivematica. Jenny emphasised that this work is incredibly time consuming and requires lots of experimentation and QA. The work involved testing migration of the files to different formats: PDF, TXT and DOCX. By comparing the migrated results to an original version of WordStar, which Jenny has running on an old machine in the corner of her office, she could see that each normalised format captured some of the properties of the original but none of them captured them all. There was a further complication in that some files had the same name (but different extensions), which Archivematica does not like. On top of all of this, PREMIS metadata has to be added manually to record the event, and this gives not entirely satisfactory results in terms of the information it represents (or doesn’t represent). The whole normalisation process is long and complex and is summarised with a short and not entirely decipherable PREMIS entry. Jenny’s main takeaway is that Archivematica struggled in situations where the normalisation path was unclear. Any of the three normalised files she produced could be an AIP or a DIP, but Archivematica does not allow for more than one of each.
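Clashing names like these are at least easy to find before transfer. The Python sketch below (a hypothetical helper, assuming it is files in the same folder whose names differ only by extension that cause trouble) groups files by folder and name stem and reports any group with more than one member, so they can be dealt with before ingest rather than discovered during it.

```python
from collections import defaultdict
from pathlib import Path


def shared_basenames(root: Path) -> dict:
    """Return {(folder, stem): [file names]} for files in the same folder
    whose names differ only by extension."""
    groups = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            key = (path.parent.relative_to(root).as_posix(), path.stem)
            groups[key].append(path.name)
    # Keep only the stems that appear more than once in a folder.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```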

Following these presentations the group had a short discussion on the appraisal tab feature in Archivematica. We had previously asked people to test it and report back on whether they thought it was a feature they were likely to use. Only a relatively small number of people said they had tried it, possibly reflecting difficulty of use and/or the fact that the feature was designed specifically for use with ArchivesSpace (and therefore doesn’t necessarily integrate with AtoM or other systems). There followed some discussion of how much appraisal people were likely to do in Archivematica (as opposed to before ingesting into the system). There was some feeling that it might be more useful if it integrated with AtoM, but this of course would require development work. Food for thought…

There was also a short discussion on how much IT support an institution might expect to need to run an instance of Archivematica. Admittedly this is a bit of a “how long is a piece of string?” question, but there were some valuable contributions around how advocacy was needed to engage IT support colleagues, which might lead to more of a feeling of ownership and help develop enthusiasm and experience (they go hand in hand). There was also discussion of costings and creating a business case (the Digital Preservation Coalition got a name check here).

String: how long is it? (Image: Pixabay)

After lunch we heard from Hrafn Malmquist from the University of Edinburgh, who updated us on his work automating their Archivematica workflow. We heard at the last meeting about the beginnings of this piece of work, creating an integration between Archivematica, DSpace and ArchivesSpace. I was extremely impressed by the way in which the SIP is processed to produce two AIPs, one of which goes to a dark archive and the other to DSpace. The DIP which is produced is then also pushed to DSpace, which creates a link to ArchivesSpace. Just getting all the storage and access locations working smoothly is impressive enough, but Hrafn says there is more to do: for example, the DIP file structure is flat where it should be more hierarchical.

Matthew Addis suggested this is how we felt about dabbling with the FPR… (Image: Jeff Eaton: CC BY-SA 2.0)


Next up was Matthew Addis talking about his “journey into the FPR”. For many Archivematica users (at least for those of us who discussed this at the Winter 2017 meeting in Aberystwyth), the Format Policy Registry is a thing to be approached with extreme caution. Archivematica offers the user the option of customising the normalisation pathways, although, as we saw with Jenny’s presentation, approaches to normalisation are extremely complicated and often require a decision-making process based on “least worst” options. Matthew’s normalisation work was around Office documents and emails. One example was creating a normalisation path for PowerPoint files to PDF/A, where the process is lossy because animations, fonts, comments and all sorts of other content are lost. Normalising to an Open Document Format might be preferable, but this format is not widely supported, making the files relatively inaccessible. Analysing files for significant properties is extremely complicated and time consuming and in the end not easy to quantify: how do you measure whether one particular property has “more” significance than another if you are trying to compare processes? Another challenge was that Archivematica only supports one input format, one tool and one output format, whereas sometimes more than one format and more than one tool might be involved in normalising a file. It was good to be reminded just how complex office documents are, and how they cause no end of headaches for anyone planning for future resilience.

Our final presentation for the day was from Alan Morrison from the University of Strathclyde. He took us through their Research Data Management workflows using Archivematica. They share an instance of Archivematica with their Archives and Special Collections, but there is little overlap between the two services. At present Archivematica is used just to create AIPs, which are then stored on local network storage. DIPs are not created because Strathclyde use the front end of their institutional Research Information System (the database which manages all the research outputs) to make the datasets discoverable. Alan recognised that there were ongoing issues, not least poor interoperability between systems and too many manual actions, which lead to human error. But there was much to look forward to as well, such as possible dashboard monitoring to aid management of the AIPs and the development of a plug-in to integrate with an ePrints repository. He also mentioned a possible Scottish Archivematica Hub (given that a number of Scottish institutions use Archivematica). We’ll definitely be looking forward to hearing more about this in the future.

To wrap up the day we were delighted to hear from Kelly Stewart of Artefactual Systems, who made an early start in Vancouver to give us an update on Archivematica developments at their end. We’re looking forward to the release of version 1.8, which is imminent, and excited to hear about a possible Archivematicamp UK/Europe. Are there any interested hosts out there?

Overall I really enjoyed the day – there was a lot to think about and I gave myself a couple of pieces of homework which I must get on with sooner or later.

If you are interested in Archivematica and would like to join the group or just attend a future meeting to be able to chat to fellow users then do get in touch with me rachel_dot_macgregor_at_warwick_dot_ac_dot_uk

It takes a while to mature


I wrote in my last post about how I was looking for more resource so I can make progress on various outstanding preservation tasks. This is not a speedy process, so in the meantime I am looking at ways to help in the search for more resource and also at how I should be deploying the resources I do have. It seems like a good time to write a roadmap, which will hopefully help articulate the vision of where we are headed and identify concrete objectives and priorities, helping others understand the work we are trying to do.

First of all I would like to undertake some sort of audit of where we are as an organisation. I have long been an advocate of the NDSA Levels of Digital Preservation and if you have met me you have probably heard me banging on about them. I even have them pinned up next to my desk (I stole this idea from Jen Mitcham) alongside my favourite xkcd cartoon

My desk

These are a great starting point but I’m looking for something a bit more in-depth. This is where I’ve turned to Maturity Modelling, a method of assessing where an organisation stands in different areas and scoring it to help define where improvements could be made and highlight the areas which need the most attention. To help me undertake this assessment I looked at the suggestions in the Digital Preservation Coalition’s Digital Preservation Handbook and also turned to Twitter, not least because that’s where many of those who have developed these models are to be found.



The Digital Preservation Capability Maturity Model is definitely one I am interested in and can be found here. The Assessing Organisational Readiness (AOR) toolkit proved harder to track down (as the Twitter conversation suggested, there was a link rot issue), but I managed to get hold of a PDF version with another call-out on Twitter (it would be great if there was some way of hosting it somewhere…). The AOR toolkit is also very useful; it is based on the 2009 Jisc AIDA toolkit (also hard to find) and the CARDIO Research Data Assessment. This is also helpful as Warwick’s Research Data team have been developing their own roadmap using CARDIO, and we are obviously keen to develop our services in a joined-up and collaborative way. The third suggestion which I’m going to look at closely is Kenney and McGovern’s “Five Stages of Digital Preservation”, which was not hard to track down and has its own DOI, giving at least some guarantee that link rot will be less likely.

I’ve started going through these models and each has different things to offer which are more or less useful to my particular situation. Every institution has its own priorities and ways of working and there is no one approach to digital preservation which will be applicable across the board. The roadmap I want to develop will hopefully help in the following areas:

  • establishing my digital preservation priorities
  • working out how to develop and move forward with preservation activities
  • highlighting areas for collaboration within the organisation
  • raising the profile of digital preservation work within the organisation
  • helping make the case for additional resources based on an analysis of our current position

Using my assessment tools I can then identify my stakeholders and work towards a better understanding of where we are as an organisation and how we move forward.

So for now it’s back to my beloved spreadsheets and time to do some scoring!
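The scoring itself could just as easily live in a few lines of code as in a spreadsheet. This Python sketch assumes a simple model that is not taken from any of the models above: each area gets a current and a target level on the same numeric scale, and areas are ranked by the gap between the two. The class and function names are hypothetical, and the area names and scores in the usage line are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Area:
    """One assessed area; levels use a hypothetical numeric scale (e.g. 1-5)."""
    name: str
    current: int  # level assessed now
    target: int   # level we aim to reach


def prioritise(areas: List[Area]) -> List[Tuple[str, int]]:
    """Rank areas by gap (target - current), largest first; drop areas already at target."""
    gaps = [(area.name, area.target - area.current)
            for area in areas if area.target > area.current]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)
```

`prioritise([Area("Storage", 2, 4), Area("Metadata", 3, 3), Area("Advocacy", 1, 4)])` returns `[("Advocacy", 3), ("Storage", 2)]`: Metadata drops out because it is already at its target.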

You’ve got mail

Email preservation: it ain’t easy. I knew that before I even started looking at it, but a number of factors prompted me to begin looking at what we could do here at MRC. I have been spending quite a bit of time getting to know the collections here and getting a feel for what sorts of digital material we have. This is normally the very first step in undertaking digital preservation work. Understanding the collections means we can then prioritise and target particularly vulnerable formats, and make plans to tackle those which will cause particular issues (e.g. 5.25 inch disks, 3.5 inch disks and so on). I have not completed this process yet (there are several thousand accessions to pick through) but I have turned up some emails included in the collections in a variety of formats: printed out, copied and pasted into word-processed documents and so on. We need to prepare ourselves for a new deposit of material from a trade union, an activist or one of the many other people or organisations which fall within our collecting policy, as it is almost certain to include email. The fact that this hasn’t yet become a huge issue is most likely because we haven’t yet asked the question.


I am also starting to think about how we collect archives from the university itself more effectively, and inevitably this includes email correspondence in a number of different settings. I probably wouldn’t even be able to consider this were it not for our Records Manager, who works on the front line of records creation. I can’t emphasise enough that it doesn’t matter how many all-singing, all-dancing technological solutions we put in place to “preserve” the digital stuff; if we don’t actually have people and resources in place with record creators (in whatever capacity this might be), we can’t hope to capture the things that are really important (whether for cultural, legal, evidential or other reasons). Preservation: it’s all about resourcing.

Whilst we may be a little way away from tackling some of the complexities of emails as archive collections, a more pressing use case for us is to preserve the correspondence which accompanies our collections and include it in the submission information of our SIPs ingested into Archivematica. So I have been looking at some tools which would be useful in this process and thinking about what they can do and how they might fit into a transfer/deposit/appraisal/ingest workflow.

ePADD, which is developed and maintained by Stanford University, describes itself as “the all-in-one email appraisal, processing, discovery, and delivery solution for donors, archival repositories, and researchers.”

It’s worth noting here that it does not mention preservation (that isn’t actually what ePADD does), although it’s often mentioned in the same breath as various preservation tools. ePADD is designed to help with the acquisition, appraisal and management of email collections, so particularly around capture and content management. I would argue, of course, that this is all part of the preservation process, but it’s fair to point this out, as other tools will be needed for the processing and preparation of emails. I have also been looking at Emailchemy and Aid4Mail, both of which help with converting email export packages into preservation formats.

ePADD comes with excellent and very clear instructions on downloading and using the software and there is a detailed and active community forum. ePADD can link directly to a mailbox where you can select folders to capture emails from or you can upload emails which you have exported from a system in an MBOX format.  This is the standard export file format used by many email clients but NOT (inevitably) Outlook, which uses the proprietary .pst format.  This is where programs such as Emailchemy or Aid4Mail come in – they both enable the user to convert .pst files to MBOX format.
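Before loading an export into ePADD it can be handy to take a quick look at what is inside an MBOX file. Python’s standard library mailbox module can read MBOX files directly; the sketch below is a hypothetical helper (not part of ePADD) that counts the messages and lists the most frequent senders. A .pst file would need converting to MBOX first, as described above.

```python
import mailbox
from collections import Counter


def summarise_mbox(path: str) -> dict:
    """Return a quick profile of an MBOX file: message count and top five senders."""
    box = mailbox.mbox(path, create=False)  # raises an error if the file does not exist
    senders = Counter()
    total = 0
    try:
        for message in box:
            total += 1
            senders[message.get("From", "(no sender)")] += 1
    finally:
        box.close()
    return {"messages": total, "top_senders": senders.most_common(5)}
```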

I got ePADD installed on my PC but immediately ran into problems…


The good news is that Josh Schneider at Stanford was extremely quick off the mark with a diagnosis: my PC did not have enough RAM. He suggested running the Java version from the command line, where you can specify how much RAM to allocate to the program. I’m not a very techy person so, although this sounded a bit daunting, once again the excellent ePADD instructions meant I could do this.


Oh well.  At least it confirmed Josh’s diagnosis. And gave me a next step, which is to get more RAM behind me and hopefully get testing ePADD properly.

I was hoping this post would document my adventures with email acquisition and appraisal, but I’ll have to leave that for another day. It has come at a good time, however, as ePADD has been nominated for a Digital Preservation Coalition SSI Award for Research and Innovation. Based just on my experiences so far, my vote is definitely going to ePADD, as the documentation and support have been excellent and it looks as if it’s going to be a product with a lot of possibilities for us.

In the meantime – as with so much in digital preservation – I’m just going to have to look for more resources.




New beginnings


These are exciting times for me as I start a new chapter at the Modern Records Centre, University of Warwick, a great opportunity to get my teeth into a whole new set of challenges.  I’m really going to miss the team at Lancaster University, where I started out on my preservation journey, but I’m looking forward to hearing about everything they get up to in the future.

Meanwhile I’m just completing my second week here at the Modern Records Centre (MRC), where great work has already been done on implementing a live instance of Archivematica and ingesting both digitised and born digital material.  I’m really lucky because my predecessor left tons of useful notes and guidance for me to pick up.  My task is to move things forward by scaling up the processes, so that we move from the current manual upload and processing of content to a more automated, scalable approach.  This will help us tackle the backlog of digitised material awaiting ingest and also deal with the born digital material we are beginning to receive.


So my main focus in these first couple of weeks is understanding the workflows and processes already in place here at MRC for creating digital content and for cataloguing both digitised and born digital materials.  It’s made me think (again) long and hard about the best approaches to cataloguing born digital materials.  I have gone back to the very excellent National Archives’ Digital Cataloguing Practices paper and also the University of California’s Guidelines for Born Digital Archival Description.  Any other suggestions to help get my thinking going are very welcome! We are dealing largely with hybrid collections, so any approach needs to take into consideration both legacy and current cataloguing methods. There is no clean slate or break: any developments need to work for the users and for the archivists currently managing the physical collections. It’s going to be a collaborative effort and I’m looking forward to the challenge.

I’m also pushing a few digital assets through Archivematica and it is *very slow*. I’m hoping to concentrate on improving performance, particularly with a view to scaling up. As ever, there are those who have gone before, especially some fantastically useful blogs from the Bentley Historical Library and Jenny Mitcham at the University of York, not to mention the Archivematica User Forum.  In fact, I’m writing this whilst waiting for a SIP to appear in my ingest tab (I hope I haven’t broken it already…).

Not the actual cake I baked – that got eaten…

So it’s busy, busy, busy here, although I have had time to bake a cake, which has hopefully got me off on the right foot with colleagues.

Now, off to track down that missing SIP…

Software Matters

Me thinking about software preservation

I’m going to let you into a secret.  Well, it’s not really a secret, but whilst I have been gamely taking on (or trying to take on) the challenges of various technical aspects of digital preservation, my approach to software preservation has been decidedly that of an ostrich.  I have been firmly sticking my head in the sand over this! It’s partly because I really don’t feel I know enough about writing software in the first place, and partly because someone else is doing something about it, aren’t they…?

Yet databases, websites, video and so on are complex digital assets which I am only too happy* to tackle; somehow software just seems a step further.

However, a couple of things made me rethink my position.

The first was a recent talk from Neil Chue-Hong of the Software Sustainability Institute at one of our Lancaster Data Conversations.  He discouraged and encouraged me in equal measure.  He discouraged me because, when addressing a room full of coders, he asked them to consider how they might access their code in three months’ time.  Three months!  What about three years, or even three decades?  If people are not even considering a lifetime beyond three months, I’m starting to wonder if it’s worth getting involved at all.

On a more positive note, however, Neil was keen to promote good practice in writing and managing software, and he recognised the barriers to maintaining and sharing code.  As with most preservation work, the key is getting creators to adopt good practice early on in the process: the upstream approach, which I’ve alluded to before, which has been around a very long time, and which is indeed what makes digital preservation a human project.  But to support good practice and build the right processes for managing software, we need a better understanding of software itself.

Can we build code to last?

The second thing was the realisation that I was already responsible for code in our data repository: for example, this dataset, which supports a recent Lancaster PhD thesis.

Uptake of data deposit among our PhD students isn’t huge, and we are especially keen to encourage it because we want our researchers to get into good habits early.  Neil explained in his session that certain researchers, early career researchers amongst them, face particular barriers to sharing data, including a fear of sharing “bad” code.  But as he pointed out, everyone writes bad code, and part of the advocacy around sharing is helping people get over the fear of “being found out”.  From my perspective, if people are willing (or even brave enough) to share their code, I want to make sure that, as someone charged with digital preservation, I can create the best possible environment for preserving that software over the long term.

At the moment I think we have some way to go on this, but thankfully help is at hand with the Jisc/SSI Software Deposit and Preservation Policy and Planning Workshop.  According to the workshop description, the “first workshop will present the results of work done by the SSI to examine the current workflows used to preserve software as a research object as part of research data management procedures.”

Sounds good to me.

What do I hope to get out of this?


  • I’m really interested in getting the right metadata to support the preservation of software
  • I’m keen to hear what other people are doing!
  • I want to know where I can best direct my efforts and the areas I need to concentrate on first to get up to speed with providing excellent support

I’ll be reporting back soon on what my next steps are…

*or is that grimly determined?