I’ve had a Twitter discussion this morning with Pete Sefton (@ptsefton) about what a repository is.
Pete has argued that repositories can be transient, because a repository is about content organisation and presentation. I take a different view – repositories are simply content management systems optimised for the long-term curation of data.
We are of course both right.
We are both right because a lot of institutions have used their institutional repositories as a way of showcasing their research outputs. Nothing wrong with that. The institutional repository phenomenon came about because, as publications and preprints increasingly became electronic, institutions needed a way to manage them, and for a lot of them DSpace was the answer.
And of course, then we had Google Scholar indexing them, and various sets of metrics, and the role of institutional repositories sort of shifted towards showcasing.
Enter the data repository. It can contain anything: performances of the Iliad, Aboriginal chants, digitised settler diaries, photographs of old Brisbane, stellar occultation data, maps, etc. I could go on.
The key point is that there’s no uniformity of content type – the content is there for reuse, probably only within a particular knowledge domain. We’re no longer about presenting content, we’re about accessing content.
A subtle distinction, but an important one. Early repositories were oriented towards text-based content, and that made it easy to conflate presentation with access.
In fact, we’re doing different things, because access is about reuse and presentation is just that, presentation.
A collection of manuscript images can be presented through a presentation layer such as Omeka, which makes them available in a structured manner; the same images can also be stored in a managed store.
In fact, the nicest example is the research management system. Data is pulled from the HR system, the institutional repository and some other sources to build a picture of research activity, researcher profile pages and so on – the same data is reused and presented in multiple ways.
So, let’s call what we used to call the repository the long-term curated content management system, the LCCMS.
Besides giving us another incomprehensible acronym, this has some benefits: it recognises that content can be disposed of and may not be fixed. One of the key lessons from operating a data repository is that researchers need more than data publication. They need a well-managed, content-agnostic work-in-progress store while they assemble datasets, be it from astronomical instruments or a series of anthropology field trips to PNG. That goes against the idea that only ‘finished’ content goes into the repository, yet it is clearly needed.
So, it’s about content, and more importantly what you do with it …