Open source literature and citizen science

As I’ve said elsewhere, it’s been about a year since I retired, and despite having been around research institutions and universities all my adult life, I must say I don’t miss it, except for one thing – easy access to journals.

A surprisingly large amount of information is freely available online, but every so often in my role as a dilettante classicist and amateur Victorian historian I come up against a blank wall where something that has piqued my interest sits securely behind a paywall.

When I was working, this wasn’t a problem – if the institution one worked for had access, one just logged in from work, or via a VPN or a proxy server, even if it had nothing to do with the day job. Of course I was in the lucky position where my day job in digital archiving would have allowed me to plausibly claim ‘just testing’ if anyone ever asked, but no one ever did.

I no longer have a university login – the first time since 1980 – so I can’t do that. If I still lived in the city, rather than country Victoria, I could get on the bus to my former employer and use the library, but even then I still wouldn’t be able to access the electronic resources, as all computers require you to log in – no kiosk mode machines for general use or literature searches.

Now I’m not really a serious classicist or nineteenth century historian, so I can get by with secondary sources quite happily, and I’m well enough off to buy second hand copies of any particularly interesting books.

However, having once been a field ecologist, I can see that if I were an amateur botanist, say, being able to access specialist literature would be rather more important than it is to me.

And if we want citizen science, we want it to be good science. And that means access to the literature, which in turn means open source literature, in that the content is freely available.

Now while it’s not exactly cost free, the costs of hosting an electronic journal are minimal compared to the subscription costs, and the costs of simply hosting content are trivial.

And of course it’s not just for the benefit of a few amateur researchers who live in rich countries – researchers in countries where research is poorly funded and which lack the infrastructure of large research libraries are in an equally difficult position.

Scientific communication in pre 1989 Eastern Europe

A long time ago, the early nineteen eighties in fact, I held a research studentship from the UK Medical Research Council.

And in the course of my research I read widely in physiology and ethology to understand prior work about stress, environmental influences, and learned responses to uncertainty. This was because a lot of the physiological work was carried out on fit young Anglo-Saxon males who were either members of the armed forces or university students, and who, more importantly, must have expected something unpleasant to happen to them.

The point is that we know the world is stressful and uncertain and that we (mostly) learn to get on with it, some better than others. It is just that some people do not cope very well, which results in a whole range of psychophysiological symptoms.

And because it was thought unethical to do some of the things that result in extreme stress in humans, some people thought it was a little more ethical to use animals. But because of the electromechanical technology of the time the studies were very much either/or and consequently less valuable.

Computers controlling things allow for pseudo random events where you can go from ‘mostly predictable with the odd bad thing’ to absolutely random and terrifying. Needless to say humans and animals find the latter scenario very stressful.

As it was important to avoid repeating prior work, because (a) human-based studies were expensive and involved, even then, complex approval processes and (b) work with non-human subjects was heavily restricted, one had to be sure that no one had tried to do something similar before.

So one read. And because the university where I was studying didn’t have many of the journals I needed I was given a generous inter library loan allowance.

And in they would come, photocopies from the British Library’s document supply centre. Mostly British or North American, but occasionally French or German. And one time in a German paper, I found a reference to an interesting study carried out at the Charles University in Prague, which of course in those cold war times was the grim grey capital of the Czechoslovak Socialist Republic, and not a funky place with good music and better pubs.

But for the hell of it I put in a request, expecting either a rejection slip – request disallowed – or possibly a photocopy.

But no, someone at the British Library thought it worthwhile to ask the Czechs for a copy, and they sent me the conference proceedings that included the study I wanted, leaving me to xerox it myself.

Now at that time, journals in the west were comparatively cheap, produced by learned societies in the main, though there were a few commercial journals owned by Pergamon, Elsevier and Springer and doubtless a few others I’ve forgotten. And of course there were things like the Science Citation Index, which is arguably the granddaddy of bibliometrics and the various reputational studies that plague us today.

But in the old east there was no such structure. Knowledge was said to be the property of the people as the people had paid for it. There were almost no journals, certainly no commercial journals, yet people still discussed and exchanged ideas and built reputations.

Now one of the concerns among researchers about moving to publication in lesser known open source journals is the loss of impact and reputation, and consequently the ability to attract funding. So my question is, does the way scientific communication proceeded in pre 1989 eastern Europe give us a model for a world with diverse methods of publication and dissemination?

What is a repository?

I’ve had a Twitter discussion this morning with Pete Sefton (@ptsefton) about what a repository is.

Pete has argued that repositories can be transient as a repository is about content organisation and presentation. I take a different view – repositories are simply content management systems optimised for the long term curation of data.

We are of course both right.

And the reason we are both right is that a lot of institutions have used their institutional repositories as a way of showcasing their research outputs. Nothing wrong with that. The institutional repository phenomenon came about because, as publications and preprints increasingly became electronic, institutions needed a way to manage that, and, for a lot of them, DSpace was the answer.

And of course, then we had Google Scholar indexing them, and various sets of metrics, and the role of institutional repositories sort of shifted.

Enter the data repository. It can contain anything. Performances of the Iliad, Aboriginal chants, digitised settler diaries, photographs of old Brisbane, stellar occlusion data, maps, etc. I could go on.

The key point is there’s no uniformity of content type – the content is there for reuse, and probably only within a particular knowledge domain. We’re no longer about presenting content, we’re about accessing content.

A subtle distinction, but an important one. Early repositories were oriented towards text based content and that made it easy to conflate presentation with access.

In fact we’re doing different things because access is about reuse and presentation is just that, presentation.

A collection of manuscript images can be presented by a presentation layer such as Omeka to make them available in a structured manner; they can also be stored in a managed store.

In fact the nicest example is the research management system. Data is pulled from the HR system, the institutional repositories, and some other sources to build a picture of research activity, researcher profile pages, and so on – the same data is reused and presented in multiple ways.

So, let’s call what we used to call the repository the long term curated content management system, the LCCMS.

Besides giving us another incomprehensible acronym, this has some benefits – it recognises that content can be disposed of and may not be fixed. One of our key lessons from operating a data repository is that researchers need more than data publication – they need a well managed, agnostic work in progress store while they assemble datasets, be it from astronomical instruments or a series of anthropology field trips to PNG. That goes against the idea that only ‘finished’ content goes into the repository, and yet it is clearly needed.

So, it’s about content, and more importantly what you do with it …

Lodlam 2015

I’ve just spent the last two days at the Lodlam summit in Sydney.

Lodlam – Linked Open Data in Libraries, Archives and Museums – was an invitation only event loosely linked to the Digital Humanities 2015 conference, also on in Sydney at the same time, and I was lucky enough to get an invitation.

It was interesting, involving, and certainly provided a lot of food for thought. Rather than repeat myself endlessly, follow the links below for (a) a personal view of the event and (b) my session notes, suitably spell checked and cleaned up. As always the views and interpretation are mine, and not those of any other named individual (although for a linked data event I guess I should write ‘named entity’).


 

Electronic resources and BibTeX

BibTeX is many things to many people, but we principally use it as a bibliographic file format.

This of course produces a whole slew of problems when it comes to online resources, for the simple reason that BibTeX predates online resources.

Traditional paper media have a whole set of conventions we understand about the differences between a book, a book chapter, a paper, and a conference paper, all of which BibTeX handles well by using @article, @book and so on.

Strict BibTeX only really has the @misc type to incorporate URLs and so on, but that works quite nicely, as we can see when using BibTeX for dataset citation. The only problem is that if everything is @misc we lose the distinction between articles, datasets, conference presentations and so on. Using the newer BibLaTeX standard allows UTF-8 characters, but really does not add anything – you end up with a generic @online type instead of simply coercing @misc to do your bidding.

This is not just a BibTeX thing – all reference managers in common use are still firmly bedded in the paper era – EndNote has similar problems distinguishing between various sorts of electronic resources.

There is another problem with online articles.

In the old days your research paper was published in only one place, and consequently had only one incarnation, and hence one citation.

With open access material you may find the PDF of the journal article in a variety of locations: the journal publisher’s site, your institutional repository, or some specialist collection. In other words you can have multiple instances of the same document, and each instance will have a different URL.

This of course would play havoc with citation counts. The simplest solution is to implement a rule that the copy published in the open access journal is the primary one and that secondary copies are just that. Consequently the URL cited is the one derived from the digital object identifier, and not the one generated by the local document server – logically the copy in your local repository is the analogue of the xerox of the journal copy you got from your local library’s document supply service.

So, in the case of a dataset, a conference paper or something only published locally, the DOI should resolve to the local instance, but where it’s published elsewhere it should resolve to the journal DOI …
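The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its parameters, and the fallback to the local URL when no DOI exists at all are mine, and the DOIs used to exercise it would be made up.

```python
from typing import Optional

# Public DOI resolver; a DOI appended to this prefix gives a citable URL.
DOI_RESOLVER = "https://doi.org/"


def citation_url(local_url: str,
                 journal_doi: Optional[str] = None,
                 local_doi: Optional[str] = None) -> str:
    """Pick the URL to cite for one repository item (hypothetical helper).

    The journal copy is treated as primary, so its DOI wins. Material
    only published locally uses the locally minted DOI. The final
    fallback to the repository URL is an assumption, not part of the rule.
    """
    if journal_doi:
        return DOI_RESOLVER + journal_doi
    if local_doi:
        return DOI_RESOLVER + local_doi
    return local_url
```

The point of doing it this way is that every secondary copy of a published article ends up citing the same URL, so citation counts accrue to one place.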

Exporting references from DSpace in BibTeX format

Following on from our design decision to use BibTeX as a lowest common denominator reference export format, we have developed a simple BibTeX reference export utility for DSpace 4.3.

Essentially, it simply takes the Dublin Core object description and translates it to a BibTeX style reference, with the object type (for example @article for a research paper) being set on the basis of the dc.type metadata field.

As a further refinement we are using the object handle as the label, which would give us an entry that looks something like this:

@article{hdl.handle.net_1234_1234567,
author = {Collins, Wilkie},
title = {Testing Methodologies},
journal = {The Journal of Important Things},
year = {2014}
}

Testing and development are ongoing, but our sparse test entries import successfully into Zotero and JabRef.
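For anyone wanting to experiment with the same idea outside DSpace, the translation described above can be roughed out in Python. This is not the utility itself, just a minimal sketch: the dc.type values in the mapping table, the choice of dc.source for the journal title, and the function names are all assumptions made for illustration.

```python
# Hypothetical mapping from dc.type values to BibTeX entry types.
# Anything not listed falls back to @misc.
TYPE_MAP = {
    "Journal Article": "article",
    "Book": "book",
    "Book Chapter": "incollection",
    "Conference Paper": "inproceedings",
}


def handle_to_key(handle_url: str) -> str:
    """Turn a handle URL into a label such as hdl.handle.net_1234_1234567."""
    without_scheme = handle_url.split("://", 1)[-1]
    return without_scheme.strip("/").replace("/", "_")


def dc_to_bibtex(record: dict) -> str:
    """Translate a flat Dublin Core record (a dict) to a BibTeX entry."""
    entry_type = TYPE_MAP.get(record.get("dc.type", ""), "misc")
    fields = [
        ("author", " and ".join(record.get("dc.contributor.author", []))),
        ("title", record.get("dc.title", "")),
        # Assumption: the journal title is held in dc.source.
        ("journal", record.get("dc.source", "")),
        ("year", record.get("dc.date.issued", "")[:4]),
    ]
    # Skip empty fields so sparse records still give a valid entry.
    lines = [f"{name} = {{{value}}}" for name, value in fields if value]
    key = handle_to_key(record["dc.identifier.uri"])
    return "@" + entry_type + "{" + key + ",\n" + ",\n".join(lines) + "\n}"
```

Fed a record with dc.type set to "Journal Article" and the author, title, journal and year from the example above, this should reproduce that entry; a record with an unrecognised dc.type simply comes out as @misc.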

Our candidate programmatic ORCID updater

Back in March 2014 we made our prototype application for programmatic ORCID updates available. This was designed only as a prototype and not intended for general use in a production environment.

As of 01 April 2014 ORCID are going to move to release 1.2 of their schema, which may break our app.

When we say may we really mean may. We don’t know, as we haven’t tested it against the new version of the ORCID schema – just now we’re working on some other projects and don’t have the bandwidth to test things properly and, more importantly, update them if they break.

However, we will be testing and if necessary updating our tool later this year – we just can’t say when …
