As part of our work on a solution for better managing metadata about researchers, publications and data, we set out to link publications to the datasets (and, by extension, grey literature) deposited in various online collections, and to link both to information on grants held, in order to build a picture of research activity.
This is useful in many ways. For example, it potentially allows us to implement a service that automatically generates online profile or portfolio pages, highlights a researcher’s most cited papers, or shows who has had the most international collaborations.
One problem that we immediately hit was that of disambiguation – being able to accurately identify that the Fred Smith who deposited a set of photographs of Wari shroud wrappings was the same person as the Fredrick L Smith who published a monograph on the iconography of early Wari shroud wrappings.
Internally this is not really a problem, as we have a universal identifier scheme in place, but that scheme breaks down when tracking collaborators at other institutions. One could imagine colleagues elsewhere publishing an analysis of data held in our data repository, and vice versa. The problem is particularly acute where there are a number of large-scale international collaborations.
Granting external people temporary identifiers is not a viable long-term answer: people change institution, name and email address, so over time the information held may become invalid.
The obvious answer would be to use some form of persistent, universal, global identifier, ideally one that either its owner or its provider is invested in keeping up to date.
In fact there are already a number of global identifiers in use, many of which are proprietary. Examples of possible global identifiers include Scopus identifiers, ResearcherID, NLA/Trove identifiers, Google Scholar profiles and the like.
However, none of these identifiers are truly universal, as not everyone has the same set of identifiers, and some people, such as early career researchers, may have none at all.
Our approach was to build a lightweight database, keyed to the institutional identifier, to record these attributes for each researcher.
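To make the idea concrete, here is a minimal in-memory sketch of such a store, written in Python. It is an illustration only and not the schema we actually used: the class names, the method names, the scheme labels such as "scopus", and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ResearcherRecord:
    """External identifiers known for one researcher (hypothetical schema)."""
    institutional_id: str
    display_name: str
    # Maps a scheme label (e.g. "scopus", "researcherid") to the identifier
    # the researcher holds in that scheme. The labels are assumptions.
    external_ids: Dict[str, str] = field(default_factory=dict)


class IdentifierRegistry:
    """In-memory stand-in for a lightweight database keyed by institutional id."""

    def __init__(self) -> None:
        self._records: Dict[str, ResearcherRecord] = {}

    def add_researcher(self, institutional_id: str, display_name: str) -> ResearcherRecord:
        record = ResearcherRecord(institutional_id, display_name)
        self._records[institutional_id] = record
        return record

    def set_identifier(self, institutional_id: str, scheme: str, value: str) -> None:
        # Raises KeyError if the researcher has not been added first.
        self._records[institutional_id].external_ids[scheme] = value

    def get_identifier(self, institutional_id: str, scheme: str) -> Optional[str]:
        record = self._records.get(institutional_id)
        return None if record is None else record.external_ids.get(scheme)

    def find_by_identifier(self, scheme: str, value: str) -> Optional[ResearcherRecord]:
        # Linear scan; a real database would index this lookup.
        for record in self._records.values():
            if record.external_ids.get(scheme) == value:
                return record
        return None


if __name__ == "__main__":
    registry = IdentifierRegistry()
    registry.add_researcher("u0000001", "Fred Smith")            # made-up institutional id
    registry.set_identifier("u0000001", "scopus", "EXAMPLE-42")  # made-up value
    print(registry.get_identifier("u0000001", "scopus"))
    print(registry.get_identifier("u0000001", "researcherid"))   # no such identifier held
```

Keying on the institutional identifier means a researcher with no external identifiers at all, such as an early career researcher, still has a record to which identifiers can be attached later.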
At the same time, ORCID was clearly an identifier gaining widespread adoption worldwide, and it was not tied to any field of study or scholarly information service, making it suitable for use as a universal, career-lifetime persistent identifier.
As part of feasibility testing we decided to develop an ORCID minting tool, to demonstrate that we could programmatically take the information we already knew about a researcher and either create an ORCID iD for them from existing data sources or update their existing ORCID record.
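The create-or-update decision described above can be sketched roughly as follows. This is not code from the tool itself. It assumes only that an ORCID iD is sixteen characters in four hyphen-separated groups, with a final check character computed using ISO 7064 MOD 11-2, as ORCID documents. The function names and the "review" outcome for a malformed iD are our own illustrative assumptions.

```python
import re
from typing import Optional

# Four groups of four; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Check the format and the check character of a hyphenated ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid_id):
        return False
    compact = orcid_id.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


def choose_action(existing_orcid: Optional[str]) -> str:
    """Decide what to do for a researcher (hypothetical helper).

    "create" : no iD on file, so a new record would be minted
    "update" : a well-formed iD is on file, so the existing record is updated
    "review" : something is on file but it fails validation (our assumption)
    """
    if existing_orcid is None:
        return "create"
    if is_valid_orcid(existing_orcid):
        return "update"
    return "review"


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the iD of ORCID's fictitious sample researcher.
    for candidate in (None, "0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, "->", choose_action(candidate))
```

Validating the check character locally catches mistyped iDs before any request is sent, but it says nothing about whether the iD is actually registered or belongs to the right person.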
The solution has been tested against the ORCID sandbox. This is very much a proof of concept exercise, but we believe that the code is sufficiently robust that it could form the basis of a generic ORCID client.
The application allows the user to:
- create an ORCID profile
- update an ORCID profile
- link existing publications to an ORCID profile
- view the publication records stored in an ORCID profile
The code was developed by Genevieve Turner of the ANU Data Commons team and is available for download from https://github.com/anu-doi/orcid-updater under a GPL 3 license. As it is prototype code, no warranty is made as to its suitability or fitness.