As part of our metadata stores initiative we decided to allow researchers to import their publication history in BibTeX format.
The idea was quite simple: a researcher who comes from elsewhere quite possibly has a publication history already and might wish to import it into our metadata store, either so that these publications turn up on an online CV or simply to add to their publication history.
As Zotero, Mendeley and EndNote can all export in BibTeX format we chose BibTeX as a lowest common denominator import format.
We have now started on the reverse: allowing researchers to take away a copy of their publication history, as known to us, in BibTeX, again because most other bibliography software can read it.
Doing this has revealed a set of interesting problems. BibTeX is of course a format from the days of 8-bit computing, which causes some fun with the non-ASCII characters that crop up in the titles of research papers: the gammas and deltas of mathematics and biology, and the umlauts and acutes. But as LaTeX has a heritage in the physical and mathematical sciences, there are well-understood ways round them and conventions for encoding these characters.
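To illustrate the kind of convention meant here, the following is a minimal Python sketch that rewrites accented Latin letters and a couple of Greek letters as LaTeX escapes. The function name `to_bibtex_ascii` and both lookup tables are hypothetical and deliberately tiny; this is not our actual export code, and a real mapping would need to be far larger.

```python
import unicodedata

# Illustrative subset only (an assumption for this sketch, not a complete table).
COMBINING_TO_LATEX = {
    "\u0308": '"',  # diaeresis / umlaut  -> \"
    "\u0301": "'",  # acute accent        -> \'
    "\u0300": "`",  # grave accent        -> \`
}
GREEK_TO_LATEX = {
    "γ": r"$\gamma$",
    "δ": r"$\delta$",
}


def to_bibtex_ascii(text):
    """Replace characters we know how to escape; leave everything else as-is."""
    # Compose first so that "u" + combining diaeresis is seen as one character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        if ch in GREEK_TO_LATEX:
            out.append(GREEK_TO_LATEX[ch])
            continue
        # Split an accented letter into base letter + combining mark.
        parts = unicodedata.normalize("NFD", ch)
        if (
            len(parts) == 2
            and ord(parts[0]) < 128
            and parts[1] in COMBINING_TO_LATEX
        ):
            out.append("{\\%s%s}" % (COMBINING_TO_LATEX[parts[1]], parts[0]))
            continue
        # No known escape: pass the character through unchanged.
        out.append(ch)
    return "".join(out)


print(to_bibtex_ascii("Müller and the γ-ray café"))
```

Characters without an entry in either table are passed through untouched, so anything outside this small set still ends up in the output as raw non-ASCII text.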
And then we found a research paper with a title in Japanese. BibTeX does not really support UTF-8, which makes the aim of creating an unambiguous export format a trifle difficult, because we can’t guarantee that our export file will be parsed correctly by whatever program you feed it through.
At the moment we are simply embedding a warning as a comment in the export file, but that feels unsatisfactory. Pragmatically we need to structure the file in a way that Zotero, Mendeley or EndNote can parse, but scouring the support websites has not turned up a set of default practices or workarounds …
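For what it is worth, the warning approach can be sketched roughly as follows in Python. This assumes the entries have already been formatted as BibTeX strings and that the warning is written as an `@comment` block, which BibTeX itself skips; whether Zotero, Mendeley or EndNote all tolerate it is exactly the kind of thing we have not been able to confirm. The function names and the warning wording are hypothetical.

```python
def has_non_ascii(text):
    """True if the text contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in text)


def export_with_warning(entries):
    """Join formatted BibTeX entries, prepending a warning if any need one.

    `entries` is a list of already-formatted BibTeX entry strings
    (an assumption of this sketch).
    """
    flagged = sum(1 for entry in entries if has_non_ascii(entry))
    parts = []
    if flagged:
        # The comment text must not contain an at-sign, since BibTeX
        # treats that character as the start of a new entry.
        parts.append(
            "@comment{WARNING: %d of %d entries contain non-ASCII (UTF-8) "
            "characters and may not import correctly.}" % (flagged, len(entries))
        )
    parts.extend(entries)
    return "\n\n".join(parts) + "\n"


sample = [
    "@article{smith2010, title = {A Plain Title}}",
    "@article{tanaka2011, title = {日本語のタイトル}}",
]
print(export_with_warning(sample))
```

A file with no non-ASCII characters comes out with no comment at all, so the warning only appears when it is relevant.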