BibTeX as a data interchange format

As part of our metadata stores initiative we decided to allow researchers to import their publication history in BibTeX format.

The idea was quite simple: a researcher who arrives from elsewhere quite possibly has a publication history already, and might wish to import it into our metadata store, either so that these publications turn up on an online CV or simply to complete their publication record.

As Zotero, Mendeley and EndNote can all export in BibTeX format, we chose BibTeX as a lowest common denominator import format.

We have now started on the reverse: allowing researchers to take away a copy of their publication history, as known to us, in BibTeX, again because most other bibliography software can read it.

Doing this has revealed a set of interesting problems. BibTeX is, of course, a format from the days of 8-bit computing, which causes some fun with the non-ASCII characters that crop up in the titles of research papers: the gammas and deltas of mathematics and biology, and the umlauts and acutes. But as LaTeX has a heritage in the physical and mathematical sciences, there are well-understood conventions for encoding these characters.
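To make the convention concrete, here is a minimal Python sketch of the kind of substitution involved. The table and the function name `to_bibtex_ascii` are purely illustrative assumptions, not what our exporter actually does, and a real table would need to be far larger.

```python
# Illustrative only: a tiny, hypothetical table mapping non-ASCII
# characters to the LaTeX escapes conventionally used in BibTeX fields.
LATEX_ESCAPES = {
    "ä": r'{\"a}',
    "ö": r'{\"o}',
    "ü": r'{\"u}',
    "é": r"{\'e}",
    "á": r"{\'a}",
    "γ": r"$\gamma$",
    "δ": r"$\delta$",
}


def to_bibtex_ascii(text):
    """Replace each character found in the table; leave the rest untouched."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


print(to_bibtex_ascii("Schrödinger and the δ function"))
# prints: Schr{\"o}dinger and the $\delta$ function
```

Anything not in the table passes through unchanged, which is exactly where the trouble starts once a title falls outside the accented-Latin and Greek cases.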

And then we found a research paper with a title in Japanese. BibTeX does not really support UTF-8, which makes the aim of creating an unambiguous export format a trifle difficult, because we can’t guarantee that our export file will be parsed correctly by whatever program it is fed to.

At the moment we are simply embedding a warning as a comment in the export file, but that feels unsatisfactory. Pragmatically, we need to structure the file in a way that Zotero, Mendeley or EndNote can parse, but scouring their support websites doesn’t turn up a set of default practices or workarounds …
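As a rough illustration of the warning approach, the following Python sketch prepends a comment only when the export still contains non-ASCII characters. The function names, the wording of the warning, the choice of an `@comment` entry and the sample entry key are all assumptions made for the example, and how well a given reference manager copes with `@comment` is something we have not verified.

```python
def non_ascii_chars(text):
    """Return the set of characters outside the 7-bit ASCII range."""
    return {ch for ch in text if ord(ch) > 127}


def export_with_warning(bibtex_body):
    """Prepend a warning comment if the body contains non-ASCII characters."""
    if not non_ascii_chars(bibtex_body):
        return bibtex_body
    # Hypothetical wording; @comment is one way to carry a note in a .bib file.
    warning = (
        "@comment{Warning: this file contains UTF-8 characters that "
        "some BibTeX tools may not parse correctly.}\n\n"
    )
    return warning + bibtex_body


# Hypothetical entry with a Japanese title.
entry = "@article{example2013,\n  title = {日本語の題名},\n}\n"
print(export_with_warning(entry))
```

The file should be written out explicitly as UTF-8, so that at least the tools that do accept UTF-8 strings see what we intended.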


About dgm

Former IT professional, previously a digital archiving and repository person, ex-research psychologist, blogger, twitterer, and amateur classical, medieval and nineteenth-century historian ...

3 Responses to BibTeX as a data interchange format

  1. dgm says:

    Pragmatism rules: most recent products just take in UTF-8 encoded strings as is.

  2. Pingback: Using BibTeX for dataset citation | Building an archive solution

  3. Pingback: Exporting references from Dspace in BibTeX format | Building an archive solution
