Open like medical gowns, Standard as in oil

6 June 2017

Same, same, but different

A couple of months ago I attended a consultation on ...well, it wasn’t enormously clear what we were really being consulted on, but the stated topic was a possible single statewide library management system for Victoria. The session did provide an interesting snapshot of where various library services and managers are at in their thinking, and prompted a sidebar discussion on proprietary versus open-source software. It’s fair to say I probably didn’t behave as professionally as I could have during that discussion, when someone suggested that open source library software was open to ‘hacking’.1 More on that later. Several people in the room, including me, were fairly unconvinced that a single statewide system is a good idea, and in the rest of this post I will try to sketch out why.

“One LMS” is the default in Tasmania, because all public libraries there are run by the State Government, and was also rolled out recently in South Australia. Other states, particularly Western Australia and Victoria, have been talking about it on and off, sometimes with options papers from their State Governments. Andrew Kelly from Western Australia recently proposed a statewide Koha ILS instance, in response to the financial pressures on Western Australian libraries and the fact that much of the state is about to move off Amlib, its quasi-monopoly system. But “One LMS” is the wrong solution to a largely non-existent problem.

Proponents of single statewide library management systems generally argue that one is a necessary precondition for:

  • a single ‘use anywhere’ library card;
  • easy inter-library loans;
  • a statewide collection; and
  • up to date technology and customer-friendly features available to all libraries in the state.

For the purposes of this post, I’m going to leave aside the question of a statewide collection. This is because it’s not really a technical question, but more a political one - and that’s for another time.

Medical gowns

A great deal of library software as currently written, installed, and used is like a medical gown: closed at the front, open at the back. Our systems are sloppy in the way they handle authentication, often lack the most basic encryption of user data, and have a long history of hostility to bibliographic and other public data being easily accessed or repurposed by other systems.

Over several years, Eric Hellman has devoted a great deal of time to documenting the many ways user privacy and security are compromised by various software systems used by libraries. Passwords are stored in plain text, medical journals let ad networks spy on their readers, and library catalogues squirt user information around like water sprinklers. When library systems are hacked (as they inevitably will be), our users’ home addresses, dates of birth and borrowing history are all exposed. My thought experiment “Tinfoil” was an attempt to outline one part of the solution, with a system that allows for personalised reading suggestions without compromising privacy.

At the same time we leak all of this data about users, data about our collections is locked up in individual MaRC files and hard-to-integrate proprietary catalogues. In the last few years this problem seems to finally be getting some attention, and I’m much more hopeful than I was when I proposed that we simply burn it all down. But it’s extremely slow going, and as it stands virtually all library metadata is locked into the same formats and standards we’ve been using for decades. It’s only very recently that some libraries stopped blocking search engines using robots.txt, and a great deal of ‘integration’ still involves screen-scraping.
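For anyone who hasn’t looked at one, a robots.txt file is just a list of rules telling crawlers which paths they may fetch. This sketch uses Python’s standard `urllib.robotparser` module to compare two hypothetical catalogue configurations: one that blocks all crawlers, and one that only hides patron account pages. The catalogue URL and both files are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Two hypothetical robots.txt files for a library catalogue.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

HIDE_ACCOUNTS_ONLY = """\
User-agent: *
Disallow: /patron/
"""


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler identifying as `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


record = "https://catalogue.example.org/record/12345"
print(crawler_may_fetch(BLOCK_ALL, record))           # False: invisible to search engines
print(crawler_may_fetch(HIDE_ACCOUNTS_ONLY, record))  # True: the record can be indexed
```

The second configuration still keeps crawlers out of anything under `/patron/`, so opening up bibliographic records doesn’t have to mean exposing user pages.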

Let’s work as a team, and do it my way

Just as academic publishers have seamlessly moved from “university pays for reading” to “university pays for writing” models and called the result “open”, there has been a great deal of open-washing in the library software market. Every one of the ever-shrinking number of library software companies now boasts that their product is “open” because it has APIs. Now, don’t get me wrong: well-documented web APIs that let the library provide access to certain data for integration with other services are a great thing, and certainly a big leap forward from where we were. The problem is that each system has its own API, with its own terms, its own data framework, and its own logic. And whilst a product like Ex Libris’ Alma or SirsiDynix’s BLUEcloud might have ‘open APIs’, that fact alone doesn’t justify describing the whole proprietary, centrally controlled product as “open”.

Fixing authentication

On the flip side, what standardisation we do have is generally holding libraries back. The problems with MaRC are well documented and largely being worked on, albeit at a glacial pace, so I’ll set those aside. One thing we can and should kill and replace much sooner is SIP2. The Standard Interchange Protocol version 2 (SIP2) is one of those classic “seemed like a good idea at the time” protocols that has long outlived its usefulness. It was designed to allow 3M self-service kiosks to talk to library management systems: effectively it is the “3M kiosk API” of the 1990s, with an updated version released in 2006. SIP2 is insecure by design: it has no encryption of its own and sends all data in the clear. The only way to encrypt SIP2 traffic is to first set up an encrypted tunnel using stunnel or a VPN, and then send SIP2 messages through the tunnel. This practice is increasingly common, but still definitely not industry-standard.
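To make “in the clear” concrete, here is a minimal Python sketch of a SIP2 Patron Status Request (message 23), the kind of message a kiosk or vendor sends to check a patron’s card number and PIN. The field codes and checksum rule are based on the 3M SIP2 specification; the institution ID, card number and PIN are made up, and a real client would also need the login (93) exchange, a socket connection, and response handling.

```python
from datetime import datetime


def sip2_checksum(message: str) -> str:
    """SIP2 checksum: add every character (up to and including 'AZ') as an
    unsigned number, keep the low 16 bits, take the two's complement, and
    return it as four uppercase hex digits."""
    total = sum(ord(ch) for ch in message) & 0xFFFF
    return f"{(-total) & 0xFFFF:04X}"


def patron_status_request(institution: str, card_number: str, pin: str,
                          when: datetime, terminal_password: str = "",
                          sequence: int = 0) -> str:
    """Build a SIP2 '23' Patron Status Request with error detection."""
    # 18-character transaction date: YYYYMMDD, four-character zone
    # (four blanks = local time), then HHMMSS.
    transaction_date = when.strftime("%Y%m%d") + "    " + when.strftime("%H%M%S")
    body = (
        "23"                          # message identifier: Patron Status Request
        + "001"                       # language code: 001 = English
        + transaction_date
        + f"AO{institution}|"         # institution ID
        + f"AA{card_number}|"         # patron identifier (library card number)
        + f"AC{terminal_password}|"   # terminal password
        + f"AD{pin}|"                 # patron password: plain, readable text
        + f"AY{sequence % 10}AZ"      # sequence number, then checksum field ID
    )
    return body + sip2_checksum(body) + "\r"  # messages end with a carriage return


message = patron_status_request("MAIN", "21234000000001", "secret",
                                when=datetime(2017, 6, 6, 9, 30))
print(repr(message))
```

Anyone who can see the traffic between the vendor and the library can read `ADsecret|` straight off the wire. The checksum only detects transmission errors; it does nothing to hide or protect the data.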

If you’re not sure why this is a problem, think about what the main use cases for SIP2 are:

  1. send data between kiosks and the library database (likely over the public internet), including username and password, as well as the title and other information for every item a patron has borrowed.
  2. send data between the library database and third-party services like journal databases, online learning tools and ebook vendors (definitely over the public internet), including username and password, and probably an institution access key.

All of this travels in plaintext. For most public libraries, the majority of SIP2 connections (though probably not the majority of SIP2 transactions) will be for item 2: authentication of a user against the library’s database. This includes things like ebook vendors, online learning tools like Lynda.com, printing and PC management systems, and room booking systems, or even after-hours access to the library as part of the “Open Library concept” becoming popular in northern Europe. SIP2 is a peculiarly bad protocol to use for this sort of thing, because secure user authentication over the web is a largely solved problem. There is literally no reason to use SIP2 for user authentication except that it’s a protocol that every library management system/ILS/LSP can use.
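On the vendor’s side, “authentication” over SIP2 usually comes down to reading two yes/no flags out of the library’s reply. The sketch below parses a Patron Status Response (message 24), assuming the fixed-length field layout in the 3M SIP2 specification, and checks `BL` (valid patron) and `CQ` (valid patron password). It skips checksum verification and error handling, and the sample response is invented.

```python
FIXED_LENGTH = 2 + 14 + 3 + 18  # message ID, patron status, language, transaction date


def parse_patron_status_response(message: str) -> dict:
    """Split a SIP2 '24' Patron Status Response into its variable-length fields."""
    message = message.rstrip("\r")
    if not message.startswith("24"):
        raise ValueError("not a Patron Status Response")
    fields = {}
    for part in message[FIXED_LENGTH:].split("|"):
        # Each field is a two-letter ID followed by its value; skip the trailing
        # AY/AZ error-detection fields (this sketch does not verify the checksum).
        if len(part) >= 2 and not part.startswith("AY"):
            fields[part[:2]] = part[2:]
    return fields


def vendor_accepts(message: str) -> bool:
    """What a vendor's 'authentication' amounts to: two Y/N flags."""
    fields = parse_patron_status_response(message)
    return fields.get("BL") == "Y" and fields.get("CQ") == "Y"  # valid patron, valid password


# Invented sample response: 14 blank patron-status flags, English, a date, then fields.
sample = ("24" + " " * 14 + "001" + "20170606    093000"
          + "AOMAIN|AA21234000000001|AEJane Citizen|BLY|CQY|AY1AZF000\r")
print(vendor_accepts(sample))  # True
```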

PressReader is a good example of the weirdness of forcing vendors to use SIP2 to authenticate users. For quite a while, when it was still called Library Press Display, libraries subscribing to PressReader ‘authenticated’ users via barcode masking.2 This is obviously not a very good way to check that users are authorised to use a service, and PressReader eventually blocked access via this method and introduced SIP2 authentication instead. If you look at their sign-in page, however, you’ll notice something interesting:

PressReader login page

Other than PressReader credentials, there are four ways to sign in: with library ID/password using SIP2, via Twitter using OAuth, and via Facebook or Google using what appears to be OpenID Connect. Both OAuth and OpenID Connect are extremely well-documented, secure, and widely used standards built on top of HTTPS. OAuth is an authorisation standard (it allows another application to take actions within your app on behalf of a user), whereas OpenID Connect, an identity layer built on top of OAuth 2.0, is really an authentication standard (it allows a user to prove their identity to another app by proving their identity in your app): exactly what we’re looking for in this situation. So instead of insisting that vendors use the shitty, ancient, insecure-by-design SIP2 protocol for user authentication, we should insist they use the well-documented, widely used, well-maintained, secure-by-default OpenID Connect and OAuth standards to do the job.
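For comparison, here is roughly what the first step of an OpenID Connect login looks like from the vendor’s side: it redirects the user’s browser to the identity provider’s authorisation endpoint with a handful of query parameters defined in OpenID Connect Core. The endpoint, client ID and redirect URI below are hypothetical placeholders. The important difference from SIP2 is that the patron types their password into the identity provider’s own page over HTTPS, so the vendor never sees it.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical values: the library runs the identity provider and the vendor
# has registered with it as a client.
AUTHORIZATION_ENDPOINT = "https://login.library.example/authorize"
CLIENT_ID = "vendor-app"
REDIRECT_URI = "https://vendor.example/oidc/callback"


def build_authorization_request():
    """Return (url, state, nonce) for an OpenID Connect authorization code request."""
    state = secrets.token_urlsafe(16)  # checked on the callback, protects against CSRF
    nonce = secrets.token_urlsafe(16)  # must reappear in the ID token, prevents replay
    params = {
        "response_type": "code",   # authorization code flow
        "scope": "openid",         # the 'openid' scope is what makes this OIDC
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": state,
        "nonce": nonce,
    }
    return f"{AUTHORIZATION_ENDPOINT}?{urlencode(params)}", state, nonce


url, state, nonce = build_authorization_request()
print(url)  # the browser is sent here; the patron logs in on the library's page
```

After the patron logs in, the library redirects back to the redirect URI with a short-lived code, which the vendor exchanges at the library’s token endpoint for an ID token.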

Using something like OpenID Connect (with the library as the identity provider) for authentication also potentially makes it much, much easier for both libraries and service providers to strike access deals. PressReader is a good example of this, as is Zinio. Both of these businesses were originally conceived with individual subscriptions in mind. Only later was the idea of libraries subscribing on behalf of their user-base incorporated into the business model, with the technical aspects bolted on as an afterthought. Many vendors only offer IP authentication, because working through the bizarre world of SIP2 is too time-consuming or creates a bunch of technical bloat in their application. If PressReader had been able to simply use OpenID Connect to authenticate library users, everyone would be better off - users would have their passwords securely transmitted, the library would be able to set up authentication quickly, and PressReader could use the same web industry standards they use for Facebook and Google.
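With the library as the identity provider, the vendor eventually receives a signed ID token asserting who the user is. OpenID Connect Core requires the relying party to check, among other things, that the token came from the expected issuer, was issued for this client, has not expired, and carries the nonce sent in the original request. Here is a minimal sketch of those claim checks, assuming the token’s signature has already been verified with the provider’s published keys (that step is deliberately left out). The issuer, client ID and claim values are invented.

```python
import time
from typing import Optional


def id_token_claims_ok(claims: dict, issuer: str, client_id: str,
                       nonce: str, now: Optional[float] = None) -> bool:
    """Check the core claims of an already signature-verified ID token.

    ASSUMPTION: the JWT signature was verified against the library's published
    keys before this is called; that step is deliberately not shown here.
    """
    now = time.time() if now is None else now
    audience = claims.get("aud")
    audiences = audience if isinstance(audience, list) else [audience]
    return (
        claims.get("iss") == issuer           # issued by the library we trust
        and client_id in audiences            # intended for this vendor
        and claims.get("exp", 0) > now        # not yet expired
        and claims.get("nonce") == nonce      # answers our own request
    )


# Invented example claims: 'sub' is an opaque patron identifier.
claims = {"iss": "https://login.library.example", "aud": "vendor-app",
          "sub": "patron-8842", "exp": 2_000_000_000, "nonce": "n-123"}
print(id_token_claims_ok(claims, "https://login.library.example", "vendor-app",
                         "n-123", now=1_500_000_000))  # True
```

Notice what the vendor ends up with: an opaque `sub` identifier and a yes/no, not the patron’s password, address or borrowing history.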

You’ll never work in this town again

So far we have briefly discussed the problems of insecure protocols and practices, disparate, software-specific APIs, and the particular problem of authentication. I now want to propose a way forward. This is more or less what was discussed at the consultation meeting I mentioned at the beginning of this post, but put a little more concretely. The main problem with library software at the moment is that vendors have been driving the conversation. Occasionally, a new standard appears to allow libraries to easily move between services, and have all systems talk the same language. But monopolists gonna monopolise, so standards like ILS-DI have basically died at birth, because interoperability doesn’t suit the business models of major software vendors.

XKCD - Standards

So, how to fight against monopolies? By creating a monopsony, of course. Instead of creating a monopoly across an entire state, with everyone locked into the same library software, Victorian libraries could insist that all systems must comply with certain standards. Much as one can only buy cars that conform to minimum standards for fuel efficiency and safety, Victorian libraries (or the State Government) could collectively insist that to be eligible for a contract, vendors must comply with minimum standards for interoperability and security. We could allow, say, two years for vendors to become compliant, after which any non-compliant systems would automatically be excluded from consideration. We don’t even have to come up with a ‘fifteenth standard’ for library-specific operations, because the UK book publishing and library industries have come up with a reasonably good one already.

Let’s have another look at the things Victorian libraries wanted.

A single ‘use anywhere’ library card

A couple of years ago I wrote about a possible “AusID” to smooth signup processes. Whilst the prospect of an Australian Federal department managing digital identities is increasingly worrying, the general concept could work for library memberships. Imagine a simple, centralised, securely maintained database of patron contact information, called myLibrary or something naff like that. This could operate as the identity provider for both library management systems and third-party services. When Victorians want to join a new library service, they can authenticate against myLibrary, and when they want to use a third-party service like PressReader they could also authenticate against myLibrary. myLibrary doesn’t need to hold or access any circulation data: all it does is act as an identity provider.

To be used for third-party authentication, myLibrary would need to have a flag for each library noting whether the user is blocked for any reason, but individual libraries could also use OpenID Connect to act as identity providers for third-party services, which could include other libraries. Using OpenID Connect this way, we might not even need myLibrary. Either way, it’s clear that the ability to act as an OpenID Connect identity provider needs to be part of our minimum requirements.
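A hypothetical sketch of what a myLibrary record might hold under the constraints above: contact details, a stable opaque identifier, and a block flag per library service, but no circulation data. The class, field names, and the `library` and `authorised` claims are all invented for illustration; they are not part of OpenID Connect or any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class MyLibraryAccount:
    """Hypothetical myLibrary record: identity and contact details only.
    There are deliberately no loans, holds or borrowing history here."""
    subject: str                  # stable opaque ID shared with services instead of the card number
    name: str
    email: str
    blocked_at: set = field(default_factory=set)  # IDs of library services with a block

    def may_use(self, library_id: str) -> bool:
        # A block at one library service doesn't affect the others.
        return library_id not in self.blocked_at


def claims_for_vendor(account: MyLibraryAccount, library_id: str) -> dict:
    """Hypothetical claims sent to a vendor licensed by `library_id`:
    the minimum needed to decide access, and no contact details."""
    return {"sub": account.subject,
            "library": library_id,
            "authorised": account.may_use(library_id)}


jane = MyLibraryAccount("a1b2c3", "Jane Citizen", "jane@example.org", {"north-libraries"})
print(claims_for_vendor(jane, "south-libraries"))
# {'sub': 'a1b2c3', 'library': 'south-libraries', 'authorised': True}
```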

Easy inter-library loans

As I mentioned earlier, UK librarians have worked with Book Industry Communication (BIC) to create the BIC Library Communication Framework (BIC-LCF), a new framework designed primarily to replace SIP2. It largely future-proofs the standard by defining what data should be transferred and what it should be called, but not how it should be transferred. This won’t suit all use-cases, but it’s a good place to start.

Inter-library loans in a public library setting are somewhat simpler than in an academic setting: generally we don’t deal with requests for articles, for example. Library Link Victoria (LLV) is, at least in theory, the main way inter-library loans are managed in Victorian public libraries. LLV uses OCLC’s VDX, so there is already a way for libraries to manage inter-library loans without all being on the same system. But it would be possible to manage things without an intermediary if we had a good standard to work with. Using something like the BIC-LCF, it’s not too difficult to imagine a standardised system for placing reservations on items held in other library services, and managing the transfer through to a reservation for a local patron.
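As an illustration of the kind of thing a shared standard would need to pin down, here is a hypothetical cross-service reservation modelled as a small state machine. The states, transitions and field names are invented for this example and are not taken from BIC-LCF or VDX; the point is that if every system agreed on a model like this, a reservation could move between services without a central intermediary.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    REQUESTED = "requested"
    IN_TRANSIT = "in transit"
    READY_FOR_PICKUP = "ready for pickup"
    ON_LOAN = "on loan"
    RETURNED = "returned"
    CANCELLED = "cancelled"


# Hypothetical agreed transitions for a reservation placed on another service's item.
TRANSITIONS = {
    Status.REQUESTED: {Status.IN_TRANSIT, Status.CANCELLED},
    Status.IN_TRANSIT: {Status.READY_FOR_PICKUP},
    Status.READY_FOR_PICKUP: {Status.ON_LOAN, Status.RETURNED},  # RETURNED = never collected
    Status.ON_LOAN: {Status.RETURNED},
    Status.RETURNED: set(),
    Status.CANCELLED: set(),
}


@dataclass
class Reservation:
    isbn: str
    holding_library: str      # service that owns the item
    pickup_library: str       # service where the patron collects it
    patron_ref: str           # opaque reference; the lender never sees patron details
    status: Status = Status.REQUESTED

    def advance(self, new_status: Status) -> None:
        """Move to the next status, refusing anything the shared model doesn't allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status


hold = Reservation("9780000000002", "north-libraries", "south-libraries", "ref-77")
for step in (Status.IN_TRANSIT, Status.READY_FOR_PICKUP, Status.ON_LOAN):
    hold.advance(step)
print(hold.status)  # Status.ON_LOAN
```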

Up to date technology and customer-friendly features available to all libraries in the state

By raising the minimum standards for all systems used in the State, the choices available to all libraries should improve. The problem of systems that are under-developed and don’t keep up with our needs is primarily caused by a combination of proprietary system development being driven by the needs of library vendors rather than libraries, and a lack of real competition in the library software market. Forcing vendors to comply with an agreed set of minimum standards means they must either meet the requirements or forgo millions of dollars of business. This could include both existing standards and protocols and possibly a specific new standard or, preferably, BIC-LCF or an extension of it. So, for example, we could require TLS or other web industry best-practice encryption for all web connections, OpenID Connect for user authentication, and U2F for second-factor authentication, as well as new standards to enable secure communication of fees and payments data between library services. Who knows, maybe one day we might even finally move away from MaRC.
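The minimum-standards idea is simple enough to express as a checklist. This sketch encodes the example requirements above, plus the two-year grace period suggested earlier, as a hypothetical eligibility test. The requirement keys, descriptions and deadline date are placeholders, not an actual procurement rule.

```python
from datetime import date

# Hypothetical minimum requirements, based on the examples in the paragraph above.
MINIMUM_REQUIREMENTS = {
    "bic_lcf": "BIC-LCF (or an agreed extension) for library-to-library messages",
    "oidc_provider": "Can act as an OpenID Connect identity provider",
    "tls": "TLS on all web connections",
    "u2f": "U2F second-factor authentication",
}


def missing_requirements(capabilities: set) -> list:
    """Descriptions of every requirement a system does not meet, in a stable order."""
    return [description for key, description in sorted(MINIMUM_REQUIREMENTS.items())
            if key not in capabilities]


def eligible_for_contract(capabilities: set, today: date, deadline: date) -> bool:
    """During the grace period every system is eligible; after it, only compliant ones."""
    return today < deadline or not missing_requirements(capabilities)


vendor = {"tls", "oidc_provider"}
print(missing_requirements(vendor))
```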

Let a thousand flowers bloom

I’m currently in the middle of preparations to transition my library service from Amlib to Koha ILS, so I’m sympathetic to Andrew Kelly’s proposal for a single Western Australian Koha instance. But competition and the flexibility of being able to choose the tool that best suits a particular library service and the community it serves are both key considerations. Statewide software projects do not have a particularly good track record in Australia, with huge cost blowouts and long delays the norm. In my view the last thing we should be doing is locking whole states into a single software product, even an open source one. If we want innovation and progress in library software, encouraging even more consolidation and less competition is a bad way to go about it. On the other hand, thousands of amazing products, platforms and ecosystems have been built on open standards: think of, well, everything built on the web, or all the things that use postcodes as a reference point. Instead of creating an inflexible monolith by chasing dreams of One LMS to Rule them All, librarians everywhere should embrace the opportunities of open standards and modern, modular software design.

The best way to work together is to make sure we can talk to each other.


1

the implication being that proprietary software is less susceptible because the source code is secret. Given that every proprietary library management system I’ve ever seen has appallingly bad security practices, I found this question ...misguided. Nevertheless, I didn’t behave particularly well in response. Apologies to the questioner.

2

that is, if you typed in a string that looked like the barcodes used by that library service, you’d have access.