Watching the feeds

Sun Mar 25 2018 08:05:00 GMT+1100 (AEDT)

With all the talk recently about Facebook and Cambridge Analytica, it's timely to write about open web technologies. Rich Site Summary/Really Simple Syndication (RSS) is a deceptively simple standard. When Google shut down Google Reader, many people declared or believed that this was the end of RSS as a technology. Yet the beauty of open standards is that when a single company decides to stop using one, the technology doesn't have to die. At almost the same time that the demise of Google Reader was being interpreted as the death of RSS, podcasts were being declared the next hot thing. Yet podcasting is almost entirely reliant on RSS for distribution. For every Medium post about the alleged death of RSS, there are five terrible podcasts using RSS to make their way onto smartphones around the world.

I've spent a bit of time playing around with RSS as I've been learning to code. Aus GLAM Blogs relies on RSS: without it, the app essentially wouldn't work at all. My most popular Twitter bot, @lib_papers, also relies on RSS, pulling in news stories from Reuters and mashing them up with library-themed topics. Recently I've been working on a new RSS-based project, Empocketer.

RSS has been around for so long that there's a chance your library catalogue includes it as a feature. This may be more useful than you imagine, because RSS in library catalogues commonly hooks into searches: effectively it allows you to subscribe to a particular set of search parameters. If you are an ancient librarian like me, you might have heard of the concept of Selective Dissemination of Information (SDI), which predates the web but could be fully and easily realised once RSS and Atom emerged. If your library management system has a feature that emails library members when a new item they're interested in is added to the catalogue, chances are high that RSS is involved in the process.

The cool thing about RSS, however, is that if you know the URL, you can take advantage of it without any access to the back-end or code. Let's take a little look at how this works. I've just borrowed Mar Hicks' Programmed Inequality from the library, so we'll do a subject search for keywords "women" and "computer".

The URL for the results page looks like this:
https://mylibrary.brimbank.vic.gov.au/cgi-bin/koha/opac-search.pl?idx=su&q=women+computer

And the RSS feed looks like this:
https://mylibrary.brimbank.vic.gov.au/cgi-bin/koha/opac-search.pl?idx=su&q=women%2520computer&count=50&sort_by=acqdate_dsc&format=rss2

Or, a little more clearly, what we're requesting from the server is:

idx=su&q=women+computer

vs

idx=su&q=women%2520computer&count=50&sort_by=acqdate_dsc&format=rss2
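
To compare the two requests parameter by parameter, here is a minimal sketch in Python using only the standard library. The query strings are copied from above; the helper name parse_query is made up for illustration. It also assumes that the %2520 in the RSS link is simply a space that has been percent-encoded twice, which is how it decodes, though I haven't confirmed how the server itself treats it.

```python
from urllib.parse import parse_qs, unquote

# The two query strings exactly as they appear above.
html_query = "idx=su&q=women+computer"
rss_query = "idx=su&q=women%2520computer&count=50&sort_by=acqdate_dsc&format=rss2"


def parse_query(query):
    """Decode a query string into a flat {name: value} dict (hypothetical helper)."""
    return {name: values[0] for name, values in parse_qs(query).items()}


html_params = parse_query(html_query)
rss_params = parse_query(rss_query)

# parse_qs decodes once, so "%2520" comes out as the literal text "%20".
# Assumption: a second decode is what recovers the intended space.
rss_params["q"] = unquote(rss_params["q"])

# Parameters that appear only in the RSS request.
extra = {k: v for k, v in rss_params.items() if k not in html_params}

print("HTML search:", html_params)
print("RSS search: ", rss_params)
print("RSS only:   ", extra)
```

Once decoded, both requests carry the same idx and q values; the RSS request just adds count, sort_by and format.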

The query is essentially the same; the difference is that in the second one we specify that the format should be RSS2 rather than a standard HTML page. Because the end-user doesn't have control over the sort order of a feed, we have to specify that up front (sort_by=acqdate_dsc, acquisition date descending), and the number of records returned is limited to 50 by default (count=50).

The RSS2 standard defines how information about each item is sent. We can use something like the feedparser npm package to pull out data in a consistent manner. In fact, for some common metadata types, feedparser will pull data from both RSS and Atom feeds, even though the standards differ. That means that we can watch feeds in multiple formats and (at least for some data) rely on a single codebase to put that data to interesting and useful work.

You can look at my very simple example on Runkit to see how you can do this for a single feed, and the endpoint shows you how one of the world's worst RSS readers might display it. As written, this is less useful than simply opening the feed in a web browser, but hopefully it gives you an idea of what can be done. Of course it's perfectly possible to mash all three feeds listed in the example (from Koha ILS, Spydus, and Symphony respectively) into a single run, allowing for a merged feed, which would be a bit more useful.
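
To give a rough idea of what "a single codebase for both formats" and "a merged feed" might look like, here is a small sketch. It is not the feedparser npm package and not my Runkit example: it's a hand-rolled stand-in using Python's standard library. The two sample feeds, their titles and the example.org links are invented, and the function names parse_feed and merge_feeds are hypothetical. It only handles title, link and date, and assumes every item has a date.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom elements live in this namespace

# Invented sample feeds standing in for real catalogue responses.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Catalogue A</title>
<item><title>Example book A</title>
<link>https://example.org/a/1</link>
<pubDate>Sat, 24 Mar 2018 10:00:00 +1100</pubDate></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Catalogue B</title>
<entry><title>Example book B</title>
<link href="https://example.org/b/7"/>
<updated>2018-03-25T08:05:00+11:00</updated></entry>
</feed>"""


def parse_feed(xml_text):
    """Return a list of {title, link, published} dicts from RSS 2.0 or Atom."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == "rss":
        for item in root.iter("item"):
            entries.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                # RSS 2.0 dates use the email-style (RFC 822) format.
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            })
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            entries.append({
                "title": entry.findtext(ATOM + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
                # Atom dates are ISO 8601-style; a trailing "Z" would need
                # extra handling on Python versions before 3.11.
                "published": datetime.fromisoformat(entry.findtext(ATOM + "updated")),
            })
    else:
        raise ValueError(f"unrecognised feed root: {root.tag}")
    return entries


def merge_feeds(*feeds):
    """Parse several feeds and return all entries, newest first."""
    merged = [entry for feed in feeds for entry in parse_feed(feed)]
    merged.sort(key=lambda entry: entry["published"], reverse=True)
    return merged


merged = merge_feeds(RSS_SAMPLE, ATOM_SAMPLE)
for entry in merged:
    print(entry["published"].isoformat(), entry["title"], entry["link"])
```

The point of normalising everything into the same small dict is that the merging and display code never needs to know which format an entry came from.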

There are plenty of ways to watch RSS feeds, some of them surprising. Microsoft Outlook, for example, has a perfectly serviceable RSS reader built in. Properly constructed RSS readers will only show you things published since you last looked, and they can do this without you having to register with the site, or the reader's developers having to get permission to use an API. The reason you hear so much about RSS being 'dead' is that commercial publishers don't like it. RSS hands the content over to the reader's own software, making it much harder for publishers to track you or serve advertising dynamically. From my perspective this is an advantageous feature, but from the point of view of commercial media it's a bug.
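
The "only show things published since you last looked" behaviour can be sketched in a few lines. This is an illustration under simple assumptions, not how any particular reader works: the entries are invented, new_since is a hypothetical helper, and it relies on publication dates alone, whereas real readers often also keep track of each item's identifier.

```python
from datetime import datetime, timezone


def new_since(entries, last_checked):
    """Return only the entries published after last_checked (hypothetical helper)."""
    return [entry for entry in entries if entry["published"] > last_checked]


# Invented entries, as they might look after parsing a feed.
entries = [
    {"title": "Older item", "published": datetime(2018, 3, 20, 9, 0, tzinfo=timezone.utc)},
    {"title": "Newer item", "published": datetime(2018, 3, 24, 9, 0, tzinfo=timezone.utc)},
]

# The reader stores this locally; the website never needs to know it.
last_checked = datetime(2018, 3, 22, 0, 0, tzinfo=timezone.utc)

fresh = new_since(entries, last_checked)
for entry in fresh:
    print("New:", entry["title"])

# Remember the newest item seen, ready for the next check.
if fresh:
    last_checked = max(entry["published"] for entry in fresh)
```

All of the state lives on the reader's side, which is why no registration or API key is involved.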

In the constant bombardment of new, shiny, 'disruptive' tech, it's easy to forget about things that are not even very old. RSS/Atom is a technology that still works extremely well, has low overhead, and is easy to implement. It's an open standard that anyone can use without asking permission, paying a fee, or worrying that their access will be cut off arbitrarily. Importantly, when you use RSS, you're watching the website, instead of the website watching you. No wonder Big Tech tried to kill it.