Building a twitter bot using node, feedparser and simple-twitter (part 1)

16 July 2015

Last weekend I should have been doing a lot of things. Washing. Working on my VALA conference paper. Reading the many books on my TBR pile. Instead, I made a Twitter bot. I’m still astonished that it actually worked. Here I’ll explain how I did it and by extension how you can make your own, if you’d like to.

A little background - why Node?

A little bit of background first - because hey it’s my blog so you have to indulge me. If you just want to get stuck into writing the bot, skip down to ‘Getting Started’, below.

‘Learn to code’ has been on my New Year Resolutions list two years running. Until now, the last program I wrote was in BASIC on a BBC Microcomputer when I was twelve years old. That was quite a while ago. After completely failing to even get started in 2014, I was determined to actually learn something this year. I asked Twitter that annoying question - “Which language should I learn?” and got some patient and thoughtful answers from people I respect - Andromeda Yelton, Coral Sheldon-Hess, Cecily Walker, and Chris Cormack among many others. In among all the back and forth, there were two clear instructions that most people seemed to agree on:

  1. choose a project you want to work on, and learn whatever language it’s written in.
  2. learn JavaScript second.

I faffed around for a few months, before having to break one of those rules in order to follow the other one. After many conversations about library software and patron privacy, I decided to try building a very simple model for an encrypted library circulation system using Mylar. Mylar is built on Meteor, which is a framework for building web applications based on Node. The whole point of Node is that it allows JavaScript on the server. This is pretty useful, because it allows you to code in all JavaScript all the time.

I already had some exposure to Node, because Ghost, the open source software I use for this blog, is built on Node. So I decided to try building my secure LMS prototype using Meteor, and learn some JavaScript and Node along the way. Turns out Rule 1 was absolutely correct - I’ve learned a bunch and stuck at it, because I had an actual project to work on. I had to break Rule 2, but I feel ok about that.

Why a Twitter bot?

I’ve been wanting to build a Twitter bot for a while. Watching what Tim Sherratt has built over the years has been fascinating. I was particularly taken by his subversive Operation Random Words, which appeared shortly after the Australian Government announced Operation Bring Them Home, following on from their wildly successful gulag-building program Operation Sovereign Borders. If you’re not familiar with Twitter bots, the basic idea is that they are Twitter accounts that are controlled by software rather than humans. Lots of them are spam bots that randomly Favourite or Follow, some are political statements like Operation Random Words, and others are just stupid fun, like Robot J. McCarthy, which retweets tweets containing words like ‘commie’ or ‘socialist’, with a McCarthyesque comment.

The problem with Tim’s bots is that they are written in Python. This was actually one of the most popular suggestions when I asked ‘Which language?’, but I’ve made my bed and now I need to lie in it, so I looked for a solution in Node and JavaScript. The attraction of a Twitter bot was that it seemed like a reasonably sized challenge (not too big, but not too simple), and I’m familiar enough with them that I understand the basic principles in terms of Twitter behaviour. There are, after all, only so many things you can do on Twitter. In all, it took me most of a weekend to build, but that’s mostly because I was learning as I was doing it (which of course was most of the point).

The final trigger was actually the online conversation that happened at the end of June regarding the life and death of blogging, and in particular a desire from several blogging librarians to see more Australian librarians blogging. Alisa Howlett blogged about her proposed solution to tackle the problem from the supply end - (re)starting a group blog. She recently followed this up with a call for volunteers. I was thinking about how to assist by making library blogs more visible in the day to day conversation. The project I came up with was a version of Code4Lib’s InfoPeep, but for Australian GLAM bloggers - Aus GLAM Blog Bot.

Getting started

I won’t give you a blow by blow description of exactly what I did, because it would mostly consist of “..and then I spent 2 hours searching Stack Overflow to work out why my code didn’t run”. What I am going to do, however, is take you through how you can build something similar, assuming you are reasonably intelligent but have little or no coding experience. I am barely even qualified to call myself an amateur hack coder, let alone a programmer, so if I can do it you can too.

Equipment

First up, you’re going to need some gear.

  1. A computer. I used a Macbook Air, but you can use any computer you have access to, as long as you’ve got admin privileges (because you’ll need to install some software).
  2. An internet connection.
  3. A text editor. You could use the default TextEdit on Mac or Notepad on Windows - but I strongly recommend you don’t attempt to do that. It will only end badly. If you think you won’t do much coding in future, or refuse to pay for software, download Text Wrangler. If you think you might want to do more coding in future, and don’t mind eventually paying a very reasonable fee, use Sublime Text.
  4. Node. Once you have your computer and editor sorted, head to the Node downloads page and download the latest version for your operating system. Install as you normally would - with a Mac it’s pretty straightforward, just double click the pkg file and say yes to everything when prompted.
  5. An email address that is not already registered with Twitter.
  6. A mobile phone number that allows you to receive SMS and is not already registered with Twitter. If you’re in the US, you could use a Google phone number. Everyone else needs a real number.

Something I’m still becoming comfortable with is using the command line to do stuff. Once you get the hang of it, it really does make things easier, but it can be pretty confusing at first and, like many things used by programmers, the documentation is often difficult to understand if you’re a beginner. You’ll need to use the command line later, so you may as well start getting used to it from the beginning. If you have no idea what you’re doing, or just have a bad memory, I have found ss64 useful. SSH Commands is also pretty helpful and very clear as a reference.

Once you’ve installed Node, open up Terminal if you’re using a Mac, or cmd.exe if you’re using a PC (if you’re using Linux you probably don’t need to read this post). You’ll be in the default user directory. We want to create our own folder for the project:

mkdir myproject

To check whether it was really created, list all the folders you have (in cmd.exe on Windows, type dir instead of ls):

ls

Then change the directory (i.e. navigate to the folder) that you just created:

cd myproject

NPM

When you installed Node, it came with a program - NPM. The dirty little secret of coding is that most programs consist of a lot of code that was written by someone else. It’s the same basic principle that sees laptops running applications on top of an operating system, or your phone OS running a bunch of apps. Depending on the language, the code that someone wrote for you could be called a library, a package, a module or even a gem. Confusingly, Node seems to officially refer to them as modules, but the default system for installing them is called NPM - Node Package Manager. NPM allows you (amongst other things) to install Node packages easily from the command line by simply typing npm install [package name]. For the moment, however, we’re going to open a browser and check out a couple of packages before we install them.

feedparser and simple-twitter

The guts of the bot code is a combination of feedparser and simple-twitter. Feedparser takes any RSS or Atom feed and splits each feed and its articles into their component parts (i.e. it parses feeds). Simple-twitter allows basic interaction with the Twitter REST API. Combining the two gives us the ability to grab RSS feeds, extract the bits we want, and turn them into tweets.

Dependencies

At this point we need to quickly discuss dependencies. A dependency is simply another piece of code that is relied upon by another piece of code in order for it to work. For our Twitter bot, feedparser and simple-twitter will be dependencies. Technically, node is probably considered a dependency too. But feedparser and simple-twitter rely on other code too. It’s dependencies all the way down. If you want your code to run properly, you need to ensure that you have all the dependencies (and their dependencies) on your machine. With npm this is pretty simple.

If we start with simple-twitter, take a look on the right-hand side of its npm page. It lists one dependency - oauth. You’re probably familiar with oauth as a user - you may well have used it to authorise applications to tweet or post to Facebook on your behalf. In this case, we don’t need to install oauth ourselves: when you npm install a package, npm fetches that package’s own dependencies along with it. You will also notice that the simple-twitter npm page lists four dependents. These are packages that require simple-twitter for their functionality (just like our bot will).

Moving on to feedparser, it lists four dependencies - readable-stream, array-indexofobject, addressparser, and sax. npm will fetch these along with feedparser, but installing them explicitly does no harm, so we’ll do that in a moment. First, however, we need to address a gotcha that wasted about two hours of my weekend. If you look at the ‘Usage’ section of the feedparser page, you will see a line like this:

request = require('request');

This is Node-speak for “load the ‘request’ package, and don’t go any further if it isn’t installed”. Technically request isn’t a dependency of feedparser, but if we want to use the example code (which we do), then we’ll need it.
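
To see what require does when a package is missing, here is a minimal sketch (it is not from the feedparser page). It uses Node’s built-in path module, which is always available, and a deliberately made-up package name, no-such-package-glambot, which is an assumption for illustration only.

```javascript
// A built-in module always loads, so this line succeeds.
var path = require('path');
console.log(typeof path.join); // prints: function

// Hypothetical package name, chosen because it should not be installed.
var missingName = 'no-such-package-glambot';
var loadError = null;

try {
  require(missingName);
} catch (error) {
  // Node throws a MODULE_NOT_FOUND error when the package is absent.
  loadError = error;
  console.error('Could not load ' + missingName + ': ' + error.code);
}
```

Without the try/catch, the program would simply stop at the failing require line, which is the behaviour that cost me those two hours.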

So, let’s do it! Head back to your command line (Terminal/cmd) and install each dependency by typing npm install [package name] and hitting enter/return:

npm install simple-twitter
npm install feedparser
npm install readable-stream
npm install array-indexofobject
npm install addressparser
npm install sax
npm install request

Each package will take a moment to install, then allow you to continue installing the next one. If you now look at the myproject (or whatever you called it) folder, you will notice it has a new subfolder called node_modules, containing a folder for each dependency. You created all this by installing from the command line.

Time to code

Now you will need to fire up your text editor (Text Wrangler or Sublime Text). Open a new file, and save it as ‘myproject.js’. Your text editor should now apply particular styles and colours to the code knowing that it is JavaScript, which will make things much easier for you. An important thing to remember is that if you simply copy and paste text into your text editor, you may get errors. In particular, certain versions of quote marks don’t come out correctly. As tedious as it is, you should type out all the code I give you here.

We’ll start by simply typing in the example text from the feedparser page:

var FeedParser = require('feedparser')
  , request = require('request');

var req = request('http://somefeedurl.xml')
  , feedparser = new FeedParser(); // options are optional, so we pass none

req.on('error', function (error) {
  // handle any request errors
});
req.on('response', function (res) {
  var stream = this;

  if (res.statusCode != 200) return this.emit('error', new Error('Bad status code'));

  stream.pipe(feedparser);
});

feedparser.on('error', function(error) {
  // always handle errors
});
feedparser.on('readable', function() {
  // This is where the action is!
  var stream = this
    , meta = this.meta // **NOTE** the "meta" is always available in the context of the feedparser instance
    , item;

  while (item = stream.read()) {
    console.log(item);
  }
});
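
If the ‘readable’ handler above looks mysterious, the following sketch shows the same read loop on its own, without any feed or network involved. It uses Node’s built-in stream module; fakeParser is a hypothetical stand-in for the feedparser object, and the two items written into it are invented.

```javascript
var PassThrough = require('stream').PassThrough;

// Hypothetical stand-in for feedparser: an object-mode stream
// that hands back whatever objects we write into it.
var fakeParser = new PassThrough({ objectMode: true });
var titles = [];

fakeParser.on('readable', function () {
  var stream = this
    , item;

  // read() returns null when nothing is waiting, which ends the loop.
  while (item = stream.read()) {
    titles.push(item.title);
    console.log(item.title);
  }
});

// Pretend two articles have just been parsed.
fakeParser.write({ title: 'First post' });
fakeParser.write({ title: 'Second post' });
fakeParser.end();
```

The handler is called whenever there is something to read, and the while loop keeps pulling items until the stream has nothing more to give.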

Next we’ll fix it up a bit by adding error-handling code and entering a valid URL. Replace the line beginning var req with:

var req = request('http://glambottest.wordpress.com/feed/')

and replace each of the two error comments (// handle any request errors and // always handle errors) with:

console.error(error);

This is not generally considered to be a sophisticated way of dealing with errors, but our program is going to be pretty small and it’s good enough for us. This line simply prints any error text to the console so you will know what has gone wrong.
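
The on(‘error’, ...) lines work because both req and feedparser are event emitters. The sketch below uses Node’s built-in events module to show a handler receiving an error; fakeFeed is a hypothetical name standing in for either object, not part of feedparser or request.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical stand-in for the request or feedparser object.
var fakeFeed = new EventEmitter();
var seenErrors = [];

// Same pattern as in the bot: register a handler for 'error' events.
fakeFeed.on('error', function (error) {
  seenErrors.push(error.message);
  console.error(error);
});

// Simulate something going wrong, such as a bad status code.
fakeFeed.emit('error', new Error('Bad status code'));
```

If no handler is registered, emitting ‘error’ makes Node throw and stop the program, which is why the feedparser example tells you to always handle errors.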

We now have working code. If you ran this, it would grab whatever is in the RSS feed file for the blog I set up for testing (http://glambottest.wordpress.com) and spit out the entire result into your console (Terminal/cmd/whatever). This is nice, but it’s not really what we want. If you go back to the feedparser page, you’ll see a ‘List of meta properties’ and a ‘List of article properties’. The meta properties are for the whole blog, whilst the article properties are obviously for the individual articles. We can select any combination of these properties to spit out, rather than the entire RSS feed.

I wanted the article title, article author, and article URL, so that’s what we’ll use here. Replace

console.log(item);

with

console.log(item.title, '|', item.author, '|', item.link);

Save your file, then go to the command line and type

node myproject.js
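
If you don’t have a feed handy, this sketch shows what that console.log line does with a single article. The item object here is invented for illustration (real feedparser items have many more properties), and formatItem is a hypothetical helper name.

```javascript
// Made-up article with the three properties we care about.
var item = {
  title: 'Hello GLAM world',
  author: 'A. Librarian',
  link: 'http://example.com/hello-glam-world'
};

// Hypothetical helper: join the three fields with ' | ' between them.
function formatItem(article) {
  return article.title + ' | ' + article.author + ' | ' + article.link;
}

console.log(formatItem(item));

// console.log with several arguments puts a space between each one,
// so this prints the same line as the call above.
console.log(item.title, '|', item.author, '|', item.link);
```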

A problem we’ll run into eventually is that tweets are restricted to 140 characters. If our title is too long, when we send title | author | url to Twitter we might lose the url. To prevent this, we need to cut the title off if it’s too long. Because we don’t know how long the author name will be either, we’ll trim the title at 70 characters. Luckily for us, JavaScript has an easy way for us to do this: the length property. We need to add an ‘if ..else’ statement inside the while statement:

  while (item = stream.read()) {
    // get the length of the title text, and the title text itself.
    var titleLength = item.title.length;
    var itemTitle = item.title;
    // if the title is shorter than 70 characters, log it to the console.
    if (titleLength < 70) {
      console.log(item.title, '|', item.author, '|', item.link);
    }
    // If the title is 70 characters or longer we truncate it.
    else {
      var trimmedTitle = itemTitle.substring(0, 70);
      console.log(trimmedTitle, '... |', item.author, '|', item.link);
    }
  }

Now if you node myproject.js the title of the last article should be shortened.
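
If you want to try the trimming on its own, here is a small sketch. The trimTitle helper and the test strings are hypothetical, made up for this example; unlike the code above, it only adds ‘...’ when something was actually cut off.

```javascript
// Hypothetical helper: shorten a title to at most maxLength characters,
// adding '...' only when something was actually cut off.
function trimTitle(title, maxLength) {
  if (title.length <= maxLength) {
    return title;
  }
  return title.substring(0, maxLength) + '...';
}

var shortTitle = 'Short post';
var longTitle = new Array(101).join('x'); // a 100-character string of x's

console.log(trimTitle(shortTitle, 70).length); // 10
console.log(trimTitle(longTitle, 70).length);  // 73 (70 characters plus '...')
```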

Setting up your Twitter app account

This is all very well, but this is supposed to be a Twitter bot, so it’s time to bring the Twitter. Head over to the Twitter website and create a new account for your bot. You’ll need a unique email address and a unique phone number. Once you’re registered and logged in, change the url to apps.twitter.com. Fill in the form and click ‘Create your Twitter application’. Go to ‘keys and access tokens’ and you’ll need to ‘create my access token’. Once you’ve done that, we’re good to keep coding.

simple-twitter

Go back into your text editor, and above everything else enter the code to set up your twitter parameters:

var twitter = require('simple-twitter');
twitter = new twitter( 'xxx', //consumer key from twitter api
                        'xxx', //consumer secret key from twitter api
                        'xxx', //access token from twitter api
                        'xxx'  //access token secret from twitter api
                      );

Obviously you need to replace ‘xxx’ with whatever the comment next to it indicates.
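
If you would rather not type your keys directly into a file you might share, one option is to read them from environment variables. This is a sketch only: the variable names (TWITTER_CONSUMER_KEY and so on) and the loadKeys helper are made up for this example, not something simple-twitter requires.

```javascript
// Hypothetical names for the four values Twitter gives you.
var KEY_NAMES = [
  'TWITTER_CONSUMER_KEY',
  'TWITTER_CONSUMER_SECRET',
  'TWITTER_ACCESS_TOKEN',
  'TWITTER_ACCESS_TOKEN_SECRET'
];

// Collect the keys from an environment object, failing loudly if one is missing.
function loadKeys(env) {
  return KEY_NAMES.map(function (name) {
    if (!env[name]) {
      throw new Error('Missing environment variable: ' + name);
    }
    return env[name];
  });
}

// Example with a fake environment; in the bot you would pass process.env.
var keys = loadKeys({
  TWITTER_CONSUMER_KEY: 'aaa',
  TWITTER_CONSUMER_SECRET: 'bbb',
  TWITTER_ACCESS_TOKEN: 'ccc',
  TWITTER_ACCESS_TOKEN_SECRET: 'ddd'
});
console.log(keys.length); // 4
```

The four values come back in the same order as the four ‘xxx’ placeholders above, so keys[0] would go where the consumer key goes, and so on.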

Now we’re going to replace our console.log() statements with simple-twitter post functions:

          if (titleLength < 70) {
            twitter.post('statuses/update',
                         {'status' : item.title + ' | ' + item.author + ' | ' + item.link},
                         // deal with any twitter errors
                         function(error, data) {
                           if (error) console.error(error);
                           console.dir(data);
                         }
            );
          }
          // If the title is 70 characters or longer we truncate it.
          else {
            var trimmedTitle = itemTitle.substring(0, 70);
            twitter.post('statuses/update',
                         {'status' : trimmedTitle + ' | ' + item.author + ' | ' + item.link},
                         // deal with any twitter errors
                         function(error, data) {
                           if (error) console.error(error);
                           console.dir(data);
                         }
            );
          }

Now, if you node myproject.js your Twitter bot should tweet everything that previously appeared in your console.
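
The two branches above differ only in the text they send, so one way to tidy up is to build the status text first and post once. The sketch below is an illustration under assumptions: buildStatus and postItem are hypothetical helper names, the article is invented, and fakeTwitter is a stand-in object with the same post(path, params, callback) shape used above, so you can try it without sending real tweets. It also adds ‘...’ to trimmed titles, as the console version did.

```javascript
// Hypothetical helper: build 'title | author | link', trimming long titles.
function buildStatus(item) {
  var title = item.title;
  if (title.length > 70) {
    title = title.substring(0, 70) + '...';
  }
  return title + ' | ' + item.author + ' | ' + item.link;
}

// Stand-in for the simple-twitter client: records posts instead of sending them.
var sent = [];
var fakeTwitter = {
  post: function (path, params, callback) {
    sent.push({ path: path, status: params.status });
    callback(null, { text: params.status });
  }
};

// Hypothetical helper: post one article using whichever client is passed in.
function postItem(client, item) {
  client.post('statuses/update', { status: buildStatus(item) }, function (error, data) {
    if (error) {
      console.error(error);
      return;
    }
    console.dir(data);
  });
}

postItem(fakeTwitter, {
  title: 'Hello GLAM world',
  author: 'A. Librarian',
  link: 'http://example.com/hello'
});
```

Passing the real twitter object to postItem instead of fakeTwitter should send the same text to Twitter.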

That’s enough for today. In Part Two I’ll show you how to turn your code into an autonomous bot by adding extra RSS feeds, limiting how many articles you tweet, looping the code, and running it on a server.