Building a twitter bot part 2 - it’s aliiiive!

18 July 2015

In part 1 of ‘building a twitter bot’ we stepped through setting up Node.js and some fairly basic plumbing to link feedparser and simple-twitter. Today I’ll show you how to add some other features to make it a bit more sophisticated, and how to automate your bot so it runs by itself.

Add more feeds

Adding extra feeds was actually the last thing I worked out, but I’m trying to show you how to make your own bot in a more logical order than I used. You’ll remember that towards the top of our code we stated that we are using feedparser, and created a feed request variable:

var FeedParser = require('feedparser')
  , request = require('request');

var req = request('https://glambottest.wordpress.com/feed/')
  , feedparser = new FeedParser();

This works really well if you only want to tweet a single RSS feed, but what if you want to use more than one feed? One option I looked at was using an OPML file, using something called opmlparser. From what I could tell, however, opmlparser gives us less granularity with the individual feeds, and in the end I found a simpler way to do it. All we need to do is put all our feeds in an array and then loop through it.

To do this we need to adjust our code a bit. First of all, we need to add a variable for our array, and add more feeds to it. Put this before the line starting var req = request...:

var feeds = [ 'https://glambottest.wordpress.com/feed/',
              'http://blog.ghost.org/rss/',
              'http://whatever.blog.address/feed'
            ];

Note that we’re using square brackets here, because it’s an array. What we need to do now is run our code once for each feed, looping through each item in the array. This is quite simple if you use JavaScript’s Array.forEach() method:

feeds.forEach(function(theUrl){
  var req = request(theUrl)
    , feedparser = new FeedParser();
  // insert all of the remaining code here
});
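Why forEach rather than a plain for loop? Feedparser’s event handlers run later, once each request comes back, so every handler needs to remember which feed it belongs to. Because forEach calls our function once per feed, each call gets its own theUrl. Here’s a standalone sketch of the difference (no network calls; the handlers and messages are made up purely for the demo):

```javascript
// Example feed list (the same made-up URLs used above).
const feeds = [
  'https://glambottest.wordpress.com/feed/',
  'http://blog.ghost.org/rss/',
  'http://whatever.blog.address/feed'
];

// forEach: each call gets its own `theUrl`, so every handler created
// inside it remembers the right feed, even if it runs much later.
const handlers = [];
feeds.forEach(function (theUrl) {
  handlers.push(function () {
    return 'finished reading ' + theUrl;
  });
});

// A `for` loop with `var`: all handlers share one `i`, which has moved
// past the end of the array by the time they run.
const sharedHandlers = [];
for (var i = 0; i < feeds.length; i++) {
  sharedHandlers.push(function () {
    return 'finished reading ' + feeds[i];
  });
}

// Run the handlers "later", the way feedparser's callbacks would be.
handlers.forEach(function (h) { console.log(h()); });       // one line per feed URL
sharedHandlers.forEach(function (h) { console.log(h()); }); // 'finished reading undefined' three times
```

If you do prefer a classic for loop, declaring the counter with let instead of var gives each iteration its own copy and avoids the problem.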

Exclude any old posts

If you have run your code a few times using node myproject.js, you will have noticed that every time you run it, it spits out everything in the feed, no matter how old it is. Because different sites set up their RSS feeds differently (some list items in chronological order, some in reverse chronological order, for example), feedparser deliberately has no built-in way to restrict the number of articles coming through on any given feed. But we really don’t want all the old posts every time we run our code - otherwise we’ll just keep tweeting the same old posts over and over.

To resolve this problem, we’re going to filter the articles by publication date. We'll create two new variables, dateNow and dateYesterday, and write another if ... else statement.

First find this line:

while (item = stream.read()) {

This is the line that tells our program what to do once it can read the RSS feed. Underneath, we’ll create a variable that gets the date and time right now:

var dateNow = new Date();

Then we’ll create a new variable that holds the time 24 hours ago. Why 24 hours? Mostly because I’m cautious about RSS feeds and times on different servers - they don’t necessarily all agree about what the time or date is. We could probably make this window much shorter (more on that later), but for now we’ll stick with one day. JavaScript measures time in milliseconds (a Date is stored as the number of milliseconds since 1 January 1970), so we get yesterday’s date/time by subtracting 86,400,000 milliseconds (24 × 60 × 60 × 1000) from dateNow:

var dateYesterday = dateNow - 86400000;
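Here’s a small standalone sketch of that date arithmetic, assuming (as feedparser normally provides) that item.date is a JavaScript Date object. The postedAnHourAgo and postedLastWeek dates are made up for the demo. It also shows a subtlety: subtracting a number from a Date gives you a plain number, not a Date, but comparing a Date with that number still works because JavaScript converts the Date to milliseconds first.

```javascript
// 24 hours x 60 minutes x 60 seconds x 1000 milliseconds
const ONE_DAY_MS = 24 * 60 * 60 * 1000; // 86400000

const dateNow = new Date();
// Subtracting a number from a Date converts the Date to milliseconds,
// so dateYesterday is a plain number, not a Date object.
const dateYesterday = dateNow - ONE_DAY_MS;

// Comparing a Date with a number also converts the Date to milliseconds,
// which is why `item.date > dateYesterday` works in the bot.
const postedAnHourAgo = new Date(dateNow.getTime() - 60 * 60 * 1000);
const postedLastWeek = new Date(dateNow.getTime() - 7 * ONE_DAY_MS);

console.log(typeof dateYesterday);            // 'number'
console.log(postedAnHourAgo > dateYesterday); // true
console.log(postedLastWeek > dateYesterday);  // false
```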

Now we start our if ... else statement:

if (item.date > dateYesterday){
// do all the things in the rest of our code
}

That is, if the article date is within the last 24 hours, we want to post it. We could actually leave our statement there, but if nothing gets posted you won’t know whether there were no new articles or your code just didn’t work, so we’ll add an else statement at the end:

      if (item.date > dateYesterday) {
        // do all the things in the rest of our code
      } else {
        console.log('no new posts');
      }
    } // end of the while loop
  }); // end of the feedparser 'readable' handler
}); // end of feeds.forEach
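One side effect of the code above is that ‘no new posts’ is logged once for every old article in every feed, which gets noisy. A hypothetical alternative, sketched here with made-up sample items rather than real feedparser output, is to collect the recent items first and only log the message if there are none. The isNewPost helper (a name I’ve made up) also skips items with no date at all. In the real bot you’d gather items inside the while loop and make the decision once the parser’s stream has ended, so treat this as a sketch rather than a drop-in replacement.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: true if an item was published within the last
// `windowMs` milliseconds. Items without a Date (e.g. date: null) are skipped.
function isNewPost(item, now, windowMs) {
  if (!(item.date instanceof Date)) return false;
  return item.date > now - windowMs;
}

// Made-up sample items standing in for what feedparser would emit.
const now = new Date('2015-07-18T12:00:00Z');
const items = [
  { title: 'Fresh post', date: new Date('2015-07-18T09:00:00Z') },
  { title: 'Old post', date: new Date('2015-07-10T09:00:00Z') },
  { title: 'Undated post', date: null }
];

const newPosts = items.filter(function (item) {
  return isNewPost(item, now, ONE_DAY_MS);
});

// Log 'no new posts' once, instead of once per old item.
if (newPosts.length === 0) {
  console.log('no new posts');
} else {
  newPosts.forEach(function (item) {
    console.log('would tweet: ' + item.title); // would tweet: Fresh post
  });
}
```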

Keep your code running and add a loop

We now have a program that iterates through a list of RSS feeds, finds all the articles that are less than 24 hours old, and tweets about them. This is fun, but we have to type node myproject.js every time we want it to run. To make a truly autonomous bot we need to keep the code running without us. There are two aspects to this: first, we need to make our code run as often as we want, and then we need to keep it alive.[1]

First of all, we’ll use setInterval to tell our code to run repeatedly, waiting a specified period of time between runs. Earlier we set a limit on the age of the articles we’re posting. We could make the interval and that age limit nearly match so that we never get repeated posts[2] - I didn’t do this, mostly because I didn’t want to accidentally miss any posts that were delayed or came from servers with slightly wrong clocks, but that may be over-cautious. We’ll set the interval to thirty seconds here so you can see it in action, but Aus GLAM Blogs Bot is actually set to ten minutes so it doesn't hit the Twitter API unnecessarily. Towards the top of your code, just underneath the Twitter credentials, create a new variable called timerVar:

var timerVar = setInterval(function () { runBot(); }, 30000);

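This basically says ‘every 30 seconds, call the runBot() function’. At this point you’re probably wondering where that function is, because you haven’t seen it before - we’re about to create it. (It’s fine to define runBot further down the file: JavaScript hoists function declarations, and in any case setInterval won’t call it until thirty seconds have passed.)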
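Because setInterval waits a full interval before its first run, you may prefer the bot to run once as soon as it starts and then keep going on the timer. Here’s a minimal sketch of that idea; startBot is a hypothetical helper (not part of the bot’s code), and the demo uses a 100 millisecond interval and stops itself after a few runs so you can try it without waiting around:

```javascript
// Hypothetical helper: run `task` once straight away, then every `intervalMs`.
// Plain setInterval on its own waits a full interval before the first run.
function startBot(task, intervalMs) {
  task();
  return setInterval(task, intervalMs);
}

let runs = 0;
function runBot() {
  runs += 1;
  console.log('run number ' + runs + ' at ' + new Date().toISOString());
}

// Demo with a short interval; the real bot would use 30000 (or 600000 for ten minutes).
const demoTimer = startBot(runBot, 100);

// Stop the demo after roughly three intervals so the process can exit.
setTimeout(function () {
  clearInterval(demoTimer);
  console.log('stopped after ' + runs + ' runs');
}, 350);
```

clearInterval is also how you’d stop the real bot’s timer if you ever needed to, by passing it timerVar.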

Directly underneath, type:

function runBot() {
// all the rest of your code
};

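What we’ve done here is wrap everything in a function that runs every thirty seconds. Note that setInterval waits for the first thirty seconds to pass before the first run, so nothing happens straight away. Now we’re going to make it keep running by itself.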

Above the line where you created timerVar, we’re going to create a simple server using Node’s http.createServer method. You can do quite sophisticated things with this, and if we were proper programmers making a big complex app we’d put it all in a file called server.js, but our Twitter bot is pretty simple and I’m lazy, so we’ll just put it all in the one file:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Name Of Your App\n');
}).listen(8080);
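If you’d like to check that the server responds without leaving it running, here’s a standalone sketch. createStatusServer is a hypothetical wrapper around the same http.createServer call; the demo listens on port 0 (which asks the operating system for any free port), requests the page once with http.get, then shuts the server down. The PORT environment variable line is an assumption: some hosting platforms tell your app which port to use that way, so it falls back to 8080 otherwise.

```javascript
const http = require('http');

// Hypothetical wrapper around the same createServer call used above.
// It returns the server without starting it, so the caller picks the port.
function createStatusServer(appName) {
  return http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(appName + '\n');
  });
}

// Assumption: some hosts pass the port in a PORT environment variable.
// In the real bot you would call createStatusServer('Name Of Your App').listen(port);
const port = Number(process.env.PORT) || 8080;

// Demo: listen on any free port (0), fetch the page once, then shut down.
const server = createStatusServer('Name Of Your App');
server.listen(0, '127.0.0.1', function () {
  const options = { host: '127.0.0.1', port: server.address().port, path: '/', agent: false };
  http.get(options, function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      console.log(res.statusCode, JSON.stringify(body)); // 200 "Name Of Your App\n"
      server.close();
    });
  });
});
```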

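Now if you type node myproject.js in the command line you should notice that it keeps running without returning a cursor. Every thirty seconds you should get log messages saying ‘no new posts’, and some Twitter errors telling you that you’ve already posted that message. This is all good, because it shows that your bot is working! (Strictly speaking, the pending setInterval timer on its own is enough to stop Node from exiting; the server mainly matters if your host expects your app to listen on a port.)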

Deploying to a server and running forever

Unless you want to keep your laptop running continuously and have a great internet connection, you’ll need to install your code on a server if you want your bot to keep running without you. There are so many ways you could do this that I’m not going to take you through it, but I used Digital Ocean, which offers reasonably priced ‘droplet’ virtual private servers designed especially for running apps. If you use the link above you’ll get $10 in credit to start you off! Justin Ellingwood has written a really great tutorial on how to initially configure your Ubuntu Linux server, which should be helpful even if you’re not using Digital Ocean. To move your files onto the server, the Penn State Institute for Cyberscience has a helpful Command line ssh user guide.

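Once you’ve got your files in place, you’ll need to install all the dependencies again (see Part 1). You should then also install ‘forever’. Install it globally (that’s what the -g flag does) so that the forever command is available on the command line; depending on how Node is set up on your server, you may need to put sudo in front: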

npm install -g forever

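Forever is a great tool that keeps your script running in the background, even after you log out of the server, and automatically restarts it if it crashes. (It won’t restart your bot after the server itself reboots, though; for that you’d need a startup script.) Once it’s installed, you should be able to start your bot by typing: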

forever start myproject.js
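If you’re curious what ‘restarts it if it crashes’ means in practice, here’s a rough sketch of the idea behind forever using Node’s built-in child_process module. To be clear, this is not forever’s actual code or API, and supervise is a hypothetical name: it simply starts a script and starts it again whenever it exits with an error, up to a limit. The demo uses a one-line script that always crashes.

```javascript
const { spawn } = require('child_process');

// Hypothetical mini-supervisor illustrating the idea behind forever:
// start a Node script, and start it again whenever it exits with an error,
// up to `maxRestarts` times. This is NOT forever's actual code or API.
function supervise(scriptArgs, maxRestarts, onGiveUp) {
  let restarts = 0;

  function start() {
    const child = spawn(process.execPath, scriptArgs, { stdio: 'inherit' });
    child.on('exit', function (code) {
      if (code === 0) {
        console.log('script finished cleanly');
        return;
      }
      if (restarts >= maxRestarts) {
        console.log('giving up after ' + restarts + ' restarts');
        onGiveUp(restarts);
        return;
      }
      restarts += 1;
      console.log('script exited with code ' + code + ', restarting (' + restarts + ')');
      start();
    });
  }

  start();
}

// Demo: a one-line script that always crashes, so the supervisor restarts it
// twice and then stops. For the real bot you would pass ['myproject.js'].
supervise(['-e', 'process.exit(1)'], 2, function (restarts) {
  console.log('demo done, restarts = ' + restarts);
});
```

For the real thing, stick with forever, which also handles logging and lets you stop your bot again with forever stop myproject.js.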

Get the code

Hopefully this has been a useful two-part guide. As I said in Part 1, I’m very much a novice, so there are probably a lot of things in this guide that will make professional programmers either laugh or cry. Nevertheless, the code runs, and it’s not as if we’re using it to fly an Airbus, run a bank, or store other people’s private information. Lose the fear, and have a go - it’s a lot easier than you think, and when it inevitably breaks or doesn’t run, you’ll learn lots of things whilst you try to fix it. You can find all the code on GitLab.


[1]

To be honest this is the part I’m least confident I got right. It works, but I’m sure it’s not the most elegant way to make the bot run autonomously. I may ‘refactor’ this bit of the bot code in future.

[2]

In theory the Twitter API should return an error if you try to post the same thing twice, but occasionally it lets it through for some reason. See previous note.