12 November 2017

A couple of weeks ago I made a script to translate a JSON export file from Ghost into an XML import ('WXR') file for WordPress. If you read my post from 2014 about why I moved from WordPress to Ghost, this might strike you as an odd thing for me to do, and indeed I wasn't that happy about it. The reason I needed to migrate a Ghost blog to WordPress is that we're integrating the newCardigan website with CiviCRM soon, and Civi only runs on WordPress, Joomla! and Drupal. Joomla! is awful, none of the Cardi Core have any experience with Drupal, and all have experience with WordPress, so it was pretty easy to decide which to use. CiviCRM will allow us to move everything 'in house': event bookings, website posts, cardicast episodes, memberships, and fundraising. Having one platform, and one that we completely control, will make things easier for us and means we're not forcing our community to give their details to yet another third party.

The first thing I did was, of course, search the web for someone else's solution. The official WordPress documentation is weirdly silent about the whole matter, although it may simply be so poorly organised that the answer is in there somewhere and I just couldn't find it. Tim Stephenson provides one possible solution, but it seemed very convoluted to me, involving re-arranging things in OpenRefine. It then looked like Ahmed Amayem had built the perfect tool to convert the Ghost JSON export into a usable XML import file for WordPress, but I couldn't get it to work. I'm not sure exactly what the problem was, but I ended up making my own tool to do the job.

Mostly the script simply moves data from a JSON field into a matching XML tag, and adds some information in the header so that WordPress recognises the file as WXR. The most complicated part was translating the logical but somewhat eccentric way Ghost stores tag information into information about which tags apply to which posts in WordPress. My initial version more or less worked, but with one important flaw: authors were not imported. It took me another week to realise why: every field in the WordPress XML file is wrapped in CDATA except for the title, and I had neglected to escape the <, > and & characters. As soon as you use one of these characters in a title (I'd used '&' in a couple of post titles), the XML breaks. This didn't seem to stop the posts being imported, but it did stop WordPress recognising the authors. Once I added a section to escape these characters, the script seemed to work pretty well.
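For anyone attempting something similar, here is a minimal sketch of the two fiddly parts described above, written in JavaScript for Node.js. It is not the actual ghost-to-wp code. The escaping function handles the three characters that broke my titles. The tag function assumes a Ghost 1.x export shaped like `{ db: [{ data: { tags, posts_tags } }] }`, where `posts_tags` is a join table of `post_id`/`tag_id` pairs; both function names are hypothetical.

```javascript
// Hypothetical helper: escape text written outside CDATA (e.g. a post <title>).
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or the entities below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: turn Ghost's tags + posts_tags join table into
// a Map of post id -> [{ name, slug }, ...].
// Assumes the Ghost 1.x export shape { db: [{ data: { tags, posts_tags } }] }.
function tagsByPost(ghostExport) {
  const { tags, posts_tags } = ghostExport.db[0].data;
  const tagById = new Map(tags.map((tag) => [tag.id, tag]));
  const result = new Map();
  for (const { post_id, tag_id } of posts_tags) {
    const tag = tagById.get(tag_id);
    if (!tag) continue; // skip join rows that point at a tag that no longer exists
    if (!result.has(post_id)) result.set(post_id, []);
    result.get(post_id).push({ name: tag.name, slug: tag.slug });
  }
  return result;
}
```

For example, `escapeXml("Cats & dogs <3")` gives `Cats &amp; dogs &lt;3`, which WordPress can parse safely as a title.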

The final thing I learned making this tool is how the package.json file for npm packages is supposed to work. I usually don't bother with one of these, but by filling in a bit of JSON in this file I made it possible to simply download ghost-to-wp and then (assuming you have a reasonably recent version of Node.js installed) type two commands:

npm install
npm start yourghostexportfile.json

A file called WP_import.xml will be created, and you can simply use the WordPress import tool to import all your posts.
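The 'header' that makes WordPress treat a file as WXR is essentially an RSS 2.0 envelope with a few extra namespaces and a `wp:wxr_version` element. The sketch below is modelled on the files produced by WordPress's own exporter (WXR 1.2) rather than on ghost-to-wp itself; `wrapWxr`, `authorXml`, `cdata` and `escapeXml` are hypothetical helper names, and a real export usually carries more channel fields than this.

```javascript
// Minimal sketch of a WXR 1.2 envelope, modelled on WordPress's own exports.
// Helper names are hypothetical; this is not the ghost-to-wp source.

// Escape text written outside CDATA (e.g. <title>).
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so the entities below aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap text in CDATA, splitting any "]]>" so it can't end the section early.
function cdata(text) {
  return "<![CDATA[" + String(text).replace(/\]\]>/g, "]]]]><![CDATA[>") + "]]>";
}

// One <wp:author> entry; posts refer to authors by this login.
function authorXml({ id, login, email, displayName }) {
  return [
    "<wp:author>",
    `  <wp:author_id>${id}</wp:author_id>`,
    `  <wp:author_login>${cdata(login)}</wp:author_login>`,
    `  <wp:author_email>${cdata(email)}</wp:author_email>`,
    `  <wp:author_display_name>${cdata(displayName)}</wp:author_display_name>`,
    "</wp:author>",
  ].join("\n");
}

// The RSS/WXR envelope around already-built author and item XML strings.
function wrapWxr({ title, link, authorsXml = "", itemsXml = "" }) {
  return [
    '<?xml version="1.0" encoding="UTF-8" ?>',
    '<rss version="2.0"',
    '  xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/"',
    '  xmlns:content="http://purl.org/rss/1.0/modules/content/"',
    '  xmlns:wfw="http://wellformedweb.org/CommentAPI/"',
    '  xmlns:dc="http://purl.org/dc/elements/1.1/"',
    '  xmlns:wp="http://wordpress.org/export/1.2/">',
    "<channel>",
    `  <title>${escapeXml(title)}</title>`,
    `  <link>${escapeXml(link)}</link>`,
    "  <wp:wxr_version>1.2</wp:wxr_version>",
    authorsXml,
    itemsXml,
    "</channel>",
    "</rss>",
  ].join("\n");
}
```

In Node.js the finished string can then be saved with `fs.writeFileSync("WP_import.xml", xml)` and handed to the WordPress importer.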

If you are in the unfortunate situation of needing to migrate from Ghost to WordPress, ghost-to-wp should make it pretty easy for you. You should be able to migrate authors, posts (including published status and stickiness), and tags. The main thing the script won't bring across is images, because Ghost doesn't have a nice way to export them, and WordPress doesn't have a usable way to import them.
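To show how those pieces fit together, here is a sketch of how a single Ghost post might become a WXR `<item>` carrying its author, published status, stickiness and tags. It assumes Ghost 1.x post fields (`title`, `slug`, `html`, `status` of `"published"` or `"draft"`, and a boolean `featured` flag standing in for stickiness) plus a list of `{ name, slug }` tag objects for the post. `itemXml` and its parameters are hypothetical, and a real item usually also needs dates and a post ID.

```javascript
// Hypothetical sketch: one Ghost post -> one WXR <item>. Not the ghost-to-wp source.
// Assumes Ghost 1.x fields: title, slug, html, status ("published"/"draft"), featured.

// Escape text outside CDATA; quotes too, because slugs go into an attribute.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap text in CDATA, splitting any "]]>" so it can't end the section early.
function cdata(text) {
  return "<![CDATA[" + String(text).replace(/\]\]>/g, "]]]]><![CDATA[>") + "]]>";
}

function itemXml(post, authorLogin, tags) {
  const published = post.status === "published";
  return [
    "<item>",
    `  <title>${escapeXml(post.title)}</title>`, // the one field not in CDATA
    `  <dc:creator>${cdata(authorLogin)}</dc:creator>`, // should match a wp:author_login
    `  <content:encoded>${cdata(post.html || "")}</content:encoded>`,
    `  <wp:post_name>${cdata(post.slug)}</wp:post_name>`,
    `  <wp:status>${cdata(published ? "publish" : "draft")}</wp:status>`,
    "  <wp:post_type><![CDATA[post]]></wp:post_type>",
    `  <wp:is_sticky>${post.featured ? 1 : 0}</wp:is_sticky>`,
    // WordPress tags are <category> elements with domain="post_tag".
    ...tags.map(
      (t) => `  <category domain="post_tag" nicename="${escapeXml(t.slug)}">${cdata(t.name)}</category>`
    ),
    "</item>",
  ].join("\n");
}
```

In WordPress's own exports, each item's `dc:creator` holds the login of one of the `wp:author` entries declared in the channel, which is how posts and authors are tied together.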