The machine in Ghost

30 September 2018

This will probably be my last post published with Ghost. I was a backer of Ghost's original Kickstarter campaign, and I believed the hype. John O'Nolan's promise was that Ghost would be "Just a Blogging Platform", but since then it has changed focus to be "The professional publishing platform". Ghost 2.0 brought big changes, including switching the default editing screen from Markdown to WYSIWYG. Far from being 'just a blogging platform', Ghost clearly now wants to be 'WordPress but written in nodejs'. That's fine, but it's not what I signed up for, and I've been eyeing off static site generators for a little while. I'm now about to take the plunge and move from Ghost to... well, to a static site. But I'll be generating it with Eleventy. More on that in my next post, but this is my September post for GLAM Blog Club and I'm going to share a few of the, well, strange things I've discovered about Ghost while doing some processing on my export file.

Where's the Markdown?

Ghost has a feature - tellingly still in "Labs" in my version - where you can essentially export your database in one big JSON file. If you want your images as well, too bad, but that's not what I want to talk about here. Back in November last year I wrote about a nodejs script I'd put together to translate the Ghost JSON data into a WordPress XML import file. I've taken the guts of that script and repurposed it to export all the posts as Markdown files. I could have used ghost-export but I wanted to keep all the metadata with the files - title, tags, and importantly the permalink. More on permalinks later. I may contribute some code to ghost-export at some point to add this as a feature, but in the short term I just rewrote my existing script for my particular needs.

Because Ghost posts are originally written in Markdown, it should theoretically be easy to get the original Markdown back out again. It is relatively straightforward once you know where to look, but the structure of the export file is surprising to say the least. The file comes out of Ghost like this (simplified for clarity):

db: [
  {
    "data":{
      "settings": {...},
      "posts": [
        {
          "id": "123456",
          "title": "My Awesome Post",
          "published_at": "2014-02-10T11:48:33.000Z",
          "mobiledoc": "{... \"cards\":[[\"card-markdown\",{\"cardName\":\"card-markdown\",\"markdown\":\"This is where the Markdown is.\"}]] ...}"
        }
      ]
    }
  }
]

Haha 🙃 what the hell? For reasons I can't quite fathom, not only is the markdown hidden inside a key called "mobiledoc", but the value of "mobiledoc" is a string - not an object. I'm sure there's some logical reason for this, though I can't think what it is. The consequences are that in order to get at the data, you have to de-stringify mobiledoc first:


for (const post of backup.db[0].data.posts) {
  // this doesn't work: post.mobiledoc is a string, so it has no "cards"
  // property and the lookup throws a TypeError
  // console.log(post.mobiledoc.cards[0][1].markdown)

  // this works by de-stringifying mobiledoc so it can be used as an object
  const mdJSON = JSON.parse(post.mobiledoc)
  console.log(mdJSON.cards[0][1].markdown)
}

But really, even if it were stored as an object and not a string, what kind of person thinks post.mobiledoc.cards[0][1].markdown is a reasonable place to put the original input text?
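
For what it's worth, here is a slightly more defensive way to dig the Markdown out, as a minimal sketch rather than the script I actually ran. The extractMarkdown helper and the sample post are made up for illustration, and it assumes the layout shown above: mobiledoc is a JSON string whose cards array holds pairs of a card name and a payload, with the Markdown sitting in a card named "card-markdown". Other Ghost versions may name the card differently.

```javascript
// Hypothetical helper: pull the Markdown out of one exported Ghost post.
// Assumes mobiledoc is a JSON string containing a "cards" array of
// [cardName, payload] pairs, as in the simplified export shown above.
function extractMarkdown(post) {
  if (typeof post.mobiledoc !== 'string') return null

  let doc
  try {
    doc = JSON.parse(post.mobiledoc)
  } catch (err) {
    return null // not valid JSON, so there is nothing to recover
  }

  const cards = Array.isArray(doc.cards) ? doc.cards : []
  // Find the Markdown card by name instead of trusting that it is cards[0]
  const card = cards.find((c) => Array.isArray(c) && c[0] === 'card-markdown')
  if (!card || !card[1] || typeof card[1].markdown !== 'string') return null
  return card[1].markdown
}

// Made-up sample post in the same shape as the export
const samplePost = {
  title: 'My Awesome Post',
  mobiledoc: JSON.stringify({
    cards: [
      ['card-markdown', { cardName: 'card-markdown', markdown: 'This is where the Markdown is.' }]
    ]
  })
}

console.log(extractMarkdown(samplePost))
```

Returning null instead of throwing means one odd post (a draft with no cards, say) won't stop the whole export loop.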

Tags are linked data

The second thing that confused me for a little while when I originally wrote ghost-to-wp was that tags aren't directly stored with posts. Each tag has its own entry in tags, and every instance of a tag is stored in posts_tags. That is, every time a tag is applied to a post, a new entry is created in posts_tags, with a pointer to each of the post and the tag:

"posts_tags": [
  {
    "id": "5986cf4d302e6e25d7a931a7",
    "post_id": "5986cf49302e6e25d7a93152",
    "tag_id": "5986b1ea7b1d0e0b4084799d",
    "sort_order": 0
  } ...
]

...points to

"tags": [
  {
    "id": "5986b1ea7b1d0e0b4084799d",
    "name": "Getting Started",
    "slug": "getting-started",
    "description": null,
    "feature_image": null,
    "parent_id": null,
    "visibility": "public",
    "meta_title": null,
    "meta_description": null,
    "created_at": "2017-08-06T06:06:34.000Z",
    "created_by": "1",
    "updated_at": "2017-08-06T06:06:34.000Z",
    "updated_by": "1"
  } ...
]

...as well as the post with "id": "5986cf49302e6e25d7a93152"

It's linked data! This is actually a kind of nice demonstration of linked data in the wild: if, for example, the 'slug' or feature image changes for a tag, or even the name of the tag, it will be updated wherever that information is linked from. It requires a bit of scripting gymnastics to export it back out with the rest of the information about the post, but it's a smart way to store it.
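
The "scripting gymnastics" amount to a join across the two arrays. Here is a minimal sketch of one way to do it, assuming only the tags and posts_tags structures shown above. The tagsByPost function name and the short ids in the sample data are invented for the example.

```javascript
// Build a lookup from post id to an ordered list of tag names.
// Assumes `data` has the `tags` and `posts_tags` arrays shown above.
function tagsByPost(data) {
  // tag id -> tag name
  const tagNames = new Map(data.tags.map((tag) => [tag.id, tag.name]))
  const byPost = new Map()

  // Copy before sorting so the original export is left untouched
  const links = [...data.posts_tags].sort((a, b) => a.sort_order - b.sort_order)

  for (const link of links) {
    const name = tagNames.get(link.tag_id)
    if (name === undefined) continue // link points at a tag that isn't in the export
    if (!byPost.has(link.post_id)) byPost.set(link.post_id, [])
    byPost.get(link.post_id).push(name)
  }
  return byPost
}

// Made-up sample data with shortened ids
const sampleData = {
  tags: [
    { id: 't1', name: 'Getting Started' },
    { id: 't2', name: 'GLAM Blog Club' }
  ],
  posts_tags: [
    { id: 'l1', post_id: 'p1', tag_id: 't2', sort_order: 1 },
    { id: 'l2', post_id: 'p1', tag_id: 't1', sort_order: 0 }
  ]
}

console.log(tagsByPost(sampleData).get('p1'))
// prints [ 'Getting Started', 'GLAM Blog Club' ]
```

Building the map once and then calling .get(post.id) inside the export loop avoids re-scanning posts_tags for every post.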

Think locally, store as UTC

If you know your temporal data standards, you will have recognised that all the timestamps in the examples above are ISO 8601 dates in UTC (Greenwich Mean) time. This is convenient, because UTC time can easily be translated into local time in the client using a standard offset. There's a gotcha here though: if you've selected "use dated permalinks" then the date that is used is local time - not UTC. As we saw above, the Ghost team decided not to store tags with posts; they also decided not to store permalinks with posts - a far more dubious decision. The way a dated permalink is constructed in Ghost is to take the publication date, adjust it to 'local' time as per the settings in the Ghost console, and then create a link in the form /YYYY/MM/DD/slug. What is stored in the post itself is the UTC date and the slug, so to reconstruct the permalink when exporting you have to adjust the publication date according to the local timezone (in the general blog settings) and then extract YYYY, MM and DD. Depending on what your time zone is and what time of day you usually publish, this may affect many, a few or none of your posts.
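
To make the date adjustment concrete, here is a minimal sketch using the built-in Intl.DateTimeFormat. It is not Ghost's own code. The datedPermalink function, the sample timestamp and the choice of Australia/Melbourne are assumptions for illustration; you would pass whatever IANA time zone name matches your blog settings.

```javascript
// Rebuild a dated permalink from the stored UTC timestamp and the slug.
// `timeZone` is an IANA zone name, assumed to match the blog's setting.
function datedPermalink(publishedAt, slug, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit'
  }).formatToParts(new Date(publishedAt))

  // formatToParts returns typed pieces, so we don't depend on locale ordering
  const get = (type) => parts.find((p) => p.type === type).value
  return `/${get('year')}/${get('month')}/${get('day')}/${slug}`
}

// Made-up timestamp: late evening UTC is already the next morning in Melbourne
const publishedAt = '2018-09-29T22:30:00.000Z'

console.log(datedPermalink(publishedAt, 'the-machine-in-ghost', 'Australia/Melbourne'))
// prints /2018/09/30/the-machine-in-ghost

console.log(datedPermalink(publishedAt, 'the-machine-in-ghost', 'UTC'))
// prints /2018/09/29/the-machine-in-ghost
```

Using a named zone rather than a fixed offset means daylight saving changes are handled per post, which matters for an archive that spans several years.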

If you're thinking that all sounds a bit complicated, prepare to have your brain explode: the actual dates in dated Ghost permalinks don't matter anyway! I discovered this accidentally when I thought I'd messed up the translation to local time - one of the posts I linked to from another post seemed to have an 'off by one' dating error. When I looked at it more closely, however, I discovered that I'd simply used the wrong date in the link. The reason it was confusing is that the link still works. It appears that Ghost redirects everything that has the correct date pattern before the slug. Take, for example, this post:

This is the permalink:

https://www.hughrundle.net/2018/09/30/the-machine-in-ghost

But with Ghost, this works too:

https://www.hughrundle.net/18/9/30/the-machine-in-ghost

You don't even have to use numbers:

https://www.hughrundle.net/this/works/also/the-machine-in-ghost

Now that's strange.