Going Static Part 1

Messing with your <head>

Sun Oct 21 2018 14:40:10 GMT+1100 (AEDT)

This is the first of a three-part series delving deeper into some of the things I learned when I migrated my blog from Ghost to the static site generator Eleventy. I won't be giving a blow-by-blow description, and actually won't write a great deal about Eleventy specifically. This is really a reflection on the broader technical things the migration process helped me to understand better. This post is essentially about <meta> and <link> tags: what goes in the <head> of a website, and most importantly, why. None of this is particularly original thinking on my part - I'm really just recording what I learned in case it's helpful or interesting for others. Part two will be all about constructing an RSS feed from scratch, and how RSS actually works. The last in the series will be about how I automated images using the Unsplash API, and a little tool I created as a consequence.

An HTML file has two basic components: the head and the body. Generally speaking, the head contains the metadata and the body contains the content. In my last post I mentioned that one of the things I liked about Eleventy is that I can completely control what goes in the head, so this post is a bit of an explanation of what I put there and how I set it up. On most websites there are only four types of elements you're likely to see in the head: <title>, <script>, <link>, and <meta>. The <title> element is pretty straightforward: it's where your page title goes. This determines what appears in your browser tab, so you probably want it to be shorter rather than longer. The <script> element is also fairly straightforward: this is where you link to JavaScript files or, in some cases, write JavaScript directly. On my blog it looks like this:

<script type="text/javascript" src="/assets/scripts/jquery-2.2.4.min.js"></script>
<script type="text/javascript" src="/assets/scripts/bigfoot.min.js"></script>
<script type="text/javascript">$.bigfoot();</script>
<script type="text/javascript" src="/assets/scripts/moment.min.js"></script>

The first two lines load jQuery and bigfoot from my server. The third line uses jQuery's $ object (from the first line) to call the bigfoot plugin function (from the second line). The last line loads the Moment.js library, also from my server.
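If you want to check what's actually in a head, a small script can list the elements for you. This is a minimal sketch in Python using the standard library's html.parser module; the sample markup is a hypothetical, trimmed-down head rather than my real template.

```python
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Records the name of every start tag that appears inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            self.tags.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


# Hypothetical sample document, for illustration only.
SAMPLE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Going Static Part 1</title>
  <link rel="stylesheet" href="/assets/css/style.css">
  <script src="/assets/scripts/moment.min.js"></script>
</head>
<body><p>Hello</p></body>
</html>"""

collector = HeadTagCollector()
collector.feed(SAMPLE)
print(collector.tags)  # ['meta', 'title', 'link', 'script']
```

Tags inside <body> are ignored, so anything other than the four usual suspects showing up in that list would be worth a second look.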

The <link> and <meta> elements are a bit more interesting. The typical uses for a link element are simply to link to a CSS or favicon file:

<link rel="stylesheet" type="text/css" href="/assets/css/style.css">
<link rel="icon" href="/images/favicon.png">

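Here we see the three important parts of the link element:

  - rel: the relationship between the current document and the linked resource (here 'stylesheet' and 'icon')
  - type: the MIME type of the linked resource (text/css for the stylesheet); it's optional, and the icon link omits it
  - href: the location of the linked resource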

Link elements can also be used in more interesting ways. My go-to source for HTML elements and everything JavaScript is the Mozilla Developer Network (MDN), and in its page on link elements MDN says that <link> "specifies relationships between the current document and an external resource." So the HTML link element is in many ways the canonical example of the much-talked-about linked data. If you've ever said "yes, but how can I use linked data?", one answer is that you're probably already using it if you have a website. I'd never really thought too much about the link element before having to construct the head of my blog from scratch, but it turns out there are some pretty useful things you can do with it.
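Taking MDN's definition literally, each link element is a statement of the form "this page, relationship, that resource". A minimal Python sketch, using the standard html.parser module and a hypothetical post URL, turns the two link elements above into exactly those triples:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelationships(HTMLParser):
    """Reads each <link> as a (this page, relationship, resource) statement."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel") or ""
        href = attributes.get("href")
        if not href:
            return
        # A rel attribute can hold several space-separated relationship types.
        for relation in rel.lower().split():
            # Resolve relative hrefs against the page's own URL.
            self.triples.append((self.page_url, relation, urljoin(self.page_url, href)))


# The page URL is a hypothetical post address used for illustration.
parser = LinkRelationships("https://www.hughrundle.net/example-post/")
parser.feed('<link rel="stylesheet" type="text/css" href="/assets/css/style.css">'
            '<link rel="icon" href="/images/favicon.png">')
for triple in parser.triples:
    print(triple)
```

Nothing about this is specific to stylesheets and icons: any rel value becomes the middle of a triple, which is why the same element can carry much more interesting relationships.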

Linking to a style sheet or icon is nice, but can we link to something more interesting? Well yes - you can link to pretty much anything for any reason. Something you may have seen if you've ever looked at the head of a web document is a thing called 'canonical':

<link rel="canonical" href="https://www.hughrundle.net/">

'Canonical' is not really relevant to most people who are just publishing a blog or information site, but because so many retailers and others publish the same content at several different URLs, search engines now expect a 'canonical' link and may punish sites that don't have one. 'Canonical' basically means "the official version of this page". If a site is doing A/B testing of something, they might have website.com/page/version-a and website.com/page/version-b, but both would have a link element pointing to the 'canonical' version (i.e. the one that should be indexed by search engines) at website.com/page. That, however, is pretty irrelevant to me - every page I publish on my blog is the canonical version of that page. So what I (and most people) want is to put the current URL in the href attribute of the rel="canonical" link.
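Here's a simplified model of why that matters, written as a Python sketch: a hypothetical indexer files each page under its canonical URL if it declares one, so the two A/B variants collapse into a single entry. Real search engines are far more involved than this; the URLs are the placeholder ones from the paragraph above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Finds the href of <link rel="canonical">, if the page has one."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels:
            self.canonical = attributes.get("href")


def index_key(url, html):
    """Return the URL a (hypothetical) indexer would file this page under."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or url  # no canonical link: fall back to the page's own URL


# Two hypothetical A/B test variants that both declare the same canonical page.
variant_html = '<head><link rel="canonical" href="https://website.com/page"></head>'
pages = {
    "https://website.com/page/version-a": variant_html,
    "https://website.com/page/version-b": variant_html,
}

indexed = {index_key(url, html) for url, html in pages.items()}
print(indexed)  # {'https://website.com/page'}
```

For a blog where every page is its own canonical version, the canonical link and the page URL are simply the same thing.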

Using data programmatically

Obviously editing the head by hand for every page would be tedious, which is where static site generators show their value. I'm using the Handlebars templating language with Eleventy, so I can just put this in my head:

<link rel="canonical" href="{{site.root}}{{page.url}}">

Then when the page is processed by Eleventy, {{page.url}} picks up the URL of the current page, but only as a path relative to the site root. To get a full URL I need the root - in my case https://www.hughrundle.net. Eleventy lets you create JSON files in a directory called '_data', and then use that data in any template. My data file is called 'site.json', so {{site.root}} is the value of 'root' in that file. We'll come back to this again, so let's have a look at what's in site.json:

{
  "title": "Information Flaneur",
  "author": "Hugh Rundle",
  "description": "A blog about libraries, computer programming, and the impending end of humanity.",
  "root": "https://www.hughrundle.net",
  "generator": "Eleventy",
  "language": "en-AU",
  "license_type": "CC-BY 4.0",
  "license_link": "https://creativecommons.org/licenses/by/4.0/",
  "twitter": "@hughrundle"
}

Some of this is used in the head, and some of it we won't look at until my next post on RSS. As I stated above, in theory you can link to any external resource and describe a relationship with it. I initially used this to link to the full CC-BY 4.0 license that applies to all my posts, using the rel="license" microformat:

<link rel="license" href="{{site.license_link}}">
<!-- after processing becomes... -->
<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">
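The "after processing" step is essentially a lookup into the data files. Here is a minimal Python sketch of that idea, assuming a cut-down copy of site.json and a hypothetical page URL. It is not Handlebars (which also has helpers like {{#if}} and escapes HTML), just a plain dotted-name substitution.

```python
import json
import re

# A cut-down copy of site.json with just the keys this sketch uses.
SITE_JSON = """{
  "root": "https://www.hughrundle.net",
  "license_link": "https://creativecommons.org/licenses/by/4.0/"
}"""

# Hypothetical page data: page.url is a root-relative path, not a full URL.
PAGE = {"url": "/going-static-part-1/"}


def render(template, data):
    """Replace {{dotted.name}} placeholders with values from nested dicts.

    Only plain lookups are supported; unlike Handlebars there are no
    helpers such as {{#if}} and no HTML escaping.
    """

    def lookup(match):
        value = data
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


data = {"site": json.loads(SITE_JSON), "page": PAGE}
print(render('<link rel="canonical" href="{{site.root}}{{page.url}}">', data))
print(render('<link rel="license" href="{{site.license_link}}">', data))
```

Joining {{site.root}} and {{page.url}} is what turns the relative path into the full URL that a canonical link needs.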

However, microformats are not widely used or understood, so I later decided to make everything Dublin Core compatible. We'll look at that in a moment.

Everything else of interest in the head goes inside <meta> elements. The rules for meta elements are, in some ways, a bit stricter than for link elements, but we'll see shortly that in practice they're actually very flexible. First up, a reminder from my last post: you can control the Referer HTTP header sent with all or some of the links on your website in a couple of different ways. For an individual <link> or <a> element you can set a referrerpolicy attribute. But if you want the same referrer policy for all your outbound links (and you probably do), you can set it once with a meta element:

<meta name="referrer" content="no-referrer">
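To make the effect concrete, here's a simplified Python model of what a browser puts in the Referer header when a reader follows a link from one of my pages, under three of the values defined in the W3C Referrer Policy specification. With no-referrer, which is what the meta element above sets, the header is simply left out. This is a sketch for illustration, not how any browser actually implements it, and the page URL is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def referer_header(page_url, policy):
    """Return the Referer value sent when leaving page_url, or None if none is sent.

    Simplified model of three Referrer Policy values only; it ignores the
    cross-origin and HTTPS-to-HTTP rules that the other policies depend on.
    """
    parts = urlsplit(page_url)
    host = parts.netloc.rpartition("@")[2]  # drop any user:password@ credentials
    if policy == "no-referrer":
        return None  # the header is omitted entirely
    if policy == "origin":
        return urlunsplit((parts.scheme, host, "/", "", ""))  # scheme and host only
    if policy == "unsafe-url":
        return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))  # full URL minus #fragment
    raise ValueError(f"policy not covered by this sketch: {policy}")


# Hypothetical blog post URL with a footnote fragment.
page = "https://www.hughrundle.net/example-post/#fn1"
for policy in ("no-referrer", "origin", "unsafe-url"):
    print(policy, "->", referer_header(page, policy))
```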

We can also use a few of the other standard attribute values to put in some basic metadata:

<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="{{site.generator}}">
<meta name="description" content="{{#if summary}}{{summary}}{{else}}{{site.description}}{{/if}}">
<meta name="keywords" content="{{tags}}">

Here we set the character set (utf-8), note the 'generator' (coming from site.json, and in my case Eleventy), give a description (from the post summary or, if there isn't one, the description in site.json), and add some keywords (the comma-separated list of tags from the post). The meta element named 'viewport' is used by mobile browsers and helps make your page a sensible size on smaller screens.
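The fallback logic in the description template, and the way the tags become keywords, can be sketched in Python like this. It assumes the post's tags are a list (Handlebars prints an array as comma-separated values) and uses made-up tags; the escaping step stands in for the HTML escaping Handlebars applies to double-brace expressions.

```python
from html import escape


def basic_meta(post, site):
    """Build description and keywords <meta> elements for one post.

    Mirrors the template logic: use the post's summary if it has one,
    otherwise fall back to the site-wide description from site.json.
    """
    description = post.get("summary") or site["description"]
    # Assumes tags is a list; Handlebars prints an array joined by commas.
    keywords = ",".join(post.get("tags", []))
    return [
        f'<meta name="description" content="{escape(description)}">',
        f'<meta name="keywords" content="{escape(keywords)}">',
    ]


site = {"description": "A blog about libraries, computer programming, "
                       "and the impending end of humanity."}
# Hypothetical post with no summary and two made-up tags.
post = {"tags": ["eleventy", "html"]}
for line in basic_meta(post, site):
    print(line)
```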

The rest of our metadata starts to get a bit confusing. You can really put whatever you like in the 'rel' part of a link element, or the 'name' part of a meta element, as long as it's useful to and understood correctly by whatever it is you're hoping will parse it. This is an example of what Jeremy Keith means when he says that The Web is Agreement. But if you want to be extremely clear about what your terms mean, you can link to your schema before you use it. This is how Dublin Core works:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="{{title}}{{#if subtitle}} - {{subtitle}}{{/if}}">
<meta name="DC.creator" content="{{#if author}}{{author}}{{else}}{{site.author}}{{/if}}">
<meta name="DC.identifier" content="{{site.root}}{{page.url}}">
<meta name="DC.language" content="{{site.language}}">
<meta name="DC.rights" content="Copyright {{#if author}}{{author}}{{else}}{{site.author}}{{/if}}">
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">
<link rel="DCTERMS.license" href="{{site.license_link}}">

Here we see how you can use Dublin Core in both link elements and meta elements. The first link element points to the Dublin Core elements schema, and following that I can now unambiguously use Dublin Core elements in my meta name attributes. The second link element points to the Dublin Core terms schema. The third link element uses that schema to point to the license for the page (this is the Dublin Core equivalent of rel="license" above). By placing a link to the schema before using it, we can be clear about what the 'name' or the 'relationship' actually means.[1]
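The "declare the schema, then use the prefix" pattern can be modelled in a few lines. This Python sketch is a simplified take on the Dublin Core convention, not a complete implementation: it remembers each rel="schema.PREFIX" link, then expands names like DC.title into full property URIs. The sample values are hypothetical rendered output.

```python
from html.parser import HTMLParser


class SchemaResolver(HTMLParser):
    """Expands names like DC.title using earlier <link rel="schema.PREFIX"> elements."""

    def __init__(self):
        super().__init__()
        self.schemas = {}     # prefix -> namespace URI, e.g. "DC" -> ".../elements/1.1/"
        self.statements = []  # (full property URI, value) pairs

    def expand(self, name):
        prefix, dot, term = name.partition(".")
        if dot and prefix in self.schemas:
            return self.schemas[prefix] + term
        return None  # no declared schema: meaning rests on agreement alone

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = a.get("rel") or ""
        if tag == "link" and rel.startswith("schema."):
            # Declaration: remember which URI this prefix stands for.
            self.schemas[rel[len("schema."):]] = a.get("href") or ""
        elif tag == "link" and rel:
            uri = self.expand(rel)
            if uri:
                self.statements.append((uri, a.get("href")))
        elif tag == "meta" and a.get("name"):
            uri = self.expand(a["name"])
            if uri:
                self.statements.append((uri, a.get("content")))


SAMPLE = """
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="Going Static Part 1">
<meta name="DC.language" content="en-AU">
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">
<link rel="DCTERMS.license" href="https://creativecommons.org/licenses/by/4.0/">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
"""

resolver = SchemaResolver()
resolver.feed(SAMPLE)
for statement in resolver.statements:
    print(statement)
```

Because this sketch reads the head top to bottom, a prefixed name that appears before its schema link is simply not expanded, which mirrors the advice to put the schema link first. The viewport element has no prefix, so it's skipped.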

Not everyone is as thorough as the Dublin Core Metadata Initiative. Instead of linking to a schema, you can rely on the MetaExtensions list maintained by the Web Hypertext Application Technology Working Group (WHATWG) and know that at least somebody will be able to parse your meta tag meaningfully (the Web is agreement, remember). This explains why Twitter Cards markup works:

<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="{{site.twitter}}">
<meta name="twitter:description" content="{{#if summary}}{{summary}}{{else}}{{site.description}}{{/if}}">
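Because the prefix does the namespacing, a consumer doesn't need a schema link at all: it just picks out the names it recognises. Here's a rough Python sketch of that idea (not Twitter's actual code), which keeps the twitter: values and ignores everything else in the head:

```python
from html.parser import HTMLParser


class TwitterCardReader(HTMLParser):
    """Collects <meta name="twitter:..."> values and ignores everything else."""

    PREFIX = "twitter:"

    def __init__(self):
        super().__init__()
        self.card = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("name") or ""
        if tag == "meta" and name.startswith(self.PREFIX):
            # Store under the part after the prefix, e.g. "card", "site".
            self.card[name[len(self.PREFIX):]] = a.get("content")


reader = TwitterCardReader()
reader.feed(
    '<meta name="twitter:card" content="summary">'
    '<meta name="twitter:site" content="@hughrundle">'
    '<meta name="DC.title" content="Going Static Part 1">'
)
print(reader.card)  # {'card': 'summary', 'site': '@hughrundle'}
```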

Twitter meta tags determine how Twitter displays links in tweets - they're what produce Twitter 'cards'. You can set a card to display as a 'summary' like I have, or with a large image, or one of a couple of other options, as well as linking to your own Twitter account and setting a description and an image. Twitter has effectively 'namespaced' names beginning with 'twitter:', and when you post a link to Twitter it knows to read the head of the linked page and look for any <meta name="twitter:SOMETHING"> elements. I was feeling very happy with myself for finally understanding all of this, when I checked the Open Graph schema and discovered this:

"The four required properties for every page are..."

The Open Graph protocol

WTF is a property‽ There's no property attribute for meta elements in the HTML standard - what's Facebook playing at? It turns out that the Open Graph protocol[2], rather than simply extending the meta element's name values informally 'by agreement' like Twitter and Dublin Core, uses the more formal RDFa standard to extend HTML. So if we want to mark up some basic Open Graph metadata, we do it with RDFa attributes rather than plain HTML ones, like this:

<meta property="og:url" content="{{site.root}}{{page.url}}">
<meta property="og:type" content="article">
{{#each tags}}
<meta property="article:tag" content="{{this}}">
{{/each}}

But there's a convenient side-effect to this: you can put RDFa attributes on ordinary HTML elements, right alongside the standard HTML attributes. That means you can reduce the amount of markup you need by declaring two things in a single element:

<meta name="author" property="article:author" content="{{#if author}}{{author}}{{else}}{{site.author}}{{/if}}">
<meta name="twitter:title" property="og:title" content="{{title}}{{#if subtitle}} - {{subtitle}}{{/if}}">
<meta name="DCTERMS.created" property="article:published_time" content="{{page.date}}">
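To see why one element can serve both audiences, here's a small Python sketch of two hypothetical consumers reading the same <meta> element: one that only looks at the HTML name attribute (the kind Twitter's card markup uses) and one that only looks at the RDFa property attribute (the kind Open Graph uses). Both end up with the same content value.

```python
from html.parser import HTMLParser


class DualVocabularyReader(HTMLParser):
    """Reads each <meta> once by its HTML name and once by its RDFa property."""

    def __init__(self):
        super().__init__()
        self.by_name = {}      # what a name-based consumer sees
        self.by_property = {}  # what an RDFa property-based consumer sees

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        content = a.get("content")
        if a.get("name"):
            self.by_name[a["name"]] = content
        if a.get("property"):
            self.by_property[a["property"]] = content


reader = DualVocabularyReader()
reader.feed('<meta name="twitter:title" property="og:title" content="Going Static Part 1">')
print(reader.by_name)      # {'twitter:title': 'Going Static Part 1'}
print(reader.by_property)  # {'og:title': 'Going Static Part 1'}
```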

Nice! There are two other things I put in my <head>: a link to my RSS feed, and a couple of elements for social media images. I'll be talking about those in the next two posts in this series, so you'll just have to wait to read more about them.

  1. You might also notice that Dublin Core helpfully differentiates between 'rights' (I own the copyright to my blog posts) and 'license' (I license all my posts CC-BY 4.0). ↩︎

  2. Open Graph was created by Facebook for much the same reason Twitter created their Twitter Cards markup, but is also used by other services. ↩︎