Library Map Part 2

How

28 January 2021

This is the second in a series of posts about my new Library Map. You should probably read the first post if you're interested in why I made the map and why it maps the particular things that it does. I expected this to be a two-part series, but it looks like I might write a third post about automation. This post is about how I made the map.

The tech stack

The map is built with a stack of (roughly in order):

  • original Shape (SHP) and GeoJSON files
  • QGIS
  • GeoJSON
  • a bunch of CSV files
  • a tiny Python script
  • TopoJSON
  • some HTML, CSS and JavaScript
  • Leaflet and Leaflet plugins
  • Mapbox tile service

Boundary files

Since I primarily wanted to map things about library services rather than individual library buildings, the first thing I looked for was geodata boundary files. In Australia public libraries are usually run by local government, so the best place to start was with local government boundaries.

These are reasonably straightforward to get - either directly from data.gov.au or one of the state equivalents, or more typically by starting there and eventually getting to the website of the state department that deals with geodata. Usually the relevant file is provided as a Shapefile, which is not exactly what we need, but it is a vector format, which is a good start. I gradually added each state and data about it before moving on to the next one, but the process would basically have been the same even if I'd had all of the relevant files at the same time. There are two slight oddities at this point that may (or may not 😂) be of interest.

Australian geography interlude

The first is that more or less alone of all jurisdictions, Queensland provides local government (LGA) boundaries for coastal municipalities with large blocks covering the coastal waters and any islands. Other states draw boundaries around outlying islands and include the island — as an island — with the LGA that it is part of (if it's not "unincorporated", which is often the case in Victoria for example). As a result, the national map looks a bit odd when you get to Queensland, because the overlay bulges out slightly away from the coast. I'm not sure whether this is something to do with the LGA jurisdictions in Queensland, perhaps due to the Great Barrier Reef, or whether their cartography team just couldn't be bothered drawing lines around every little island.

Secondly, when I got to Western Australia I discovered two things:

  1. The Cocos (Keeling) Islands are an Overseas Territory of Australia; and
  2. Cocos and Christmas Islands have some kind of jurisdictional relationship with Western Australia, and are included in the Western Australia LGA files.

I hadn't really considered including overseas territories, but since they were right there in the file, I figured I may as well. Later this led to a question about why Norfolk Island was missing, so I hunted around and found a Shapefile for overseas territories, which also included Cocos and Christmas Islands.

Shapefiles are a pretty standard format, but I wanted to use leafletjs, and for that we need the data to be in JSON format. I also needed to both stitch together all the different state LGA files, and merge boundaries where local councils have formed regional library services. This seems to be more common in Victoria (which has Regional Library Corporations) than other states, but it was required in Victoria, New South Wales, and Western Australia. Lastly, it turns out there are significant parts of Australia that are not actually covered by any local government at all. Some of these areas are the confusingly named national parks that are actually governed directly by States. Others are simply 'unincorporated' — the two largest areas being the Unincorporated Far West Region of New South Wales (slightly larger than Hungary), and the Pastoral Unincorporated Area that consists of almost 60% of the landmass of South Australia (slightly smaller than France).

I had no idea these two enormous areas of Australia had this special status. There's also a pretty large section of the south of the Northern Territory that contains no libraries at all, and hence has no library service. If you're wondering why there is a large section of inland Australia with no overlays on the Library Map, now you know.

QGIS and GeoJSON

So, anyway, I had to munge all these files (mostly Shapefiles but also GeoJSON) and turn them into a single GeoJSON file. I've subsequently discovered mapshaper, which I might have used for this, but I didn't know about it at the time, so I used QGIS. I find the number of possibilities presented by QGIS quite overwhelming, but there's no doubt it's a powerful tool for manipulating GIS data. I added each Shapefile as a layer, merged local government areas that needed to be merged, either deleted or dissolved (into the surrounding area) the unincorporated areas, and then merged the layers. Finally, I exported the new merged layer as GeoJSON, which is exactly what it sounds like: ordinary JSON, for geodata.

CSV data

At this point I had boundaries, but not other data. I mean, this is not actually true, because I needed information about library services in order to know which LGAs collectively operate a single library service, but in terms of the files, all I had was a polygon and a name for each area. I also had a bunch of location data for the actual library branches, originally in a variety of formats but ultimately in comma-separated values (CSV) format. I also had a CSV file for information about each library service. The question at this point was how to associate the information I was mapping with each area. There was no way I was going to manually update 400+ rows in QGIS. Luckily, CSV and JSON are two of the most common open file formats, and they're basically just text.

Python script

I'd had a similar problem in a previous, abandoned mapping project, and had a pretty scrappy Python script lying around. With a bit more Python experience behind me, I was able to make it more flexible and simpler. If we match on the name of the library service, it's fairly straightforward to add properties to each GeoJSON feature (the features being each library service's boundary area, and the properties being metadata about that feature). This works because the value of properties within each feature is itself simply a JSON object:

{"type": "FeatureCollection",
"name": "library_services",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:EPSG::3857" } },
"features": 
[{ "type": "Feature", "properties" : {"name": "Bulloo Shire"},
 "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [143.78691062,-28.99912088],[143.78483624,-28.99912073] ... ] ] ] } } ... ]}

The Python script uses Python's inbuilt json and csv modules to read both the GeoJSON and the CSV file, then basically merges the data. I won't re-publish the whole thing, but the guts of it is:

# for each geojson feature, if a field in the json matches a field in the csv, add new properties to the json
for feature in json_data['features']:
  with open(csv_file, newline='') as f:
    # use DictReader so we can use the header names
    reader = csv.DictReader(f)
    for row in reader:
      # look for match
      if row[csv_match] == feature['properties'][geojson_match]:
        # create new properties in geojson
        for k in row:
          feature['properties'][k] = row[k]

The whole thing is fewer than 40 lines long. This saved me heaps of time, but as you'll discover in my future post on automation, I later worked out how to automate the whole process every time the CSV file is updated!
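
If you want to play with the idea without my data, here is a minimal, self-contained sketch of the same kind of merge. It is not my actual script: the function name merge_csv_into_geojson, the sample areas and the fines and loan_period columns are all made up for illustration, and it indexes the CSV rows by name once rather than re-reading the file for every feature.

```python
import csv
import io
import json


def merge_csv_into_geojson(geojson, csv_text, csv_match, geojson_match):
    """Copy every CSV column onto the feature whose property matches the row."""
    # Index the rows once by the matching column (hypothetical helper,
    # a variation on the nested loop shown earlier)
    rows = {row[csv_match]: row for row in csv.DictReader(io.StringIO(csv_text))}
    for feature in geojson["features"]:
        row = rows.get(feature["properties"].get(geojson_match))
        if row is not None:
            # Add each CSV column as a new property on the feature
            feature["properties"].update(row)
    return geojson


# Made-up sample data, for illustration only
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Example Shire"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Somewhere Else"}, "geometry": None},
    ],
}
sample_csv = "name,fines,loan_period\nExample Shire,no,28\n"

merged = merge_csv_into_geojson(sample_geojson, sample_csv, "name", "name")
# Write without indentation to keep the output small
print(json.dumps(merged, separators=(",", ":")))
```

Features with no matching CSV row are simply left as they were, which is handy while the data is still incomplete.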

TopoJSON

GeoJSON is pretty cool — it's specifically designed for web applications to read and write GIS files in a native web format. Unfortunately, GeoJSON can also get very big, especially with a project like mine where there are lots of boundaries over a large area. The final file was about 130MB — far too big for anyone to reasonably wait for it to load in their browser (and Chrome just refused to load it altogether). Because of the way I originally wrote the Python script, it actually became nearly three times the size, because I put in a two-space indent out of habit. This created literally hundreds of megabytes of empty spaces. "Pretty printing" JSON is helpful if a human needs to read it, but rather unhelpful if you want to keep the file size down.
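
To get a feel for how much whitespace pretty printing adds, here is a rough sketch using Python's json module. The polygon is invented (a thousand made-up vertices), so the exact ratio will differ from my real file, but deeply nested coordinate arrays suffer the most because every number ends up on its own indented line.

```python
import json

# A made-up polygon ring with many vertices, standing in for a real boundary
ring = [[round(143.0 + i / 1e6, 6), round(-28.0 - i / 1e6, 6)] for i in range(1000)]
feature = {
    "type": "Feature",
    "properties": {"name": "Example"},
    "geometry": {"type": "Polygon", "coordinates": [ring]},
}

# Human-readable output with a two-space indent
pretty = json.dumps(feature, indent=2)
# Machine-friendly output with no indentation and no spaces after , or :
compact = json.dumps(feature, separators=(",", ":"))

print(len(pretty), len(compact))
```

Both strings parse back to exactly the same data, so nothing is lost by writing the compact form.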

Enter TopoJSON. To be honest I don't really understand the mathematics behind it, but TopoJSON allows you to represent the same information as GeoJSON but in a much, much smaller file. I reduced a 362MB GeoJSON file (admittedly, about 200MB being blank spaces) to 2.6MB simply by converting it to TopoJSON! "Quantising" it (essentially, making it less accurate) reduces the file size even further, giving the current file of about 2.2MB - definitely small enough to load in a browser without too much of a wait, albeit not lightning fast.
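
As far as I understand it, two ideas do most of the work: borders shared by neighbouring areas are stored once rather than once per area, and quantisation snaps coordinates to an integer grid and stores each point as the difference from the previous one. The sketch below illustrates only the second idea. It is my own simplified illustration rather than the TopoJSON library's code: the function names, the grid size and the third sample point are made up.

```python
def quantise(points, n=10_000):
    """Snap (x, y) points to an n-by-n integer grid and delta-encode them."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    # Size of one grid cell; fall back to 1 if all points share a coordinate
    kx = (max(xs) - x0) / (n - 1) or 1
    ky = (max(ys) - y0) / (n - 1) or 1
    grid = [(round((x - x0) / kx), round((y - y0) / ky)) for x, y in points]
    # Keep the first grid point, then only the change from the previous point
    deltas = [grid[0]] + [
        (bx - ax, by - ay) for (ax, ay), (bx, by) in zip(grid, grid[1:])
    ]
    return deltas, {"scale": (kx, ky), "translate": (x0, y0)}


def dequantise(deltas, transform):
    """Reverse quantise(): rebuild approximate coordinates from the deltas."""
    (kx, ky), (x0, y0) = transform["scale"], transform["translate"]
    x = y = 0
    points = []
    for dx, dy in deltas:
        # Running sum turns the deltas back into grid positions
        x, y = x + dx, y + dy
        points.append((x * kx + x0, y * ky + y0))
    return points


# The first two points come from the sample feature earlier; the third is made up
line = [(143.78691062, -28.99912088), (143.78483624, -28.99912073), (143.78, -29.01)]
deltas, transform = quantise(line, n=1000)
restored = dequantise(deltas, transform)
print(deltas)
```

The small integers are much shorter to write out than long decimal coordinates, and the price is that each restored point can be off by up to half a grid cell.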

Good old HTML/CSS/JavaScript

At this point we're ready to start putting together the website to display the map. For this I used plain, vanilla HTML, CSS, and JavaScript. The web is awash with projects, frameworks and blog posts explaining how to use them to create your SPA (Single Page App)™️, but we really don't need any of that. The leaflet docs have a pretty good example of a minimal project, and my map is really not much more complex than that.

Something that did stump me for a while was how to bring the TopoJSON and CSV files into the JavaScript file as variables. I'm a self-taught JavaScript coder, and I learned it back to front: initially as a backend scripting language (i.e. nodejs) and then as the front-end browser scripting language it was originally made to be. So sometimes something a front-end developer would consider pretty basic: "How do I import a text file into my JavaScript and assign it to a variable?" takes me a while to work out. Initially I just opened the files in a text editor and copy-pasted the contents between two quote marks, made it the value of a javascript variable, and saved the whole thing as a .js file. But it was obvious even to me that couldn't possibly be the correct way to do it, even though it worked. In nodejs I would use fs.readFile() but the only thing that looked vaguely similar for front end JavaScript was FileReader — which is for reading files on a client, not a server. Finally I did a bit of research and found that the answer is to forget that the file is sitting right there in the same directory as all your JavaScript and HTML files, and just use AJAX like it's a remote file. The modern way to do this is with fetch, so instead of doing this:

// index.html
<script src="./boundaries.js" type="text/javascript"></script>
<script src="./branchesCsv.js" type="text/javascript"></script>
<script src="./ikcCsv.js" type="text/javascript"></script>
<script src="./mechanics.js" type="text/javascript"></script>
<script src="./nslaBranches.js" type="text/javascript"></script>
<script src="./load-map.js" type="text/javascript"></script>

// boundaries.js
const boundaries = `{"contents": "gigantic JSON string"}`
// branchesCsv.js
const branchesCsv = `lat,lng,town,address,phone
-35.5574374,138.6107874,Victor Harbor Public Library Service, 1 Bay Road, 08 8551 0730
... etc`
// ikcCsv.js
const ikcCsv = `lat,lng,town,address,phone
-10.159918,142.166344,Badu Island Indigenous Knowledge Centre,Nona Street ,07 4083 2100
...etc`
// mechanics.js
const mechanics = `lat,lng,town,address,phone
-37.562362,143.858541,Ballaarat Mechanics Institute,117 Sturt Street,03 5331 3042
..etc`
// nslaBranches.js
const nslaBranches = `lat,lng,town,address,phone
-37.809815,144.96513,State Library of Victoria,"328 Swanston Street, Melbourne",03 8664 7000
... etc`

// load-map.js
	// boundaries and the other constants are now globals
	const loanPeriod = new L.TopoJSON(boundaries, options)

...we do this:

// index.html
<script src="./load-map.js" type="text/javascript"></script>

// load-map.js
const boundaries = fetch('data/boundaries.topo.json')
.then( response => response.json())

const branchesCsv = fetch('data/public_library_locations.csv')
.then( response => response.text());

const ikcCsv = fetch('data/indigenous_knowledge_centre_locations.csv')
.then( response => response.text());

const mechanics = fetch('data/mechanics_institute_locations.csv')
.then( response => response.text());

const nslaBranches = fetch('data/nsla_library_locations.csv')
.then( response => response.text());

// fetch returns a promise so we have to let them all 'settle' before we can use the returned value
Promise.all([boundaries, branchesCsv, ikcCsv, mechanics, nslaBranches])
.then( data => {
	// data is an array with the settled values of the fetch() promises
	const loanPeriod = new L.TopoJSON(data[0], options)
})

In the code this doesn't necessarily look much simpler, but in terms of workflow it's a huge improvement that cuts out manually copy-pasting every time a CSV or TopoJSON file is updated, and reduces duplication and the total number of files.

So now the site consists of:

  • the original data as CSV and TopoJSON files
  • an index.html file to display the map
  • a single CSS file for basic styling
  • a single JavaScript file to load the map

Leaflet and friends

Finally it's time to actually put all of this stuff into a map using Leaflet. This is a really great JavaScript library, with pretty good documentation. Leaflet allows us to plot shapes onto a map and use JavaScript to make them interactive, including adding popups, zooming to features when they're clicked, and adding interactive overlays.

I won't try to replicate the Leaflet docs here and explain the exact steps to making my map, but I do want to highlight how two Leaflet plugins really helped with making the map work nicely. Leaflet has a fairly strong plugin collection, and they allow the base library to be fairly lightweight whilst the entire system is still quite flexible and fully featured.

I knew from the beginning it would require the whole library community to keep the map up to date over time. There are hundreds of library services across Australia, and they don't set their rules or their procurement decisions in stone forever. So it needed to be relatively simple to update the data as it changes. As we've discussed, GeoJSON also takes up a lot of space. Ideally, I could store as much data in CSV files as possible, and use them directly as data feeding the map. Turns out there's a plugin for that - Leaflet.geoCSV. This allows us to load CSV files directly (for library building locations), and they're converted to GeoJSON on the fly. Since CSV files are much smaller than the equivalent data in JSON, this is not only easier to maintain, but also loads faster.
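
To show roughly what "converted to GeoJSON on the fly" means, here is a small Python sketch of the same idea. This is not how Leaflet.geoCSV is implemented (the plugin is JavaScript and I haven't copied its code); the function name csv_to_geojson is made up, and the column names simply follow the header row of my location files.

```python
import csv
import io


def csv_to_geojson(csv_text, lat_field="lat", lng_field="lng"):
    """Turn CSV rows with latitude/longitude columns into GeoJSON points."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Pop the coordinate columns so only descriptive fields stay in properties
        lat = float(row.pop(lat_field))
        lng = float(row.pop(lng_field))
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude
            "geometry": {"type": "Point", "coordinates": [lng, lat]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


branches_csv = """lat,lng,town,address,phone
-37.809815,144.96513,State Library of Victoria,"328 Swanston Street, Melbourne",03 8664 7000
"""
collection = csv_to_geojson(branches_csv)
print(collection["features"][0]["geometry"])
```

One CSV row becomes a whole nested object, which is a big part of why the CSV version of the same data is so much smaller.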

The second plugin that really helped was Leaflet.pattern. The problem this helped me to solve was how to show both the fines layer and the loan period layer at the same time. Typically for a choropleth map, different colours or shades indicate certain values. But if you add a second overlay on top of the first one, the colours no longer necessarily make much sense and combinations can be difficult or impossible to discern. Thinking about this, I figured if I could make one layer semi-transparent colours, and the second layer patterns like differently angled stripes or dots, that might do the trick. Leaflet.pattern to the rescue! After some alpha-testing by my go-to volunteer Quality Assurance tester, I worked out how to make the layers always appear in the same order, regardless of which order they were added or removed, making the combination always look consistent:

[Animated GIF showing overlays]

Tile service

Once all of that's complete, we can load the map. But there's a problem: all we have is a bunch of vector points and lines, there's no underlying geography. For this we need a map tile service. We can use one of several options provided by OpenStreetMap, but I ended up using the commercial Mapbox service on a free plan (or at least, it will be free as long as thousands of people don't suddenly start using the map all at the same time). Their dark and light map styles really suited what I was trying to do, with minimal detail in terms of the underlying geography, but with roads and towns marked at the appropriate zoom level.

So that's it! It took a while to work it all out, but most of the complexity is in getting the data together rather than displaying the map. Once I had that done (though there is still a fair bit of information missing), I was able to pay more attention to maintaining the map into the future. That led me to look into some options for automating the merging of data from the library services CSV file (when it's updated) into the TopoJSON file, and also automatically refreshing the data on the actual map when the GitHub repository is updated. In my next post I'll explain how that works. While you're waiting for that, you can help me find missing data and make the map more accurate 😀.