How to make your JavaScript apps indexable

04 Oct 2013 posted by Tomas Nygren

I’ve recently been spending a fair amount of time trying to find a suitable way to solve the age-old problem of making JavaScript apps indexable. There are plenty of posts and articles offering advice and opinions, so I’ll add my own findings to the mix. Below are some of the key findings I’ve collated, along with a few different ways of approaching the problem.

Escape fragments

Roughly, here is how it works:

  1. The crawler requests www.example.com/coolflathtml5app.html
  2. Crawler sees the meta tag <meta name="fragment" content="!">
  3. Crawler requests www.example.com/coolflathtml5app.html?_escaped_fragment_=
  4. Your server looks for any request that contains ?_escaped_fragment_=
  5. Your server generates an HTML snapshot of the requested URL and serves that to the crawler

This is the method that Google recommends. For other crawlers that don’t support escape fragments, you have to fall back to user agent sniffing.
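To make steps 4 and 5 concrete, here is a minimal sketch of the server-side check, assuming an Express app and a hypothetical renderSnapshot(url, callback) helper that returns pre-rendered HTML for a URL; it isn’t the only way to wire this up.

    var express = require('express');
    var app = express();

    app.use(function (req, res, next) {
      // The crawler rewrites #! URLs (and pages carrying the fragment meta tag)
      // into ?_escaped_fragment_=... requests, so that is all we look for.
      if (req.query._escaped_fragment_ === undefined) {
        return next(); // normal user: serve the JavaScript app as usual
      }

      // Rebuild the original hashbang URL and hand it to the snapshot renderer.
      var url = req.path + '#!' + req.query._escaped_fragment_;
      renderSnapshot(url, function (err, html) {
        if (err) { return next(err); }
        res.send(html);
      });
    });

A normal browser never sends ?_escaped_fragment_=, so regular users keep getting the JavaScript app untouched.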

Pros

  • You’re serving the exact same content to users and crawlers
  • Passive: the crawler asks for the HTML snapshot rather than you inspecting the user agent
  • You probably don’t need to make that many changes in your application to implement this

Cons

  • Difficult to debug (you could use Fetch as Googlebot, but only on live URLs)
  • The Google implementation guide is based on hashbangs and I’m not sure how it will deal with pushState. Do crawlers support HTML5 pushState?
  • Adding extra parameters to your query string adds a small risk of unexpected behaviour in your application

User agent sniffing

Server-side generation based on the User-Agent header:

  1. The crawler requests www.example.com/coolflathtml5app.html
  2. The server picks up the user agent (example: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots))
  3. The server generates an HTML snapshot and sends it back in the response
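Here is a rough sketch of that check, again assuming Express and the hypothetical renderSnapshot helper from the earlier sketch; the crawler pattern is only an illustrative subset.

    var CRAWLER_PATTERN = /googlebot|bingbot|yandexbot|baiduspider/i;

    app.use(function (req, res, next) {
      var userAgent = req.headers['user-agent'] || '';
      if (!CRAWLER_PATTERN.test(userAgent)) {
        return next(); // not a known crawler: serve the normal JavaScript app
      }

      // Known crawler: serve a pre-rendered HTML snapshot of the requested URL.
      renderSnapshot(req.url, function (err, html) {
        if (err) { return next(err); }
        res.send(html);
      });
    });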

Pros

  • Easy to debug
  • Relatively straightforward to implement
  • Very small risk of unexpected behaviour in your applications

Cons

  • If the application doesn’t recognize the user agent, we can’t serve it an HTML snapshot
  • User agent sniffing is considered bad practice as it’s often used to serve different content to crawlers (cloaking). That could get us flagged by search engines, and we don’t want that, do we now!

Server-side rendering

Read more about this setup in this example

  1. Crawler requests www.example.com/coolflathtml5app.html
  2. The server checks its cache
  3. The cache layer passes the request down to nginx, which in turn fires up PhantomJS through a server-side application (Java in the original diagram)
  4. The result of the PhantomJS rendering is sent back to the client via the cache, where it remains until it expires
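As a rough illustration of the PhantomJS step, a bare-bones rendering script could look something like this; the fixed 2000 ms delay is an assumption, and a real setup would wait for an explicit “rendering done” signal from the app instead.

    // snapshot.js: load a URL, let the JavaScript app render, print the HTML
    var page = require('webpage').create();
    var system = require('system');
    var url = system.args[1];

    page.open(url, function (status) {
      if (status !== 'success') {
        console.error('Failed to load ' + url);
        phantom.exit(1);
      }

      // Give the app some time to fetch data and render before snapshotting.
      setTimeout(function () {
        console.log(page.content); // the fully rendered DOM as HTML
        phantom.exit(0);
      }, 2000);
    });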

This is probably the fastest and most sensible way to render your application, but also the most complex. Implementing it alongside your MV* JavaScript framework of choice wouldn’t be a trivial task, and what works for, say, AngularJS might be altogether different if you’re using Knockout.js.
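Tying this back to the earlier sketches, the renderSnapshot helper could simply shell out to PhantomJS with the snapshot.js script above; the binary and script paths here are assumptions about your setup, and in practice the cache layer would sit in front so PhantomJS isn’t spawned on every request.

    var execFile = require('child_process').execFile;

    function renderSnapshot(url, callback) {
      // Run: phantomjs snapshot.js <url>, and hand back whatever it prints.
      execFile('phantomjs', ['snapshot.js', url], function (err, stdout, stderr) {
        if (err) { return callback(err); }
        callback(null, stdout); // the rendered HTML printed by snapshot.js
      });
    }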

Pros

  • Quick and cached
  • One solution for users and crawlers
  • Easy to debug
  • Partial support for users without JavaScript

Cons

  • Complicated and difficult to implement
  • Big risk that the solution becomes tightly coupled to the technology used

Open Source solutions

Having said all of that, a lot of people have already attempted to solve this problem. There are a few different projects on GitHub that you could use and contribute to; here’s a list of some of the ones I found that looked encouraging:

seoserver