Building a Pusher Clone

tl;dr - you’re probably best off using Pusher for your real-time needs rather than rolling your own. That said, this was as much a learning exercise for me as it was a solution to a specific business need. Read on to find out more.

With the inexorable rise of single page applications (SPAs), gathering data on how your app is used in the real world has become a challenge in a way it perhaps wasn’t before. In the document-oriented web, each page was a distinct server-side route, so you were free to capture and store whatever data came in with the request. In SPAs, however, users can carry out multiple actions without ever triggering a server round trip. You therefore have to explicitly address the “analytics question”: what data do you want to capture, where does it need to end up, and how will you get it there, all without disrupting your users’ experience of the app?

We faced exactly this problem when we decided to re-implement one of our applications as a JavaScript app. Here’s an overview of how we went about solving it.

Overview

We identified three main components of a system that would let our SPAs report useful analytics data to a suitable destination:

Web Client - a JavaScript library/module that any app could use to publish data. It would encapsulate all of the implementation details, such as network requests and error handling.
Web API - a publicly accessible web endpoint that would receive all published data.
Message Broker - the web service receiving the data would have no knowledge of where that data should end up. It was also likely to be under heavy load, so it needed to do as little as possible per request in order to handle a large volume of traffic. The incoming data would therefore be offloaded to a message broker, from which subscribers could process it.
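
The following minimal TypeScript sketch wires the three components together in memory to show how responsibilities are divided. Everything in it is hypothetical and is not the code we actually run. This includes the `AnalyticsEvent` shape, `createWebClient`, `createWebApi`, the `app.name` routing-key convention and `InMemoryBroker`, which stands in for a real broker such as RabbitMQ. The point is that the Web API only validates and hands off, leaving routing decisions to the broker.

```typescript
// Hypothetical shape of a single analytics event published by an SPA.
interface AnalyticsEvent {
  app: string; // which SPA sent the event
  name: string; // e.g. "search-submitted"
  payload: Record<string, unknown>;
  timestamp: number;
}

// Message Broker: the only component that decides where data ends up.
interface MessageBroker {
  publish(routingKey: string, body: string): void;
}

// Web API: checks the bare minimum and hands off immediately, so it can
// sustain a high request volume.
function createWebApi(broker: MessageBroker) {
  return {
    receive(raw: string): boolean {
      let event: Partial<AnalyticsEvent> | null;
      try {
        event = JSON.parse(raw);
      } catch {
        return false; // reject malformed JSON without further work
      }
      if (!event || typeof event.app !== "string" || typeof event.name !== "string") {
        return false;
      }
      broker.publish(`${event.app}.${event.name}`, raw);
      return true;
    },
  };
}

// Web Client: hides serialisation and transport from the host app.
function createWebClient(app: string, send: (raw: string) => boolean) {
  return {
    publish(name: string, payload: Record<string, unknown> = {}): boolean {
      const event: AnalyticsEvent = { app, name, payload, timestamp: Date.now() };
      return send(JSON.stringify(event));
    },
  };
}

// In-memory stand-in for a real broker.
class InMemoryBroker implements MessageBroker {
  readonly messages: Array<{ routingKey: string; body: string }> = [];
  publish(routingKey: string, body: string): void {
    this.messages.push({ routingKey, body });
  }
}

const broker = new InMemoryBroker();
const api = createWebApi(broker);
const client = createWebClient("shop", (raw) => api.receive(raw));
client.publish("search-submitted", { query: "boots" });
console.log(broker.messages[0].routingKey); // shop.search-submitted
```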

Protocol

The first incarnation of the system exposed an HTTP endpoint that accepted a POST request containing the desired data. This meant the client made a cross-domain XHR each time a snippet of data needed publishing. As the complexity of the app grew, so did the number of these requests, and in the long run this simply wasn’t efficient: the overhead of each request far outweighed the actual payload. This was only ever intended as a stopgap, though, as the real win here was WebSockets.
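
A rough back-of-the-envelope comparison shows why per-event POSTs were wasteful. The sketch below counts the bytes in a hypothetical HTTP/1.1 request that carries a small JSON event. It compares that with the same payload sent as one client-to-server WebSocket frame. Per RFC 6455, that frame's header is 2 bytes, plus an extended length field for larger payloads, plus a 4-byte masking key. The host names, path and event are made up, and the headers are deliberately minimal. A real browser also sends `User-Agent`, cookies and more. Because `application/json` is not a CORS-safelisted content type, a cross-domain POST like this can also trigger a preflight `OPTIONS` request. TCP/TLS overhead and socket.io's own packet encoding are ignored.

```typescript
const encoder = new TextEncoder();

function byteLength(s: string): number {
  return encoder.encode(s).length;
}

// HTTP/1.1 request size: request line and each header end in CRLF,
// followed by a blank line (CRLF) and then the body.
function httpRequestBytes(requestLine: string, headers: string[], body: string): number {
  const head = [requestLine, ...headers].map((line) => line + "\r\n").join("") + "\r\n";
  return byteLength(head) + byteLength(body);
}

// Client-to-server WebSocket frame size (RFC 6455): 2 header bytes,
// 0/2/8 extended-length bytes, a 4-byte masking key, then the payload.
function wsClientFrameBytes(body: string): number {
  const n = byteLength(body);
  const extendedLength = n <= 125 ? 0 : n <= 0xffff ? 2 : 8;
  return 2 + extendedLength + 4 + n;
}

// Hypothetical analytics event and minimal request headers.
const body = JSON.stringify({ name: "tab-opened", tab: "reviews" });
const bodyBytes = byteLength(body);
const httpBytes = httpRequestBytes(
  "POST /events HTTP/1.1",
  [
    "Host: analytics.example.com",
    "Origin: https://app.example.com",
    "Content-Type: application/json",
    `Content-Length: ${bodyBytes}`,
  ],
  body,
);
const wsBytes = wsClientFrameBytes(body);
console.log(bodyBytes, httpBytes, wsBytes); // 37 176 43
```

Under these assumptions, a 37-byte event costs 176 bytes as an HTTP request but 43 bytes as a WebSocket frame, before real-world headers widen the gap further.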

The server now exposes a WebSocket endpoint alongside the HTTP one. All incoming requests are mapped to realtime routes using express.io, a lovely little framework that is simply Express plus socket.io. The realtime route broadcasts the data back out to subscribers and also pushes it to the message broker, from where it finds its way onto queues, ready to be picked up.
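
The shape of that realtime route can be sketched without express.io itself. In this hypothetical TypeScript model, `RealtimeRouter` stands in for express.io's routing and is not its API. WebSocket messages and HTTP POSTs are dispatched to the same named route. The `publish` route both broadcasts to live subscribers and hands a copy to the broker. The `Broadcaster` and `Broker` interfaces are assumptions standing in for socket.io broadcasting and a RabbitMQ publisher.

```typescript
type Handler = (data: unknown) => void;

// Hypothetical stand-ins for socket.io broadcasting and a RabbitMQ publisher.
interface Broadcaster {
  emit(event: string, data: unknown): void;
}
interface Broker {
  publish(routingKey: string, data: unknown): void;
}

class RealtimeRouter {
  private readonly routes = new Map<string, Handler>();

  route(name: string, handler: Handler): void {
    this.routes.set(name, handler);
  }

  // WebSocket messages arrive with an event name...
  onSocketMessage(event: string, data: unknown): boolean {
    return this.dispatch(event, data);
  }

  // ...and HTTP POSTs to /<name> are forwarded to the same realtime route.
  onHttpPost(path: string, body: unknown): boolean {
    return this.dispatch(path.replace(/^\//, ""), body);
  }

  private dispatch(name: string, data: unknown): boolean {
    const handler = this.routes.get(name);
    if (!handler) return false; // unknown route
    handler(data);
    return true;
  }
}

function registerPublishRoute(router: RealtimeRouter, subscribers: Broadcaster, broker: Broker): void {
  router.route("publish", (data) => {
    subscribers.emit("event", data); // live dashboards see it immediately
    broker.publish("analytics", data); // durable copy for offline reporting
  });
}

// Usage with in-memory fakes in place of socket.io and RabbitMQ.
const emitted: unknown[] = [];
const published: unknown[] = [];
const router = new RealtimeRouter();
registerPublishRoute(
  router,
  { emit: (_event, data) => emitted.push(data) },
  { publish: (_key, data) => published.push(data) },
);
router.onSocketMessage("publish", { name: "tab-opened" });
router.onHttpPost("/publish", { name: "search-submitted" });
```

Funnelling both transports into one handler means the broadcast-and-enqueue logic lives in a single place, whichever protocol a client happens to use.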

Message Broker

Pushing data to a message broker, and from there onto queues, has the added benefit of letting you publish data knowing that it will still be available later. This matters because a realtime broadcast alone is fire-and-forget: if no subscribers are listening at the moment data is published, that data simply disappears. With the broker in place you get the best of both worlds. You can build realtime dashboards showing current stats for an application, while also ensuring that the data is stored for offline reporting and analytics. We chose RabbitMQ for this, as we already had experience using it in other applications.
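
Comparing a live broadcast with a queue side by side shows the difference most clearly. The model below is a simplified in-memory sketch, not the RabbitMQ or amqplib API. The class and function names are hypothetical. `LiveChannel` drops a message if nobody is listening when it is sent, whereas `DurableQueue` holds on to it until a consumer drains it. In RabbitMQ itself you would also declare queues as durable and mark messages as persistent if they need to survive a broker restart.

```typescript
// Live, socket.io-style delivery: only current listeners ever see a message.
class LiveChannel<T> {
  private readonly listeners = new Set<(msg: T) => void>();

  subscribe(listener: (msg: T) => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  broadcast(msg: T): void {
    // With no listeners this is a no-op and the message is simply gone.
    this.listeners.forEach((listener) => listener(msg));
  }
}

// Queue-style delivery: messages wait until a consumer asks for them.
class DurableQueue<T> {
  private readonly items: T[] = [];

  enqueue(msg: T): void {
    this.items.push(msg);
  }

  // Remove and return everything buffered so far, e.g. for a nightly report.
  drain(): T[] {
    return this.items.splice(0, this.items.length);
  }

  get size(): number {
    return this.items.length;
  }
}

// Publishing to both gives realtime dashboards and offline reporting.
function publish<T>(msg: T, live: LiveChannel<T>, queue: DurableQueue<T>): void {
  live.broadcast(msg);
  queue.enqueue(msg);
}

const live = new LiveChannel<string>();
const queue = new DurableQueue<string>();

publish("page-1 viewed", live, queue); // no dashboard open yet

const seenByDashboard: string[] = [];
const unsubscribe = live.subscribe((msg) => seenByDashboard.push(msg));
publish("page-2 viewed", live, queue);
unsubscribe();

const forReporting = queue.drain();
console.log(seenByDashboard.length, forReporting.length); // 1 2
```

The dashboard misses the first event because it was not yet subscribed, but the queue still hands both events to the reporting consumer.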

JS Client

As mentioned, the first implementation of the client made XHRs to the data endpoint. Now that the server uses express.io, the client can use WebSockets instead. It currently bundles socket.io-client and exposes two methods: one to publish events and one to subscribe to them.
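
The following is a minimal sketch of such a two-method client. It assumes a socket that exposes `emit`, `on` and `off`, as socket.io-client sockets do. The `SocketLike` interface, the `publish`/`event` event names, the `{ name, data }` message shape and the `LoopbackSocket` test double are all hypothetical. `publish` swallows transport errors so that analytics can never break the host app, and `subscribe` returns an unsubscribe function.

```typescript
type Listener = (...args: any[]) => void;

// Minimal structural interface modelled on socket.io-client's emit/on/off.
interface SocketLike {
  emit(event: string, ...args: unknown[]): unknown;
  on(event: string, listener: Listener): unknown;
  off(event: string, listener: Listener): unknown;
}

interface AnalyticsMessage {
  name: string;
  data: unknown;
}

function createAnalyticsClient(socket: SocketLike) {
  return {
    // Fire-and-forget: a failure here must never break the host app.
    publish(name: string, data: unknown = {}): void {
      const msg: AnalyticsMessage = { name, data };
      try {
        socket.emit("publish", msg);
      } catch (err) {
        console.warn("analytics publish failed", err);
      }
    },
    // Listen for events with the given name; returns an unsubscribe function.
    subscribe(name: string, handler: (data: unknown) => void): () => void {
      const listener = (msg: AnalyticsMessage) => {
        if (msg.name === name) handler(msg.data);
      };
      socket.on("event", listener);
      return () => {
        socket.off("event", listener);
      };
    },
  };
}

// Test double: echoes every "publish" back as an "event", as the server would.
class LoopbackSocket implements SocketLike {
  private readonly listeners = new Map<string, Set<Listener>>();

  emit(event: string, ...args: unknown[]): this {
    const target = event === "publish" ? "event" : event;
    this.listeners.get(target)?.forEach((fn) => fn(...args));
    return this;
  }

  on(event: string, listener: Listener): this {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event)!.add(listener);
    return this;
  }

  off(event: string, listener: Listener): this {
    this.listeners.get(event)?.delete(listener);
    return this;
  }
}

const client = createAnalyticsClient(new LoopbackSocket());
const received: unknown[] = [];
const unsubscribe = client.subscribe("tab-opened", (data) => received.push(data));
client.publish("tab-opened", { tab: "reviews" });
client.publish("search-submitted", { query: "boots" }); // filtered out by name
unsubscribe();
client.publish("tab-opened", { tab: "photos" }); // not received after unsubscribe
```

In a browser you would pass a real socket, for example one returned by socket.io-client's `io(url)`, instead of `LoopbackSocket`. Current socket.io-client versions buffer events emitted while disconnected by default, which is why the sketch has no retry logic of its own.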

Conclusion

Realtime systems are fun.

As mentioned in the tl;dr, Pusher is likely a safer bet for the realtime communication side. You could still hang a message broker off the back of a Pusher subscriber, so you end up with the same benefit of having every event stored for later analysis.