Building Better Beacon

Published See discussion on Bluesky

Over the past year or so we've been rebuilding a core library that supports tracking business KPI and metrics across our various web and native app experiences.

This involved quite a bit of work from a variety of teams, I acted as one of the lead architects in designing the web integration for the new system, and worked with my team to build it out and get the first iteration in production.

It had been smooth sailing for several months across the first few experiences that adopted the new version of the core library, at least until the other week where we identified a show stopping bug.

Background

The library leverages navigator.sendBeacon to post event payloads to one of our internal processing gateways, we use this data to get an overall sense of how users are browsing and interacting with our sites, and it powers key business metrics like our visits, add to cart rate and our overall conversion.

We chose sendBeacon because of it's slightly stronger guarantee that the browser won't drop the requests when a page is unloaded or navigated away from. This gave us confidence that we could send richer content (via JSON payloads) than our previous system, and also allow us to track key interactions before they resulted in the customer navigating to another page.

Our previous iteration of this system, which was about 6 years old or so, heavily relied upon cookies and sending data via image requests (which greatly inhibited the size and fidelity of the data we could collect).

The Bug

We've been adopting this new core library on several experiences as of late, and on one of them we noticed that we were queuing up a large number of these events via sendBeacon, so many in fact that the browser would effectively stall and no longer send the requests.

In the DevTools Network panel they'd all show up as (pending) without any clear indicator on why they weren't being flushed.

At first, we thought it could be due to some limit on the total number of connections being made, but there seemed to be no other network requests that would similarly get stuck.

We did some research and found out a painful little caveat with sendBeacon (and one that also applies to fetch keepalive which we considered as an alternative also):

  1. The payload to be sent must be smaller that 64kb (for Chrome, seems other browsers also use a similar limit) source
  2. The total size of the queued payloads to be sent must be smaller than 64kb

We checked our pending requests and none of them got close to the 64kb limit, we then tallied up our total requested payload size and got a bit worried.

Our first read through of the second limitation was that it was 64kb in total for the lifetime of a page. In that case it'd only take about a hundred or so requests (with an average size of ~500b) before we hit that cap.

Honestly, we were pretty concerned that we'd need to go back to the drawing board and consider alternative ways to send these payloads.

Good Fortune

Fortunately however, some of my teammates re-read the limitations of sendBeacon, and found further evidence that the second limitation only applied to queued requests, not completed requests. So we would only hit this pending state for requests if there are other requests in-flight and the size of their payload in total is over that 64kb limit.

This is where the teammates had a bright idea, we could retry these requests a little bit later if they get stuck in the pending state. Fortunately again, the navigator.sendBeacon api returns a boolean indicating if the browser can send the request, if it is false then that means we have hit either size limitation.

So if we encounter any such request, we can wait a little bit (via setTimeout, requestAnimationFrame, etc), and then retry it.

We opted for a queue, so we can enqueue any pending requests, and otherwise pop them from the queue if they can be sent.

Resolution

In the end, we opted for an implementation that seemed to work quite well.

I've taken the patterns we worked on internally and implemented them within this public NPM package so others don't run into the same issue as we did: better-beacon. The code for the library is available on my GitHub as well.