devlog

What the unit tests caught

2026-05-18

The test suite is 98 pure-logic checks, about five seconds wall-clock end to end. Most just pin behaviour that's already correct. Two of them found bugs nobody had noticed.

There's a story you tell yourself when you test code you already trust: this is preventive, the code works, the tests just guard against later regressions. That was true for ninety-six of the 98 added this week. The other two found bugs that had been in production all along.

The first was rounding. The capture script derives a hero NPC's box-collider half-extents from its splat-scene AABB, rounded to three decimals. The test built an AABB whose half-extent landed exactly on 2.5315, a number you'd round up to 2.532 by the half-up convention every schoolchild learns. JavaScript doesn't. Number((2.5315).toFixed(3)) gives 2.531. The reason is floating point: 2.5315 has no exact binary representation, and the nearest double is 2.531499…, a hair below the tie. toFixed rounds that stored value to the nearest thousandth, which is 2.531. Every collider whose half-extent sits on a tie like that has been a millimetre smaller than I thought. I pinned the real behaviour with a guardrail comment so a future swap to a half-up rounder doesn't silently grow those colliders by 1 mm and re-trigger the physics tuning.
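The rounding behaviour above can be reproduced in a few lines. This is a minimal sketch, not the capture script itself: roundHalfExtent and roundHalfUpNaive are hypothetical names, and the Math.round variant only stands in for the kind of half-up rounder a later change might swap in.

```javascript
// Hypothetical helper mirroring the rounding described above:
// three decimals via toFixed, converted back to a number.
function roundHalfExtent(value) {
  return Number(value.toFixed(3));
}

// Hypothetical stand-in for a "half-up" replacement (an assumption, not
// code from the capture script). Scaling by 1000 first lands the product
// on an exact tie (2531.5), so Math.round goes up.
function roundHalfUpNaive(value) {
  return Math.round(value * 1000) / 1000;
}

const halfExtent = 2.5315;

// The literal is stored as the nearest double, which sits just below 2.5315.
console.log(halfExtent.toFixed(20));

console.log(roundHalfExtent(halfExtent));  // 2.531
console.log(roundHalfUpNaive(halfExtent)); // 2.532

// Guardrail: pin the behaviour the colliders were tuned against.
// Do not "fix" this to 2.532 without re-checking the physics tuning.
console.assert(roundHalfExtent(2.5315) === 2.531, "half-extent rounding changed");
```

The two helpers disagree only on values that sit on a decimal tie, which is why the difference stayed invisible until a test happened to construct one.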

The second was the locale. A debug HUD reads [80,500 splats / 1,024 KB / 234 ms]. The numbers are formatted with Number.prototype.toLocaleString(), which with no arguments uses the runtime's default locale. My assertion expected "1,234,567", en-US grouping. The browser returned "12,34,567": lakh grouping, the Indian convention. Both are correct for their locale and both look wrong in the other context. My dev box is en-IN; the deployed page, served through Cloudflare, formats in whatever locale the visitor's browser is set to. Any number on a polished surface (the build chip, marketing copy, the press kit) has to pin 'en-US'. Internal tools can follow the visitor; an investor-screenshot surface can't. The test now matches the separator with a regex instead of pinning a style, so a CI runner in a third locale doesn't flake the suite.
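A sketch of both fixes, under assumptions: formatForPress, formatForHud and looksGrouped are hypothetical names, not functions from the project, and the regex is one plausible way to accept any grouping style rather than the pattern the suite actually uses. It also assumes a runtime with full locale data and ASCII digits.

```javascript
// Hypothetical helper for polished surfaces: pin the locale explicitly
// so the output does not depend on the visitor's or the runner's settings.
function formatForPress(n) {
  return n.toLocaleString('en-US');
}

// Hypothetical helper for internal tools: follow the runtime locale.
function formatForHud(n) {
  return n.toLocaleString();
}

// Locale-tolerant check: the string must be digit groups joined by single
// non-digit separators, and the digits must survive in order.
// Assumes ASCII digits, so locales with other numerals would need more work.
function looksGrouped(text, expectedDigits) {
  const grouped = /^\d{1,3}(?:\D\d{2,3})+$/.test(text);
  return grouped && text.replace(/\D/g, '') === expectedDigits;
}

console.log(formatForPress(1234567));           // "1,234,567"
console.log((1234567).toLocaleString('en-IN')); // "12,34,567" with full ICU data

// Both grouping styles pass the tolerant check.
console.log(looksGrouped('1,234,567', '1234567')); // true
console.log(looksGrouped('12,34,567', '1234567')); // true

// Result depends on the locale of whatever machine runs this line.
console.log(looksGrouped(formatForHud(1234567), '1234567'));
```

The tolerant check belongs in tests for surfaces that follow the visitor; a surface that pins 'en-US' can be asserted against the exact string.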

Neither bug is bad. A 1 mm collider error doesn't matter for a chair you drive past at 30 mph, and a press figure with the wrong separator wouldn't survive to a v1. But the pattern is what stuck with me: both were in code that worked, that had been written carefully, that humans had read without noticing the wrong thing.

That's the part of unit testing nobody mentions in the coverage-percentage argument. You write the assertion, it fails, and then you look closely at the output for the first time. The looking is where the bug surfaces. The test is a forcing function for paying attention, not just a mechanical second check on the code.

The ninety-six that caught nothing aren't wasted, either. They're the price of the two that did. You can't write the locale test in isolation; you write a battery and one of them happens to be the one that surfaces a bug. The hit rate stops mattering once the floor is "every test runs in milliseconds." 98 tests, 5.1 seconds.