Trusting a tool before you build on it: prove the floor, then stand on it

Process 13 May 2026 9 min read Tech & tools Updated 5 July 2026

The very first thing The Long Watch did was almost nothing: a bare scene that rendered a patch of voxel world and held a steady frame rate. Small output. But the real product of that opening phase was something else: a list of things we’d assumed about the engine that turned out to be wrong, every one of them caught early because we checked rather than trusted. That habit, prove the floor before you stand on it, became one of the disciplines the whole project leans on hardest.

The rule is plain. Before we build anything substantial on top of an engine feature or a library behaviour, we confirm by hand that it actually does what we think it does. Not “the documentation implies it works this way.” Not “this is how it usually works.” A quick, concrete proof, run against the real build, before any code leans on the answer.

The terrain that proved the design right

The clearest lesson from that first phase was the terrain. The obvious first move was to draw the world at a single, fixed level of detail, every voxel rendered at full crispness, near or far. On a flat test landscape that produced roughly fourteen hundred draw calls and around six hundred thousand triangles, and pushed the per-frame work to about thirty-three milliseconds, well past the budget for smooth motion.

The fix was a terrain that carries several levels of detail at once: full crispness right under the camera, progressively simplified geometry out toward the long horizon that the slow world-overview needs to show. Switching to it cut the draw calls to around eight hundred and held a steady sixty frames a second. What’s worth noticing is that the original design notes had called for exactly that kind of layered-detail terrain all along. The shortcut — one fixed detail level everywhere — was the tempting thing; the measurement is what proved the design right and the shortcut wrong.

An aerial view of a voxel landscape at golden hour, detailed in the foreground and softening into a hazy distant horizon. — Sharp up close, simplified toward the horizon: the layered terrain the design called for all along.

The design calls for layered-detail terrain. Trust the design — then measure, and let the number settle the argument.

A catalogue of small wrong assumptions

Alongside the terrain came a string of smaller traps, each one cheap to catch early and expensive to discover late. None of them were dramatic. That’s the point: they were the kind of easy-to-miss mismatch you’d sail straight past if you trusted your first instinct.

The voxel library’s settings carry slightly different names than the engine’s built-in equivalents, easy to fumble in a fast read, and the kind of thing that fails silently rather than loudly.
One noise generator the documentation implied was available just wasn’t present in our build. A tiny throwaway probe confirmed its absence in seconds, rather than us assuming it was there and building on a thing that didn’t exist.
Certain heavy operations only run inside a real windowed session, never in a fast headless check, so a check that “passes” headless can be meaningless for them.
A screenshot read-back briefly drops the frame rate, so you mustn’t measure performance across it. And some objects error if you point them at a target before they’re placed in the scene. Each of those cost us a couple of failed runs to pin down: once, and then never again, because we wrote the finding down.

The note we left for the next phase was blunt: the verify-first habit is genuinely load-bearing, and it should not be watered down. We kept it the way the development notes are kept: plainly, as notes to ourselves.

Two buckets, and a probe to move between them

Out of that the habit hardened into something concrete: we keep engine and library behaviours in two explicit buckets. One list of things we’ve proven work. A separate list of things that look useful but we haven’t confirmed yet. You may freely reach for anything in the first bucket. The moment you reach for something in the second, you owe a quick proof first.

The instrument for paying that debt is the throwaway probe: a tiny experiment written for one purpose, to answer a single question about how the engine behaves, then run, read, and thrown away. What survives isn’t the probe; it’s the one-line finding it produced, moved from the unconfirmed bucket to the proven one. The graphics card got the same treatment: a pair of probes proved it could draw a dense crowd and see-through water before either feature was built on it.

A clean example. We wanted to copy a shared settings object and override one field on the copy, locally, without disturbing the original everyone else uses. That behaviour wasn’t in the proven bucket, so we had no record of whether the override would leak back into the shared original. Rather than assume, we wrote a quick experiment, confirmed the override stayed local, recorded it as trusted, and only then built on it. The same caution once went the other way: an alternative rendering approach sat in the unconfirmed bucket, so we fell back to the known-safe default and deferred the unproven one until it could earn its place. The probe is a go/no-go gate, and “no” is an acceptable answer.

The screen we de-risked before we built it

The first place that habit ran as a deliberate opening move was the world-preview screen: the slow orbiting flyover you watch before committing to a world. What that screen feels like to a player is a separate post; this one is about how we made sure the ground under it was solid before we put any weight on it.

Before writing a single line of the screen’s real code, we authored a set of disposable probes, each one written, run, and thrown away, to test every risky engine behaviour the screen would depend on. Two findings mattered most.

First, a probe proved that a whole world can be torn down and rebuilt in the middle of the camera’s orbit, cleanly, with the frame rate recovering to sixty. That single result decided the design of the re-roll button: because the tear-down-and-rebuild was proven clean, re-roll needed no special “still building, please wait” guard wrapped around it. A feature got simpler because a probe removed a fear.

Second, a probe proved that when you save a world and immediately hand off to the next screen, the save is fully written to disk before that next screen tries to read it. Without that proof we’d have had to defend against a gap where the hand-off sees no save yet: a whole class of defensive code we got to not write, because we’d shown the gap didn’t exist.

A small voxel world seen from a slowly circling overhead camera at golden hour, terrain and biomes turning under a low sun. — The slow orbit a player watches before settling in: every behaviour beneath it proven before a line of it was built.

Other probes confirmed the smaller things: that the scene hand-off and the camera takeover happen in a clean order, and that the orbit-camera math runs without churning memory every frame. Every engine assumption the screen rested on held exactly as the plan presumed.

Only then did we build the real thing on top of that proven floor: the slow flyover, the re-roll, the rest of what a player actually sees. It shipped working on the first pass, because every surface the finished screen stood on had already been proven solid before any weight landed on it.

The same instinct, pointed at the build itself

The habit doesn’t stop at engine APIs. It extends to a subtler trap, and a real one: a green check is not the same as a correct result. Code can pass every stability check while being silently wrong: as long as it’s wrong in the same way every time, a consistently-wrong value is still consistent, and a check that asserts “same as last run” sails right past it. The fullest version of that lesson, the family of checks that pass while verifying nothing, is told on its own; here it’s enough to say that the antidote was the same one as everywhere else. Don’t trust the tool to be doing the job you assume it’s doing. Prove it.

And it extends to information, too. We treat any figure or conclusion that arrives across a boundary as unverified until it’s re-checked against the live project: a number remembered from earlier, a result handed off from a previous step, a claim asserted in passing. More than once a stale count, or an inherited “this is already done,” was caught simply by re-measuring rather than trusting. Verifying that a saved world reloads byte-for-byte the way you left it has its own dedicated machinery; the broader rule that fathers all of it is the plain one this post is about.

Why a quiet game needs a loud discipline

The Long Watch is a single-player world meant to feel alive and consistent over a very long time: plants, soil, weather, and creatures all feeding back into one another, and a saved world that must come back exactly as you left it. A subtle wrong-but-stable bug, or a feature built on an engine behaviour that turns out to work differently than assumed, wouldn’t announce itself with a crash. It would quietly corrupt the simulation’s honesty — the slowest, hardest kind of failure to ever notice.

That’s why a calm, unhurried game carries an almost stubborn engineering discipline underneath it. The whole point of the game is that you can trust the world to keep its promises across all the evenings you spend in it. We can only ask players to trust the world because we refused to trust the floor without checking it first.

Prove the floor, then stand on it. Everything The Long Watch asks you to trust is built on something we made sure of by hand.