The proof that only runs in a window

Process | 13 June 2026 | 7 min read | Tech & tools | Updated 22 July 2026

When we shipped the powers that let a god reshape the land, they came with a promise: carve a hill, save, reload, and you get back that hill, down to the last bump. We verify almost everything in The Long Watch with fast automated checks. This was the one promise those checks physically could not reach.

The reshaping powers themselves (the verbs that raise the ground, lower it, soften a slope, or roughen it up) are told from the player’s side elsewhere. The save architecture they lean on belongs to a sibling post: we don’t store the carved shape, we store the deeds and replay them onto fresh terrain. What I want to follow here is narrower, and to me the most unsettling part of the whole feature: the promise we cared about most lived in the exact spot our verification machinery couldn’t see.

The promise under the powers

Start with what we’re actually claiming. A save records what you did, not what the world looks like afterward. So on reload the land is rebuilt from scratch from its seed, exactly as it first generated, and then your sculpting acts are replayed back on top, in order. The hill you carved returns because we kept the recipe and cooked it again.
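A minimal sketch of that recipe-and-replay idea, in Python rather than the game's own code. Every name here (`Deed`, `generate`, `apply_deed`, `load`) and the tiny integer heightmap are assumptions made for illustration; the real generator and save format are not shown in this post.

```python
from dataclasses import dataclass

SIZE = 8  # hypothetical tiny heightmap: SIZE x SIZE integer heights


@dataclass(frozen=True)
class Deed:
    verb: str    # "raise" or "lower"
    x: int
    y: int
    amount: int


def generate(seed):
    """Stand-in terrain generator: a pure function of the seed."""
    return [[(seed * 31 + x * 17 + y * 13) % 10 for x in range(SIZE)]
            for y in range(SIZE)]


def apply_deed(land, deed):
    sign = 1 if deed.verb == "raise" else -1
    land[deed.y][deed.x] += sign * deed.amount


def load(save):
    """Rebuild the land from its seed, then replay the deeds in order."""
    land = generate(save["seed"])
    for deed in save["deeds"]:
        apply_deed(land, deed)
    return land


# A live session: every edit is applied and also appended to the log.
seed = 42
live = generate(seed)
deeds = []
for deed in (Deed("raise", 2, 3, 5), Deed("lower", 2, 3, 1), Deed("raise", 4, 4, 2)):
    apply_deed(live, deed)
    deeds.append(deed)

save = {"seed": seed, "deeds": deeds}  # no heights are stored
reloaded = load(save)
```

The save holds only the seed and the list of deeds. The reloaded land matches the live one because both the generator and each deed are deterministic, which is the property the rest of this post is about proving.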

For that to hold, a replayed edit has to land identically, bit for bit, today and on the next machine. Roughen is the verb that could betray us: it scatters small bumps across a patch, and a scatter means drawing on randomness. So it draws from the world’s single seeded source of chance rather than from ordinary randomness, and that is the mechanic that makes a saved bump come back as the same bump. The scatter is a surprise to you, but it is fixed for the world.
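To make the seeded-scatter idea concrete, here is a small Python sketch. It uses Python's `random.Random` as a stand-in for the world's source of chance, which is an assumption; the functions `roughen` and `carve` and the flat list of heights are hypothetical.

```python
import random


def roughen(heights, rng, count=4):
    """Scatter `count` small bumps, drawing every random choice from `rng`."""
    for _ in range(count):
        i = rng.randrange(len(heights))
        heights[i] += rng.choice([-1, 1])
    return heights


def carve(seed):
    # One generator, created from the world's seed, feeds every draw.
    # The module-level random.* functions would instead start from
    # system entropy and give a different scatter on each launch.
    rng = random.Random(seed)
    return roughen([0] * 16, rng)


first = carve(1234)
second = carve(1234)  # same seed, same sequence of draws, same bumps
```

The point of routing every draw through one seeded generator is that the sequence of draws becomes part of the recipe: replaying the same deeds in the same order consumes the same numbers.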

Carve a hill, save it, reload, and you get back that exact hill: not a near miss, not a similar shape, but the same ground down to the last bump. That is the promise. Now prove it.

The check that can’t run where the checks run

Here is the wall we walked into. Most of our automated checks run windowless: no graphics window opens, nothing is drawn, which is what makes them fast enough to run constantly. But a land edit only takes effect when a real graphics window is actually rendering. Run a sculpt with no window open, and it doesn’t error, doesn’t warn, doesn’t half-apply. It silently does nothing at all.

Sit with what that means for a test. A windowless check of “carve, save, reload, compare” would carve nothing, save nothing, reload nothing, and compare two identical untouched worlds, then pass, cleanly, every time. It would be the most reassuring green checkmark in the whole suite. And it would be verifying absolutely nothing, because the edit it claims to test never happened.
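The trap is easy to reproduce in miniature. The Python sketch below models the silent no-op with a hypothetical `World` class and a `window_open` flag; none of these names come from the game, and the real engine behaviour is only summarised by the early `return`.

```python
class World:
    def __init__(self, seed, window_open):
        self.seed = seed
        self.window_open = window_open
        self.heights = [seed % 7] * 8
        self.deeds = []

    def sculpt(self, index, amount):
        if not self.window_open:
            return  # no error, no warning, no partial edit
        self.heights[index] += amount
        self.deeds.append((index, amount))


def replay(seed, deeds, window_open):
    world = World(seed, window_open)
    for index, amount in deeds:
        world.sculpt(index, amount)
    return world


def round_trip_matches(window_open):
    """The naive check: carve, 'save', rebuild, compare."""
    live = World(seed=9, window_open=window_open)
    live.sculpt(3, 5)
    reloaded = replay(live.seed, live.deeds, window_open)
    return live.heights == reloaded.heights


def edit_took_effect(window_open):
    """The question the naive check forgets to ask."""
    untouched = World(seed=9, window_open=window_open)
    live = World(seed=9, window_open=window_open)
    live.sculpt(3, 5)
    return live.heights != untouched.heights


headless_passes = round_trip_matches(window_open=False)
headless_edit_happened = edit_took_effect(window_open=False)
```

Windowless, `headless_passes` is true while `headless_edit_happened` is false: the comparison succeeds only because both sides are the same untouched world. A check that first asserts the edit changed something would at least fail loudly instead of passing vacuously.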

The one promise we most needed to prove was sitting in the precise blind spot our fast checks couldn’t reach. The place the guarantee can actually fail, a replay that doesn’t reproduce the original, is exactly the place a windowless test is blind to, because windowless, there is no original to reproduce in the first place.

The proof that opens a window

So we stopped trying to fit the proof into the fast lane and built it where it actually belongs: in a real, on-screen window, running the live game. The check opens a window, sculpts a genuine patch of a real world by hand, and saves. Then it does the thing that matters: it throws that world away and rebuilds it from the seed, fresh, as if newly made, and replays the logged deeds onto the new ground. Finally it compares the live-sculpted land against the replayed land, region by region, point by point, hundreds of samples across the affected ground, and demands an exact match. Not close. Exact.
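The comparison step can be sketched as follows. This is an illustration in Python under stated assumptions: `make_land` is a stand-in for both the live-sculpted and the replayed terrain, and `first_mismatch` and the region tuple are hypothetical names. What it shows is the difference between an exact comparison and a tolerant one.

```python
import math


def make_land(width, height):
    """Stand-in for a sculpted heightmap of floating-point heights."""
    return [[math.sin(x * 0.3) + math.cos(y * 0.2) for x in range(width)]
            for y in range(height)]


def first_mismatch(live, replayed, region):
    """Return the first (x, y) whose heights differ at all, or None."""
    x0, y0, x1, y1 = region
    for y in range(y0, y1):
        for x in range(x0, x1):
            if live[y][x] != replayed[y][x]:  # exact: no tolerance
                return (x, y)
    return None


live = make_land(32, 32)
replayed = make_land(32, 32)
region = (4, 4, 24, 24)  # 20 x 20 = 400 sample points

exact_result = first_mismatch(live, replayed, region)

# A replay that drifts by a hair at a single point.
drifted = [row[:] for row in live]
drifted[10][10] += 1e-12
drift_result = first_mismatch(live, drifted, region)

# A "close enough" comparison would have waved the drift through.
tolerant_would_pass = math.isclose(live[10][10], drifted[10][10], rel_tol=1e-9)
```

Returning the first mismatching point rather than a bare boolean is a small design choice that pays off when the check fails: it says where on the ground to go and look.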

One more turn of the screw. A match within a single program run could still be hiding something that only differs between launches: a value that happens to come out the same this time because nothing reset in between. So a companion tool reduces the replayed result to a small, stable signature, and we confirm that signature is identical across two separate launches of the program. Same world, carved the same way, proven to come back the same not just twice in one run but across two cold starts. That is the bar we set for “the hill comes back.”
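One way to build such a signature, sketched in Python. The post does not say how the companion tool computes it, so SHA-256 over a fixed byte encoding is an assumption here, and `signature` is a hypothetical name.

```python
import hashlib
import struct


def signature(heights):
    """Reduce a sequence of float heights to a short, stable hex digest."""
    digest = hashlib.sha256()
    for h in heights:
        # Fixed little-endian 8-byte encoding, so the bytes fed to the
        # hash do not depend on the platform's native byte order.
        digest.update(struct.pack("<d", h))
    return digest.hexdigest()


carved = [0.0, 1.25, 2.5, 1.25]
sig_a = signature(carved)
sig_b = signature(list(carved))  # same values, same signature

nudged = carved[:]
nudged[2] += 2 ** -51            # one unit in the last place of 2.5
sig_c = signature(nudged)        # any difference changes the digest
```

In use, each launch would print or write its signature and a separate step would compare the two strings. A cryptographic digest is chosen in this sketch because Python's built-in `hash()` is salted per process for strings and bytes by default, so it would differ between launches for reasons that have nothing to do with the terrain.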

It is slower than a windowless check. It needs a real machine with a real display. We run it deliberately rather than constantly. And we trust it more than any green checkmark the fast suite could have handed us. For this feature, that fast checkmark would have been a comfortable lie.

A green check from a place the feature can’t run is worth less than no check at all. Having no check makes no promise about the gap; that one would have flattered us about it.

Two things the window flushed out

Building a proof that genuinely walks the whole save-and-reload path did what real exercise always does: it shook loose problems that reasoning alone had walked straight past.

The first was older and quieter than the powers themselves. The record of your deeds was being written faithfully on save, but on load it was read off the file and then never handed to the running game, so every reload started from an empty history. Nothing had ever forced the full round trip before, so nobody had seen it. The replay proof forced it, and the gap surfaced. The full anatomy of that leak, how a power refilled its own tank on every load because of it, is one for another post; what belongs here is only that it stayed invisible right up until something finally ran the path end to end, and was fixed in the same pass.
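The shape of that bug is worth a sketch, with the caveat that this is a reconstruction in Python from the description above and not the game's loader. `Game`, `save_game`, `load_game_buggy` and `load_game_fixed` are hypothetical, and JSON is an assumed file format.

```python
import json


class Game:
    def __init__(self):
        self.deeds = []


def save_game(game):
    return json.dumps({"deeds": game.deeds})


def load_game_buggy(text):
    data = json.loads(text)
    deeds = data["deeds"]  # read off the file...
    game = Game()
    return game            # ...and never handed to the running game


def load_game_fixed(text):
    data = json.loads(text)
    game = Game()
    game.deeds = data["deeds"]  # the one missing line
    return game


original = Game()
original.deeds = [["raise", 2, 3, 5], ["roughen", 4, 4, 1]]
saved = save_game(original)

buggy = load_game_buggy(saved)
fixed = load_game_fixed(saved)
```

Each half looks fine in isolation: the save contains the deeds and the load parses them. Only a check that compares `original.deeds` with the loaded game's deeds, the full round trip, exposes the empty history.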

The second was a small rule we tightened along the way: an edit either fully happens or it doesn’t happen at all. A rejected edit leaves no entry in the log, so the deeds we replay are only ever the deeds that really happened.
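A sketch of that all-or-nothing rule in Python. The rejection reasons used here (an out-of-range index, not enough energy) are assumptions picked for illustration, as are the `Land` class and its method names.

```python
class Land:
    def __init__(self, size, energy):
        self.heights = [0] * size
        self.energy = energy
        self.log = []

    def raise_ground(self, index, amount, cost):
        """Apply the edit fully, or reject it and change nothing."""
        # Validate everything first; touch no state until the edit is good.
        if not (0 <= index < len(self.heights)):
            return False
        if cost > self.energy:
            return False
        self.heights[index] += amount
        self.energy -= cost
        self.log.append(("raise", index, amount))  # logged only on success
        return True


land = Land(size=4, energy=10)
accepted = land.raise_ground(1, 3, cost=6)
rejected = land.raise_ground(2, 3, cost=6)  # only 4 energy left
```

After the rejected call, the heights, the energy and the log are all exactly as the accepted edit left them, so a replay of the log cannot apply a deed the live session never applied.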

What we were honest about leaving rough

The proof pins the part that has to be exact: that a saved-and-reloaded world is identical to the one you shaped. It deliberately doesn’t pin the parts that are still being found by feel. How much energy each verb costs, how wide each brush reaches: those shipped as placeholder values, on purpose, because the right hand-feel for moving the earth isn’t something you can reason your way to on paper. You find it by doing it over and over until lifting a hillside feels the way lifting a hillside should, and then you tune. Pinning numbers we knew we’d change would have been pinning the wrong thing. The exactness we guarded is the one a player is owed: the world remembers exactly what you did to it.

Knowing where the net ends

The lesson travels well past this one feature. We lean on automated checks precisely because they catch what a tired pair of eyes won’t, but they catch it only inside the conditions they run under, and those conditions are not the game. A check that runs without a window is testing a world where sculpting quietly does nothing, and inside that world the promise is trivially true and completely unproven. The same care later ran ahead of the work: two throwaway probes proved the graphics card before either feature was built, each built to skip rather than fake a green from a place it couldn’t see. The same corner came up again once the ground began reshaping itself: whether the coarse copy of the land drawn at a distance hears about a nearby edit was settled by walking out and looking, not by reading the code.
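"Skip rather than fake a green" has a direct expression in most test frameworks. The sketch below uses Python's standard `unittest`; the `window_open` flag and the `make_probe` helper are hypothetical, and the sculpt inside the test is a stand-in for a real edit in a real window.

```python
import unittest


def make_probe(window_open):
    """Build a test case bound to a given environment (hypothetical flag)."""

    class SculptProbe(unittest.TestCase):
        def test_edit_takes_effect(self):
            if not window_open:
                # Refuse to report green from a place the feature cannot run.
                self.skipTest("no window: sculpting would silently do nothing")
            heights = [0, 0, 0]
            heights[1] += 5  # stand-in for a real sculpt in a real window
            self.assertEqual(heights, [0, 5, 0])

    return unittest.defaultTestLoader.loadTestsFromTestCase(SculptProbe)


headless = unittest.TestResult()
make_probe(window_open=False).run(headless)

windowed = unittest.TestResult()
make_probe(window_open=True).run(windowed)
```

The headless run records one skipped test with its reason attached, instead of one passed test. The report then says plainly that this corner was not covered here, which is the honest answer.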

So the real discipline isn’t writing more checks. It’s knowing exactly where your automated coverage ends, which corners of the real running game it can’t reach, and walking those corners by hand, in the conditions the player actually meets. For The Long Watch, the most important corner was the one with the lights on. The proof that the land you shape stays shaped is the proof that only runs in a window. We were glad to open one. You don’t win here. You tend — and the world keeps exactly the hill you made.