The proof that only runs in a window

When we shipped the powers that let a god reshape the land, they came with a quiet promise: carve a hill, save, reload, and you get back that hill — down to the last bump. We verify almost everything in The Long Watch with fast automated checks. This was the one promise those checks physically could not reach.

The reshaping powers themselves — raise the ground, lower it, soften a slope, roughen it up — are their own story, told from the player’s side. And the save architecture they lean on — we don’t store the carved shape, we store the deeds and replay them onto fresh terrain — belongs to a sibling post. What I want to follow here is narrower and, to me, the most unsettling part of the whole feature: the promise we cared about most lived in the exact spot our verification machinery couldn’t see.

The promise under the powers

Start with what we’re actually claiming. A save records what you did, not what the world looks like afterward. So on reload the land is rebuilt from scratch from its seed, exactly as it first generated, and then your sculpting acts are replayed back on top, in order. The hill you carved returns because we kept the recipe and cooked it again.
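As a rough illustration of that recipe-and-replay idea, here is a minimal Python sketch. Everything in it is assumed for the example: the names Deed, generate_terrain, apply_deed and load_world are hypothetical, and the land is just a short list of integer heights, not the game's real terrain or save format.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Deed:
    """One logged sculpting act (hypothetical shape, for illustration)."""
    verb: str    # "raise" or "lower"
    x: int       # which column of ground was touched
    amount: int  # how far it moved


def generate_terrain(seed: int, width: int = 8) -> list[int]:
    """Rebuild untouched land from the seed alone."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(width)]


def apply_deed(land: list[int], deed: Deed) -> None:
    """Apply one act to the land in place."""
    sign = 1 if deed.verb == "raise" else -1
    land[deed.x] += sign * deed.amount


def load_world(seed: int, deeds: list[Deed]) -> list[int]:
    """Reload: regenerate from the seed, then replay the deeds in order."""
    land = generate_terrain(seed)
    for deed in deeds:
        apply_deed(land, deed)
    return land


# Live play: the player sculpts a freshly generated world.
deeds = [Deed("raise", 2, 3), Deed("lower", 5, 1), Deed("raise", 2, 1)]
live = generate_terrain(seed=42)
for deed in deeds:
    apply_deed(live, deed)

# Reload: only the seed and the deeds were kept, yet the land matches.
assert load_world(42, deeds) == live
```

The save in this sketch would hold nothing but the seed and the list of deeds; the carved heights themselves are never written anywhere.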

For that to hold, a replayed edit has to land identically, bit for bit, today and on the next machine. Roughen is the verb that could betray us: it scatters small bumps across a patch, and a scatter means drawing on randomness. If it reached for ordinary, unseeded randomness, the replay would land somewhere slightly different every time and the hill would come back subtly wrong. So roughen draws its scatter from the world’s single seeded source of chance instead, the same well the terrain itself comes from, so replaying the act reproduces the very same bumps. The bumps are a surprise to you, but to the world they are fixed.
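One way to picture a single seeded source of chance is the sketch below. It is an assumption-laden toy, not the game's code: the World class, its roughen method and the integer heights are all hypothetical. The point it shows is that when terrain generation and roughen both draw from one generator created from the seed, rebuilding the world and repeating the same acts in the same order consumes the same draws and lands the same bumps.

```python
import random


class World:
    """Toy world whose terrain and roughen share one seeded generator."""

    def __init__(self, seed: int, width: int = 16):
        # The single seeded source of chance (hypothetical layout).
        self.rng = random.Random(seed)
        # Terrain generation draws from it first...
        self.land = [self.rng.randint(0, 9) for _ in range(width)]

    def roughen(self, start: int, length: int) -> None:
        # ...and roughen keeps drawing from the same stream, never from
        # the unseeded module-level random functions.
        for x in range(start, start + length):
            self.land[x] += self.rng.randint(-1, 1)


def sculpt(seed: int) -> list[int]:
    """Build a world and roughen two patches, as a player (or a replay) would."""
    world = World(seed)
    world.roughen(2, 5)
    world.roughen(8, 4)
    return world.land


# Two independent builds from the same seed and the same acts agree exactly.
assert sculpt(7) == sculpt(7)
```

The order of acts matters in this design: swapping the two roughen calls would hand each patch different draws, which is why the replay has to follow the log in sequence.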

Carve a hill, save it, reload, and you get back that exact hill — not a near miss, not a similar shape, but the same ground down to the last bump. That is the promise. Now prove it.

The check that can’t run where the checks run

Here is the wall we walked into. Most of our automated checks run windowless — no graphics window opens, nothing is drawn, which is what makes them fast enough to run constantly. But a land edit only takes effect when a real graphics window is actually rendering. Run a sculpt with no window open, and it doesn’t error, doesn’t warn, doesn’t half-apply. It silently does nothing at all.

Sit with what that means for a test. A windowless check of “carve, save, reload, compare” would carve nothing, save nothing, reload nothing, and compare two identical untouched worlds — and pass, cleanly, every time. It would be the most reassuring green checkmark in the whole suite. And it would be verifying absolutely nothing, because the edit it claims to test never happened.
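The trap is easy to reproduce in miniature. In the sketch below, sculpt is a hypothetical stand-in that silently does nothing when no window is open, naive_check is the comfortable comparison, and guarded_check adds one assumed safeguard: it refuses to pass unless the carve actually changed the land.

```python
def sculpt(land, x, amount, window_open):
    """Stand-in for a land edit that silently does nothing without a window."""
    if not window_open:
        return
    land[x] += amount


def naive_check(window_open):
    """Carve, 'reload', compare. Passes even when nothing was carved."""
    live = [0, 0, 0, 0]
    sculpt(live, 1, 3, window_open)
    replayed = [0, 0, 0, 0]
    sculpt(replayed, 1, 3, window_open)
    return live == replayed


def guarded_check(window_open):
    """Same check, but it refuses to pass if the carve left no mark."""
    untouched = [0, 0, 0, 0]
    live = list(untouched)
    sculpt(live, 1, 3, window_open)
    if live == untouched:
        raise RuntimeError("edit had no effect; comparison would be vacuous")
    replayed = list(untouched)
    sculpt(replayed, 1, 3, window_open)
    return live == replayed


# Windowless, the naive check still reports success.
assert naive_check(window_open=False) is True
```

Calling guarded_check(window_open=False) raises instead of passing, which turns the silent blind spot into a loud one.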

The one promise we most needed to prove was sitting in the precise blind spot our fast checks couldn’t reach. The place the guarantee can actually fail — a replay that doesn’t reproduce the original — is exactly the place a windowless test is blind to, because windowless, there is no original to reproduce in the first place.

The proof that opens a window

So we stopped trying to fit the proof into the fast lane and built it where it actually belongs: in a real, on-screen window, running the live game. The check opens a window, sculpts a genuine patch of a real world by hand, and saves. Then it does the thing that matters — it throws that world away and rebuilds it from the seed, fresh, as if newly made, and replays the logged deeds onto the new ground. Finally it compares the live-sculpted land against the replayed land, region by region, point by point — hundreds of samples across the affected ground — and demands an exact match. Not close. Exact.
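To make "exact" concrete, here is a small sketch of a point-by-point comparison with no tolerance at all. The function name first_mismatch and the flat list of float samples are assumptions for illustration; the real check samples regions of live terrain. Comparing the packed bytes of each value, rather than using a closeness test, is one way to insist on a bit-for-bit match.

```python
import struct


def first_mismatch(live, replayed):
    """Return the index of the first sample that differs bit for bit, or None."""
    if len(live) != len(replayed):
        raise ValueError("sample counts differ")
    for index, (a, b) in enumerate(zip(live, replayed)):
        # Compare the raw 64-bit encodings, so "nearly equal" still fails.
        if struct.pack("<d", a) != struct.pack("<d", b):
            return index
    return None


# Identical samples: no mismatch.
assert first_mismatch([1.0, 2.5, 0.3], [1.0, 2.5, 0.3]) is None

# 0.1 + 0.2 is close to 0.3 but not the same bits, so it is rejected.
assert first_mismatch([1.0, 0.3], [1.0, 0.1 + 0.2]) == 1
```

Returning the index rather than a bare pass or fail gives the check somewhere to point when a replay drifts.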

One more turn of the screw. A match within a single program run could still be hiding something that only differs between launches — a value that happens to come out the same this time because nothing reset in between. So a companion tool reduces the replayed result to a small, stable signature, and we confirm that signature is identical across two separate launches of the program. Same world, carved the same way, proven to come back the same not just twice in one run but across two cold starts. That is the bar we set for “the hill comes back.”
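A signature of that kind can be sketched with a cryptographic digest over the terrain's raw values. This is a guess at the shape of such a tool, not the companion tool itself: terrain_signature is a hypothetical name and the float list stands in for the replayed heights. The sketch uses hashlib rather than Python's built-in hash(), because the built-in hash of strings and bytes is randomized per process by default, which is precisely the kind of between-launch difference the signature must not have.

```python
import hashlib
import struct


def terrain_signature(heights):
    """Reduce a list of float heights to a short, launch-stable hex digest."""
    digest = hashlib.sha256()
    # Include the count so a truncated terrain cannot collide with a prefix.
    digest.update(struct.pack("<I", len(heights)))
    for height in heights:
        # Fixed little-endian 64-bit encoding, independent of the host machine.
        digest.update(struct.pack("<d", height))
    return digest.hexdigest()


replayed = [1.0, 2.5, 0.75, 3.0]
print(terrain_signature(replayed))
```

Run as a script in two separate launches, this prints the same line both times for the same heights; any single-bit difference in any sample changes it.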

It is slower than a windowless check. It needs a real machine with a real display. We run it deliberately rather than constantly. And we trust it more than any green checkmark the fast suite could have handed us — because the fast checkmark, for this feature, would have been a comfortable lie.

A green check from a place the feature can’t run is worth less than no check at all. No check is honest about the gap. That one would have flattered us about it.

Two things the window flushed out

Building a proof that genuinely walks the whole save-and-reload path did what real exercise always does: it shook loose problems that reasoning alone had walked straight past.

The first was older and quieter than the powers themselves. The record of your deeds was being written faithfully on save, but on load it was read off the file and then never handed to the running game, so every reload started from an empty history. Nothing had ever forced the full round trip before, so nobody had seen it. The replay proof forced it, and the gap surfaced. The full anatomy of that leak, including how a power quietly refilled its own tank on every load because of it, belongs to a sibling post; what belongs here is only that it stayed invisible right up until something finally ran the path end to end, and that it was fixed in the same pass.
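The shape of that gap, and of the round trip that exposes it, can be sketched in a few lines. The Game class, the JSON layout and the save and load functions here are all hypothetical; the only thing taken from the story is the missing step, marked in a comment.

```python
import json


class Game:
    """Toy running game: a seed plus the history of deeds done to it."""

    def __init__(self, seed):
        self.seed = seed
        self.deeds = []


def save(game):
    """Write the seed and the deed history out as text."""
    return json.dumps({"seed": game.seed, "deeds": game.deeds})


def load(text):
    """Read a save back and hand its history to a running game."""
    data = json.loads(text)
    game = Game(data["seed"])
    # The hand-off. Leave this line out and every reload silently
    # starts from an empty history, with no error anywhere.
    game.deeds = data["deeds"]
    return game


original = Game(seed=42)
original.deeds.append({"verb": "raise", "x": 3, "amount": 2})

# Only a full save-then-load round trip can notice a dropped history.
restored = load(save(original))
assert restored.deeds == original.deeds
```

A test that inspects the saved file alone would pass with or without the hand-off, which fits how the gap stayed hidden until something ran the whole path.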

The second was a small rule we tightened along the way: an edit either fully happens or it doesn’t happen at all. If you reach for ground that can’t be shaped, or you don’t have the energy the act costs, the game does nothing — no half-carved mound, no quiet deduction from your reserves, and crucially no entry in the log. That last part is what keeps the replay honest: a rejected edit that still logged itself would replay on reload as a phantom change to land that never moved. So an edit that can’t land leaves no trace, and the deeds we replay are only ever the deeds that really happened.
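The all-or-nothing rule comes down to ordering: check everything first, and only then touch the land, the energy and the log together. The sketch below assumes a toy World with hypothetical fields (land, shapeable, energy, log) and a hypothetical raise_ground function; the costs are made-up numbers.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    land: list[int]
    shapeable: list[bool]  # which columns of ground may be edited
    energy: int
    log: list[tuple[str, int, int]] = field(default_factory=list)


def raise_ground(world: World, x: int, amount: int, cost: int) -> bool:
    """Raise one column, or do nothing at all. Returns whether it landed."""
    # Validate everything before mutating anything.
    if not (0 <= x < len(world.land)) or not world.shapeable[x]:
        return False
    if world.energy < cost:
        return False
    # Every check passed: apply, charge and log as one unit.
    world.land[x] += amount
    world.energy -= cost
    world.log.append(("raise", x, amount))
    return True


world = World(land=[0, 0, 0], shapeable=[True, False, True], energy=5)

# A rejected edit leaves no trace: no bump, no charge, no log entry.
assert raise_ground(world, 1, 2, cost=1) is False
assert (world.land, world.energy, world.log) == ([0, 0, 0], 5, [])
```

Because the log is appended only on the success path, a replay of the log can never contain an act that did not happen.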

What we were honest about leaving rough

The proof pins the part that has to be exact — that a saved-and-reloaded world is identical to the one you shaped. It deliberately doesn’t pin the parts that are still being found by feel. How much energy each verb costs, how wide each brush reaches — those shipped as placeholder values, on purpose, because the right hand-feel for moving the earth isn’t something you can reason your way to on paper. You find it by doing it over and over until lifting a hillside feels the way lifting a hillside should, and then you tune. Pinning numbers we knew we’d change would have been pinning the wrong thing. The exactness we guarded is the one a player is owed: the world remembers exactly what you did to it.


Knowing where the net ends

The lesson travels well past this one feature. We lean on automated checks precisely because they catch what a tired pair of eyes won’t — but they catch it only inside the conditions they run under, and those conditions are not the game. A check that runs without a window is testing a world where sculpting quietly does nothing — and inside that world the promise is trivially true and completely unproven.

So the real discipline isn’t writing more checks. It’s knowing exactly where your automated coverage ends — which corners of the real running game it can’t reach — and walking those corners by hand, in the conditions the player actually meets. For The Long Watch, the most important corner was the one with the lights on. The proof that the land you shape stays shaped is the proof that only runs in a window. We were glad to open one. You don’t win here. You tend — and the world keeps exactly the hill you made.
