
It was supposed to be a five-minute job.
A client’s backup server needed one field bumped on a handful of backup clients. Retention. One value. I had a Python wrapper sitting right there with a method named exactly what I wanted: change_client_setting(). Tab-complete practically begged me to use it.
So I looped over the clients and called it. Twice, because I tweaked a second field on the second pass.
Then the dashboard started showing clients with no backup schedule, no retention, no paths — like they’d never been configured. Not one client. All of them.
The investigation
First instinct: did I fat-finger the loop? No. The script was boring. Read a name, set a field, move on.
Second instinct: did the service choke? Logs were clean. The server was happily applying the garbage I’d handed it.
So I pulled the raw settings blob the API was actually POSTing. And there it was — the settings object, wrapped inside another settings object, wrapped inside another one. A JSON turducken.
┌─────────────────────────────────────────────┐
│ POST #1  { retention: 30, schedule: ... }   │  ✓ looks fine
├─────────────────────────────────────────────┤
│ POST #2  { settings: {                      │
│            settings: {                      │
│              retention: 30, schedule: ...   │  ✗ nested once
│          } } }                              │
├─────────────────────────────────────────────┤
│ POST #3  { settings: { settings: {          │
│            settings: { ...buried... } } } } │  ✗ nested twice → unreadable
└─────────────────────────────────────────────┘
                       ▼
  server reads top level → finds nothing it recognizes
  every real field is now three layers down → effectively gone
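A quick way to spot this kind of burial in a raw payload is to count the wrapper layers sitting on top of the real fields. The sketch below is only an illustration: wrapper_depth is a hypothetical helper, and the sample values ("daily", the three-deep blob) are made up to mirror the diagram above.

```python
def wrapper_depth(blob, key="settings"):
    """Count how many single-key `key` wrappers sit on top of the real fields."""
    depth = 0
    # A wrapper layer is a dict whose only key is `key` and whose value is a dict.
    while isinstance(blob, dict) and set(blob) == {key} and isinstance(blob[key], dict):
        blob = blob[key]
        depth += 1
    return depth


# Made-up sample payloads: one flat, one buried three layers down.
clean = {"retention": 30, "schedule": "daily"}
buried = {"settings": {"settings": {"settings": clean}}}

print(wrapper_depth(clean))   # 0
print(wrapper_depth(buried))  # 3
```

Anything above the depth the server legitimately expects is a sign the payload has been re-wrapped.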
The “aha”
Here’s the part that made my stomach drop.
change_client_setting() does not set one field. It round-trips the entire settings object: GET the whole blob, mutate the one key in memory, then re-serialize and POST the whole thing back.
That’s already more blast radius than the name implies. But the killer was the encoder. Its re-serialization had a recursive-nesting bug: each round-trip wrapped the existing object in a fresh settings wrapper instead of replacing it.
One call: subtly wrong but survivable. Two calls: the real fields are buried deep enough that the server can’t find them anymore. The defaults win. Every client looks wiped.
I didn’t corrupt one setting. I corrupted the container every setting lives in — for every client — by changing a single number.
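The failure mode is easy to reproduce in a toy model. This is not the wrapper's actual code, just a minimal sketch under stated assumptions: buggy_round_trip, server_view, and DEFAULTS are hypothetical names, and the toy server tolerates no wrapping at all, so it loses the fields sooner than the real one did.

```python
import copy

# Made-up defaults a server might fall back to when a field is missing.
DEFAULTS = {"retention": None, "schedule": None, "paths": []}


def buggy_round_trip(stored, key, value):
    """Toy read-modify-write with a wrapping bug (not the real wrapper's code)."""
    blob = copy.deepcopy(stored)   # "GET" the whole settings object
    blob[key] = value              # mutate one key in memory
    return {"settings": blob}      # BUG: wraps the object instead of replacing it


def server_view(stored):
    """Toy server: reads known fields at the top level only, else uses defaults."""
    return {name: stored.get(name, default) for name, default in DEFAULTS.items()}


stored = {"retention": 14, "schedule": "daily", "paths": ["/home"]}
stored = buggy_round_trip(stored, "retention", 30)
stored = buggy_round_trip(stored, "schedule", "weekly")

print(stored)               # the real fields now live under nested "settings" keys
print(server_view(stored))  # every field falls back to its default
```

How many calls it takes to lose the fields depends on how much wrapping a given server tolerates. The shape of the damage is the same either way: the data is still in the blob, just not where anything looks for it.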
The fix
Stop trusting the convenience method. Write a minimal-POST helper that submits only the field that changed, never the whole object — so there’s no full blob to re-encode and no encoder to trip over.
import json
import os
import time

def safe_set_client_setting(server, client_id, key, value, dry_run=False):
    # Snapshot the current settings BEFORE touching anything.
    current = server.get_client_settings(client_id)
    os.makedirs("snapshots", exist_ok=True)  # create the snapshot directory if missing
    snapshot_path = f"snapshots/{client_id}-{int(time.time())}.json"
    with open(snapshot_path, "w") as f:
        json.dump(current, f, indent=2)
    if dry_run:
        # Report the intended change and stop before any write.
        print(f"[dry-run] {client_id}: {key} {current.get(key)!r} -> {value!r}")
        return
    # Minimal POST: ONLY the one field. No round-trip of the whole blob.
    server.post_setting(client_id, {key: value})
    # Read-back validation. Trust nothing.
    after = server.get_client_settings(client_id)
    # Compare as strings so that 30 and "30" count as a match.
    if str(after.get(key)) != str(value):
        raise RuntimeError(
            f"VALIDATION FAILED {client_id}: {key} is {after.get(key)!r}, "
            f"expected {value!r}; restore from {snapshot_path}")
    print(f"✓ {client_id}: {key} = {value} (snapshot {snapshot_path})")
Recovery itself was unglamorous — restore each client from the JSON snapshot I now wish I’d had before the first run:
for f in snapshots/*-pre-incident.json; do
    # Strip the directory and the suffix so client IDs containing hyphens survive intact.
    id=$(basename "$f" -pre-incident.json)
    python restore_settings.py --client "$id" --from "$f"
done
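Four guardrails came out of this, and they’re now non-negotiable on that box:
- Snapshot the current settings to JSON before every write.
- Offer a dry-run mode that prints the intended change without sending it.
- POST only the changed field, never the whole object.
- Read the value back after the write and fail loudly on a mismatch.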
Why it happened
The method name lied — or rather, I read a name and assumed a behavior. “Change one client setting” sounded surgical. Under the hood it was a full read-modify-write of a complex object, handed to a serializer nobody had stress-tested across repeated calls.
A buggy encoder on a single-field write is annoying. A buggy encoder on a whole-object round-trip is a data-integrity bomb, because every write re-touches every field. The bug didn’t need to corrupt the value I changed. It corrupted the structure wrapping everything.
And I had no snapshot, no read-back, and no dry-run — so the first time I learned any of this was when the dashboard went blank.
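The stress test that serializer never got can be small. Here is a minimal sketch of a repeated round-trip check, assuming the codec can be exercised as a plain encode/decode pair; assert_round_trip_stable and wrapping_encode are hypothetical names, and the standard json module stands in for the real encoder.

```python
import json


def assert_round_trip_stable(encode, decode, obj, rounds=5):
    """Raise AssertionError if repeated encode/decode cycles change `obj`."""
    current = obj
    for i in range(1, rounds + 1):
        current = decode(encode(current))
        if current != obj:
            raise AssertionError(f"round-trip {i} changed the object: {current!r}")


# Made-up sample settings for the check.
settings = {"retention": 30, "schedule": "daily", "paths": ["/home"]}

# A well-behaved codec survives any number of cycles.
assert_round_trip_stable(json.dumps, json.loads, settings)


def wrapping_encode(obj):
    """Hypothetical encoder with the wrapping bug."""
    return json.dumps({"settings": obj})


try:
    assert_round_trip_stable(wrapping_encode, json.loads, settings)
except AssertionError as exc:
    print("caught:", exc)
```

Running the cycle several times rather than once is the point: a bug that compounds per call only shows its real cost on repetition.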
Takeaways
- Read what the API actually sends, not what the method is named. “Set one field” frequently means “GET, mutate, re-serialize, POST everything.” That round-trip is your real blast radius.
- Prefer minimal, targeted writes. If you can POST just the changed key, do it — you never hand a fragile encoder the whole object to mangle.
- Validate by reading back. A write that “succeeded” tells you the server accepted bytes, not that the right value landed. Read it again and compare.
- Snapshot before you mutate. A pre-write JSON dump turns a catastrophe into a one-line restore. It costs milliseconds and buys you the whole night back.
- Ship a --dry-run for anything that touches production config. Seeing the diff before it’s real would have caught the nesting on call number one.
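To make the dry-run point concrete, here is a minimal sketch of a top-level settings diff. diff_settings is a hypothetical helper and the sample values are made up; the idea is only that a nested payload shows up as every real field vanishing, which is hard to miss in a preview.

```python
def diff_settings(current, proposed):
    """Return {key: (old, new)} for every top-level key that would change."""
    missing = object()  # sentinel so an explicit None still counts as a value
    changes = {}
    for key in sorted(set(current) | set(proposed)):
        old = current.get(key, missing)
        new = proposed.get(key, missing)
        if old != new:
            changes[key] = (
                "<absent>" if old is missing else old,
                "<absent>" if new is missing else new,
            )
    return changes


current = {"retention": 14, "schedule": "daily"}
intended = {"retention": 30, "schedule": "daily"}
nested = {"settings": {"retention": 30, "schedule": "daily"}}

# The change that was meant: exactly one field moves.
print(diff_settings(current, intended))

# The change a wrapping encoder would send: real fields vanish, a wrapper appears.
for key, (old, new) in diff_settings(current, nested).items():
    print(f"{key}: {old!r} -> {new!r}")
```

A one-line diff is what a single-field change should look like. Anything longer is a reason to stop before sending.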