The Settings API That Corrupted Every Setting at Once

It was supposed to be a five-minute job.

A client’s backup server needed one field bumped on a handful of backup clients. Retention. One value. I had a Python wrapper sitting right there with a method named exactly what I wanted: change_client_setting(). Tab-complete practically begged me to use it.

So I looped over the clients and called it. Twice, because I tweaked a second field on the second pass.

Then the dashboard started showing clients with no backup schedule, no retention, no paths — like they’d never been configured. Not one client. All of them.

The investigation

First instinct: did I fat-finger the loop? No. The script was boring. Read a name, set a field, move on.

Second instinct: did the service choke? Logs were clean. The server was happily applying the garbage I’d handed it.

So I pulled the raw settings blob the API was actually POSTing. And there it was — the settings object, wrapped inside another settings object, wrapped inside another one. A JSON turducken.

┌─────────────────────────────────────────────┐
│ POST #1   { retention: 30, schedule: ... }   │  ✓ looks fine
├─────────────────────────────────────────────┤
│ POST #2   { settings: {                      │
│              settings: {                      │
│                retention: 30, schedule: ...   │  ✗ nested once
│              } } }                            │
├─────────────────────────────────────────────┤
│ POST #3   { settings: { settings: {          │
│              settings: { ...buried... } } } } │  ✗ nested twice → unreadable
└─────────────────────────────────────────────┘
        server reads top level → finds nothing it recognizes
        every real field is now three layers down → effectively gone
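
If you ever need to check a dumped blob for this kind of burial, counting wrapper layers is a quick test. Here is a minimal sketch. It assumes the wrapper key is literally "settings" and that a healthy blob keeps its fields at the top level, as in POST #1 above; settings_nesting_depth is a name I made up for illustration, not part of any wrapper.

```python
def settings_nesting_depth(blob, wrapper_key="settings"):
    """Count how many consecutive wrapper layers sit on top of the real fields."""
    depth = 0
    # Peel a layer only when the dict holds nothing but the wrapper key.
    # A dict with real fields in it is treated as the payload itself.
    while (
        isinstance(blob, dict)
        and set(blob) == {wrapper_key}
        and isinstance(blob[wrapper_key], dict)
    ):
        blob = blob[wrapper_key]
        depth += 1
    return depth


healthy = {"retention": 30, "schedule": "daily"}
buried = {"settings": {"settings": {"settings": healthy}}}

print(settings_nesting_depth(healthy))  # 0
print(settings_nesting_depth(buried))   # 3
```

Anything above zero means the server is looking at a wrapper, not at your fields.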

The “aha”

Here’s the part that made my stomach drop.

change_client_setting() does not set one field. It round-trips the entire settings object: GET the whole blob, mutate the one key in memory, then re-serialize and POST the whole thing back.

That’s already more blast radius than the name implies. But the killer was the encoder. Its re-serialization had a recursive-nesting bug: each round-trip wrapped the existing object in a fresh settings wrapper instead of replacing it.

One call: subtly wrong but survivable. Two calls: the real fields are buried deep enough that the server can’t find them anymore. The defaults win. Every client looks wiped.
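
To make the failure mode concrete, here is a toy reproduction. It is not the wrapper's real code, which I'm not reproducing here: buggy_encode and round_trip are hypothetical stand-ins, and the only thing they model is "GET, mutate, re-serialize with an extra wrapper, POST."

```python
import json


def buggy_encode(settings):
    # Stand-in for the bug: wrap the object instead of serializing it as-is.
    return json.dumps({"settings": settings})


def round_trip(stored, key, value):
    """GET the blob, mutate one key in memory, re-serialize, 'POST' it back."""
    blob = json.loads(stored)
    blob[key] = value
    return buggy_encode(blob)


# What the server holds before I touch anything.
stored = json.dumps({"retention": 30, "schedule": "daily"})

stored = round_trip(stored, "retention", 60)      # first pass
stored = round_trip(stored, "schedule", "weekly")  # second pass

top_level = json.loads(stored)
print(list(top_level))           # ['settings']
print("retention" in top_level)  # False
print(stored)
```

In this model the second pass is even worse than plain nesting: the new value lands next to a wrapper that still holds the old one, and neither sits where the server looks.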

I didn’t corrupt one setting. I corrupted the container every setting lives in — for every client — by changing a single number.

The fix

Stop trusting the convenience method. Write a minimal-POST helper that submits only the field that changed, never the whole object — so there’s no full blob to re-encode and no encoder to trip over.

import json
import os
import time


def safe_set_client_setting(server, client_id, key, value, dry_run=False):
    # Snapshot the current settings BEFORE touching anything.
    current = server.get_client_settings(client_id)
    os.makedirs("snapshots", exist_ok=True)  # open() will not create the directory
    snapshot_path = f"snapshots/{client_id}-{int(time.time())}.json"
    with open(snapshot_path, "w") as f:
        json.dump(current, f, indent=2)

    if dry_run:
        # Show the intended change and stop before any write.
        print(f"[dry-run] {client_id}: {key} {current.get(key)!r} -> {value!r}")
        return

    # Minimal POST: ONLY the one field. No round-trip of the whole blob.
    server.post_setting(client_id, {key: value})

    # Read-back validation. Trust nothing.
    # Compared as strings, so 30 and "30" count as a match.
    after = server.get_client_settings(client_id)
    if str(after.get(key)) != str(value):
        raise RuntimeError(
            f"VALIDATION FAILED {client_id}: {key} is {after.get(key)!r}, "
            f"expected {value!r}; restore from {snapshot_path}")
    print(f"✓ {client_id}: {key} = {value}  (snapshot {snapshot_path})")

Recovery itself was unglamorous — restore each client from the JSON snapshot I now wish I’d had before the first run:

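for f in snapshots/*-pre-incident.json; do
  [ -e "$f" ] || continue  # glob matched nothing
  # Strip the suffix instead of cutting on "-", so client IDs with hyphens survive.
  id=$(basename "$f" -pre-incident.json)
  python restore_settings.py --client "$id" --from "$f"
done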

Four guardrails came out of this, and they’re now non-negotiable on that box:

safe_set_client_setting()
  • snapshot: save .json
  • dry-run?: preview diff
  • minimal POST: one field only
  • read-back: verify ✓

old path: GET whole blob → re-encode → POST whole blob
new path: never hand the encoder the full object

Snapshot → dry-run → minimal write → read-back. Every mutation, every time.

Why it happened

The method name lied — or rather, I read a name and assumed a behavior. “Change one client setting” sounded surgical. Under the hood it was a full read-modify-write of a complex object, handed to a serializer nobody had stress-tested across repeated calls.

A buggy encoder on a single-field write is annoying. A buggy encoder on a whole-object round-trip is a data-integrity bomb, because every write re-touches every field. The bug didn’t need to corrupt the value I changed. It corrupted the structure wrapping everything.

And I had no snapshot, no read-back, and no dry-run — so the first time I learned any of this was when the dashboard went blank.
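
Because the damage landed on the structure rather than on the value I changed, a read-back that only checks the target key can miss it. A stricter check compares the whole blob before and after and complains if anything besides the intended key moved. This is a sketch: changed_keys and assert_only_changed are hypothetical helpers, and they only look at top-level keys.

```python
def changed_keys(before, after):
    """Return the top-level keys whose values differ, including added or removed keys."""
    missing = object()  # sentinel, so a key set to None still differs from an absent key
    return {
        k
        for k in before.keys() | after.keys()
        if before.get(k, missing) != after.get(k, missing)
    }


def assert_only_changed(before, after, key):
    """Raise if any top-level key other than `key` differs between the two blobs."""
    unexpected = changed_keys(before, after) - {key}
    if unexpected:
        raise RuntimeError(
            f"unexpected changes besides {key!r}: {sorted(unexpected)}")


before = {"retention": 30, "schedule": "daily", "paths": ["/home"]}
good = {"retention": 60, "schedule": "daily", "paths": ["/home"]}
wrapped = {"settings": before}  # what a nesting bug leaves behind

assert_only_changed(before, good, "retention")  # passes silently

try:
    assert_only_changed(before, wrapped, "retention")
except RuntimeError as err:
    print(err)
```

Run against the pre-write snapshot and the read-back, this would flag the wrapped blob on the first call instead of the second.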

Takeaways

  • Read what the API actually sends, not what the method is named. “Set one field” frequently means “GET, mutate, re-serialize, POST everything.” That round-trip is your real blast radius.
  • Prefer minimal, targeted writes. If you can POST just the changed key, do it — you never hand a fragile encoder the whole object to mangle.
  • Validate by reading back. A write that “succeeded” tells you the server accepted bytes, not that the right value landed. Read it again and compare.
  • Snapshot before you mutate. A pre-write JSON dump turns a catastrophe into a one-line restore. It costs milliseconds and buys you the whole night back.
  • Ship a --dry-run for anything that touches production config. Seeing the diff before it’s real would have caught the nesting on call number one.
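
For that last point, the dry-run only catches structural damage if it diffs the full payload that would be sent, not just the one key. Here is one way that preview could look using the standard library's difflib. preview_diff is a hypothetical helper, and it assumes you can get at both the current blob and the proposed one as plain dicts.

```python
import difflib
import json


def preview_diff(current, proposed):
    """Return a unified diff of two settings blobs rendered as pretty-printed JSON."""
    a = json.dumps(current, indent=2, sort_keys=True).splitlines()
    b = json.dumps(proposed, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(a, b, fromfile="current", tofile="proposed", lineterm="")
    )


current = {"retention": 30, "schedule": "daily"}

# A clean single-field change shows up as one removed and one added line.
print(preview_diff(current, {"retention": 60, "schedule": "daily"}))

# A wrapped payload re-indents every field, so the whole blob lights up.
print(preview_diff(current, {"settings": current}))
```

Sorting keys keeps the diff stable between runs, so the only lines that show up are real changes.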