The Settings API That Corrupted Every Setting at Once

It was supposed to be a five-minute job.

A client’s backup server needed one field bumped on a handful of backup clients. Retention. One value. I had a Python wrapper sitting right there with a method named exactly what I wanted: change_client_setting(). Tab-complete practically begged me to use it.

So I looped over the clients and called it. Twice, because I tweaked a second field on the second pass.

Then the dashboard started showing clients with no backup schedule, no retention, no paths — like they’d never been configured. Not one client. All of them.

The investigation

First instinct: did I fat-finger the loop? No. The script was boring. Read a name, set a field, move on.

Second instinct: did the service choke? Logs were clean. The server was happily applying the garbage I’d handed it.

So I pulled the raw settings blob the API was actually POSTing. And there it was — the settings object, wrapped inside another settings object, wrapped inside another one. A JSON turducken.

┌─────────────────────────────────────────────┐
│ POST #1   { retention: 30, schedule: ... }   │  ✓ looks fine
├─────────────────────────────────────────────┤
│ POST #2   { settings: {                      │
│              settings: {                      │
│                retention: 30, schedule: ...   │  ✗ nested once
│              } } }                            │
├─────────────────────────────────────────────┤
│ POST #3   { settings: { settings: {          │
│              settings: { ...buried... } } } } │  ✗ nested twice → unreadable
└─────────────────────────────────────────────┘
                       ▼
        server reads top level → finds nothing it recognizes
        every real field is now three layers down → effectively gone
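A quick way to confirm this shape in a raw blob is to count how many redundant wrappers sit on top of the real fields. This is a minimal sketch: the name wrapper_depth is hypothetical, and it assumes the wrapper key is literally "settings", as in the payloads above.

```python
def wrapper_depth(blob, wrapper_key="settings"):
    """Count redundant single-key wrappers and return (depth, innermost object)."""
    depth = 0
    # Peel only while the object is exactly {wrapper_key: {...}} and nothing else.
    while (
        isinstance(blob, dict)
        and set(blob) == {wrapper_key}
        and isinstance(blob[wrapper_key], dict)
    ):
        blob = blob[wrapper_key]
        depth += 1
    return depth, blob


healthy = {"retention": 30, "schedule": "daily"}
buried = {"settings": {"settings": {"retention": 30, "schedule": "daily"}}}

print(wrapper_depth(healthy))  # depth 0: fields are at the top level
print(wrapper_depth(buried))   # depth 2: fields are two wrappers down
```

Any depth above zero means the server is reading a top level that holds none of the fields it expects.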

The “aha”

Here’s the part that made my stomach drop.

change_client_setting() does not set one field. It round-trips the entire settings object: GET the whole blob, mutate the one key in memory, then re-serialize and POST the whole thing back.

That’s already more blast radius than the name implies. But the killer was the encoder. Its re-serialization had a recursive-nesting bug: each round-trip tucked the existing object inside a fresh settings wrapper instead of replacing it.

One call: subtly wrong but survivable. Two calls: the real fields are buried deep enough that the server can’t find them anymore. The defaults win. Every client looks wiped.
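The failure mode is easy to reproduce in miniature. The sketch below is not the wrapper's actual code. buggy_encode and change_setting_round_trip are hypothetical stand-ins, and the dict-based store is an assumption standing in for the server. It only shows how a wrap-instead-of-replace encoder compounds across round-trips.

```python
import copy


def buggy_encode(settings):
    # Hypothetical stand-in for the bug: wraps the object instead of replacing it.
    return {"settings": copy.deepcopy(settings)}


def change_setting_round_trip(store, client_id, key, value):
    blob = copy.deepcopy(store[client_id])  # GET the whole object
    blob[key] = value                       # mutate one key in memory
    store[client_id] = buggy_encode(blob)   # re-serialize and POST everything


store = {"client-a": {"retention": 14, "schedule": "daily"}}
change_setting_round_trip(store, "client-a", "retention", 30)
change_setting_round_trip(store, "client-a", "schedule", "weekly")

# After two calls the top level holds nothing but a wrapper.
print(store["client-a"])
```

In this toy version the second call even writes its new value one level down, beside the first wrapper, so neither change is visible where the server looks.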

I didn’t corrupt one setting. I corrupted the container every setting lives in — for every client — by changing a single number.

The fix

Stop trusting the convenience method. Write a minimal-POST helper that submits only the field that changed, never the whole object — so there’s no full blob to re-encode and no encoder to trip over.

import json
import os
import time


def safe_set_client_setting(server, client_id, key, value, dry_run=False):
    # Snapshot the current settings BEFORE touching anything.
    current = server.get_client_settings(client_id)
    os.makedirs("snapshots", exist_ok=True)  # open() will not create the directory
    snapshot_path = f"snapshots/{client_id}-{int(time.time())}.json"
    with open(snapshot_path, "w") as f:
        json.dump(current, f, indent=2)

    if dry_run:
        print(f"[dry-run] {client_id}: {key} {current.get(key)!r} -> {value!r}")
        return

    # Minimal POST: ONLY the one field. No round-trip of the whole blob.
    server.post_setting(client_id, {key: value})

    # Read-back validation. Trust nothing.
    after = server.get_client_settings(client_id)
    if str(after.get(key)) != str(value):
        raise RuntimeError(
            f"VALIDATION FAILED {client_id}: {key} is {after.get(key)!r}, "
            f"expected {value!r} — restore from {snapshot_path}")
    print(f"✓ {client_id}: {key} = {value}  (snapshot {snapshot_path})")

Recovery itself was unglamorous — restore each client from the JSON snapshot I now wish I’d had before the first run:

for f in snapshots/*-pre-incident.json; do
  # Strip the suffix instead of cutting on "-", so hyphenated client IDs survive.
  id=$(basename "$f" -pre-incident.json)
  python restore_settings.py --client "$id" --from "$f"
done

Four guardrails came out of this, and they’re now non-negotiable on that box:

old path: GET whole blob → re-encode → POST whole blob
new path: never hand the encoder the full object

Snapshot → dry-run → minimal write → read-back. Every mutation, every time.
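Applied to a batch, those four steps might look like the sketch below. FakeServer is a hypothetical in-memory stand-in. Its get_client_settings and post_setting methods mirror the names used in the helper above and are not a confirmed API. One deliberate difference from that helper: the read-back compares the whole object, so structural damage outside the changed key also trips the check.

```python
import copy


class FakeServer:
    """In-memory stand-in for the real backup server (hypothetical interface)."""

    def __init__(self, clients):
        self._clients = clients

    def get_client_settings(self, client_id):
        return copy.deepcopy(self._clients[client_id])

    def post_setting(self, client_id, patch):
        self._clients[client_id].update(patch)


def guarded_batch_set(server, client_ids, key, value, dry_run=True):
    """Snapshot each client, then either preview or write and read back."""
    snapshots = {}
    for cid in client_ids:
        snapshots[cid] = server.get_client_settings(cid)   # 1. snapshot
        if dry_run:                                         # 2. dry-run
            old = snapshots[cid].get(key)
            print(f"[dry-run] {cid}: {key} {old!r} -> {value!r}")
            continue
        server.post_setting(cid, {key: value})              # 3. minimal write
        after = server.get_client_settings(cid)             # 4. read-back
        expected = {**snapshots[cid], key: value}
        if after != expected:
            raise RuntimeError(
                f"{cid}: settings changed beyond {key!r}; restore from snapshot")
    return snapshots


server = FakeServer({"a": {"retention": 14}, "b": {"retention": 7}})
guarded_batch_set(server, ["a", "b"], "retention", 30)                 # preview only
guarded_batch_set(server, ["a", "b"], "retention", 30, dry_run=False)  # real run
print(server.get_client_settings("a"))
```

A real server may normalize values on write (an integer coming back as a string, say), so the strict equality here would need loosening in the same way the helper above compares via str().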

Why it happened

The method name lied — or rather, I read a name and assumed a behavior. “Change one client setting” sounded surgical. Under the hood it was a full read-modify-write of a complex object, handed to a serializer nobody had stress-tested across repeated calls.

A buggy encoder on a single-field write is annoying. A buggy encoder on a whole-object round-trip is a data-integrity bomb, because every write re-touches every field. The bug didn’t need to corrupt the value I changed. It corrupted the structure wrapping everything.

And I had no snapshot, no read-back, and no dry-run — so the first time I learned any of this was when the dashboard went blank.

Takeaways

Read what the API actually sends, not what the method is named. “Set one field” frequently means “GET, mutate, re-serialize, POST everything.” That round-trip is your real blast radius.
Prefer minimal, targeted writes. If you can POST just the changed key, do it — you never hand a fragile encoder the whole object to mangle.
Validate by reading back. A write that “succeeded” tells you the server accepted bytes, not that the right value landed. Read it again and compare.
Snapshot before you mutate. A pre-write JSON dump turns a catastrophe into a one-line restore. It costs milliseconds and buys you the whole night back.
Ship a --dry-run for anything that touches production config. Seeing the diff before it’s real would have caught the nesting on call number one.
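To make the dry-run point concrete, here is a rough sketch of a structural diff between the current settings and the payload about to be sent. flatten and diff_settings are hypothetical helper names, and the sample payloads are invented for illustration. The idea is that a one-field change should produce a one-line diff, and anything longer is a reason to stop.

```python
_MISSING = "<missing>"


def flatten(obj, prefix=""):
    """Map every leaf value to a dotted path, e.g. 'settings.retention'."""
    if not isinstance(obj, dict):
        return {prefix: obj}
    if not obj:
        return {prefix: {}}
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_settings(before, after):
    """Return {path: (old, new)} for every leaf path whose value differs."""
    old, new = flatten(before), flatten(after)
    return {
        path: (old.get(path, _MISSING), new.get(path, _MISSING))
        for path in sorted(old.keys() | new.keys())
        if old.get(path, _MISSING) != new.get(path, _MISSING)
    }


current = {"retention": 14, "schedule": "daily"}
intended = {"retention": 30, "schedule": "daily"}
wrapped = {"settings": {"retention": 30, "schedule": "daily"}}

print(diff_settings(current, intended))  # {'retention': (14, 30)}
print(diff_settings(current, wrapped))   # four entries: every path moved
```

A wrapped payload shows up as every original path going missing and a matching set of new paths appearing under settings, which is hard to overlook in a preview.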
