The API Loop That Ran Up the Bill Overnight

The cron job ran at 11 PM. By the time I poured coffee the next morning, it had called a paid LLM API a few thousand times.

Nobody touched it. Nobody approved it. It just sat there in the dark, retrying, retrying, retrying — like a vending machine eating a stuck dollar bill, except the dollar bill was billable tokens.

I didn’t find it because of a clever alert. I found it because the provider dashboard had a graph that looked like a cliff face.

The scene

I run a small fleet of unattended automations for a client. One of them enriches records overnight by sending them through a metered LLM endpoint. Cheap per call. Boring. The kind of job you set up once and forget — which is exactly the problem.

That night, the upstream service it depended on got flaky. A handful of calls returned errors. The loop did what badly written loops always do.

It tried again. Immediately. Forever.

┌────────────────────────────────────────────────────┐
│  THE LOOP (as written)                             │
│                                                    │
│   ┌──────────┐   error   ┌──────────────┐          │
│   │ call API │ ────────► │ retry now    │          │
│   └────┬─────┘           └──────┬───────┘          │
│        ▲                        │                  │
│        └────────────────────────┘                  │
│        no backoff · no cap · no kill switch        │
│                                                    │
│   result: ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  thousands of hits │
└────────────────────────────────────────────────────┘

No sleep. No max-retry counter. No circuit breaker. Every failure became an instant request, and every instant request became another line on the invoice.

The investigation

First thing: confirm it’s us and not a stolen key. I pulled the provider’s usage view and bucketed by hour.

# usage exploded between 23:00 and 07:00 — exactly the cron window
# logs from the job host told the same story
journalctl -u record-enrich.service --since "yesterday 23:00" \
  | grep -c "POST /v1/"
# -> 4,300-something. Overnight. For a job that should make ~80 calls.

Same source IP. Same user-agent. Same service. Not a breach — a self-inflicted wound. Somehow that’s worse, because it means the call was coming from inside the house.
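The same hourly bucketing can be done against the job host's own logs. This is a minimal sketch, assuming each log line starts with an ISO-8601 timestamp (the format `journalctl -o short-iso` produces) and that every request logs `POST /v1/`. The function name, the `/v1/complete` path, and the sample lines are hypothetical, made up for illustration.

```python
from collections import Counter


def calls_per_hour(lines, marker="POST /v1/"):
    """Count log lines containing `marker`, grouped by hour.

    Assumes each line starts with an ISO-8601 timestamp such as
    2024-05-01T23:14:07+0000, so the first 13 characters identify the hour.
    """
    buckets = Counter()
    for line in lines:
        if marker in line:
            buckets[line[:13]] += 1  # "YYYY-MM-DDTHH"
    return dict(sorted(buckets.items()))


# Hypothetical sample lines, for illustration only.
sample = [
    "2024-05-01T23:00:01+0000 host record-enrich[412]: POST /v1/complete 503",
    "2024-05-01T23:00:02+0000 host record-enrich[412]: POST /v1/complete 503",
    "2024-05-01T23:00:02+0000 host record-enrich[412]: retrying",
    "2024-05-02T00:15:40+0000 host record-enrich[412]: POST /v1/complete 200",
]

for hour, count in calls_per_hour(sample).items():
    print(hour, count)
```

Passing `sys.stdin` in place of `sample` would let the same function read piped `journalctl` output. A healthy night should show a short burst in one bucket; a runaway shows every bucket in the window filled.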

The “aha”

The smoking gun was six lines of code. Paraphrased:

while not done:
    try:
        resp = client.complete(payload)
        done = True
    except Exception:
        continue   # <-- the whole disaster, right here

A bare continue. No delay. No ceiling. The instant the upstream hiccuped, this turned into a tight spin loop firing paid requests as fast as the network would carry them. A retry without backoff isn’t resilience; it’s a denial-of-wallet attack you launch against yourself.

Cost climbs unattended all night; a provider-side quota cap is the only thing that would have flattened it before morning.

The fix

I treated the key as compromised even though it wasn’t, because the behavior was indistinguishable from a leak. Containment first, blame later.

# 1. Rotate the key immediately — old one dies on the spot
provider keys rotate --name record-enrich --revoke-old

# 2. Disable the API entirely at the provider while I fix the code.
#    Kill the bleeding before patching the artery.
provider api disable --service record-enrich

# 3. Re-enable WITH a hard cap + budget alert. The cap is the real fix.
provider budget set --service record-enrich \
  --hard-limit-usd 25 --period monthly
provider alerts set --service record-enrich \
  --notify-at 50% --notify-at 90% --channel telegram

Then the code got the guardrails it should have shipped with:

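import time

MAX_RETRIES = 5

for attempt in range(MAX_RETRIES):
    try:
        resp = client.complete(payload)
        break
    except TransientError:   # retry only errors known to be transient
        if attempt < MAX_RETRIES - 1:   # no point sleeping after the final attempt
            time.sleep(min(2 ** attempt, 30))   # exponential backoff, capped at 30 s
else:
    # the loop ran out of attempts without hitting break
    raise RuntimeError("gave up after retries: failing loud, not looping")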

Backoff. A retry ceiling. And a loud failure instead of a silent infinite spin. The provider cap is the seatbelt; this is actually steering the car.
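The same guardrails can be packaged as a helper so every call site gets them by default. This is a sketch under assumptions, not the code that shipped: `TransientError` is a placeholder class defined here (the post doesn't show where the real one comes from), `call_with_backoff` and its parameters are hypothetical names, and the random jitter is an addition that the original fix does not have.

```python
import random
import time


class TransientError(Exception):
    """Placeholder for whatever retryable error the real client raises."""


def call_with_backoff(fn, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn() up to max_retries times, sleeping between failed attempts.

    The delay doubles each attempt (base, 2*base, 4*base, ...), is capped at
    `cap` seconds, and is scaled by a random factor in [0.5, 1.0] so that
    several workers do not retry in lockstep.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries - 1:
                break  # out of attempts: skip the pointless final sleep
            delay = min(base * 2 ** attempt, cap)
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"gave up after {max_retries} attempts")


# Usage with a fake call that fails twice, then succeeds (no real API involved).
attempts = []


def fake_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("upstream hiccup")
    return "ok"


delays = []
result = call_with_backoff(fake_call, sleep=delays.append)  # record delays instead of sleeping
print(result, len(attempts), len(delays))
```

Injecting `sleep` keeps the helper testable without waiting out real delays. Only `TransientError` is caught, so anything unexpected still fails loudly on the first attempt.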

Why it happened

It happened because “it’s just a small overnight job” is the exact mindset that ships uncapped loops at paid endpoints. The cost per call was trivial, so nobody did the multiplication. Trivial times infinity is still a number you have to pay.
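Here is that multiplication, using this incident's figures where the post gives them. The call counts and the time window come from the logs earlier in the post; the per-call price is an invented placeholder, since the real one isn't stated.

```python
EXPECTED_CALLS = 80        # what the job should make per night
OBSERVED_CALLS = 4300      # roughly what the logs showed
WINDOW_HOURS = 8           # 23:00 to 07:00
PRICE_PER_CALL_USD = 0.01  # ASSUMPTION: placeholder, not the real price

overshoot = OBSERVED_CALLS / EXPECTED_CALLS
calls_per_hour = OBSERVED_CALLS / WINDOW_HOURS
extra_cost = (OBSERVED_CALLS - EXPECTED_CALLS) * PRICE_PER_CALL_USD

print(f"{overshoot:.1f}x the expected volume")
print(f"{calls_per_hour:.1f} calls per hour")
print(f"${extra_cost:.2f} of unplanned spend at the assumed price")
```

Swap in the real price and the real window. The point is that the overshoot factor scales with how long nobody is looking, not with how cheap a single call is.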

The code had no concept of “too many.” The provider had no concept of “enough.” With neither a ceiling in the app nor a ceiling at the wallet, the only limiter left was how fast the network could move — and that’s a throttle on the damage, not a budget.
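One way to give the code a concept of “too many” is a per-run call budget that every request has to draw from. This is a sketch, not something from the original job: `CallBudget`, `BudgetExceeded`, `guarded_call`, and the limit of 200 are all hypothetical, chosen on the assumption that a job which normally makes about 80 calls should never need more than a few hundred.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the job tries to make more calls than it was allotted."""


class CallBudget:
    """Counts outbound calls for one run and refuses to go past a ceiling."""

    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.used = 0

    def spend(self):
        """Reserve one call; raise before the request is sent if none are left."""
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"call budget of {self.max_calls} exhausted")
        self.used += 1


# The job normally makes ~80 calls, so 200 leaves room for retries
# while stopping a runaway long before it reaches thousands.
budget = CallBudget(max_calls=200)


def guarded_call(send, payload):
    budget.spend()  # counts retries too, since each retry goes through here again
    return send(payload)


# Demo with a stand-in sender; no real API is contacted.
sent = 0
try:
    while True:  # deliberately unbounded loop
        guarded_call(lambda p: None, {"id": 1})
        sent += 1
except BudgetExceeded as exc:
    print(f"stopped after {sent} calls: {exc}")
```

The budget only works if nothing swallows `BudgetExceeded`. A blanket `except Exception: continue` would catch it and keep spinning, which is one more reason to retry only on a narrow, named error type.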

Takeaways

Put a hard spend cap at the provider. It’s the only limit a runaway loop can’t out-code. Set it before you write the first request.
Backoff and a max-retry count are not optional. A retry without delay or ceiling is a self-DoS. A bare continue inside an except block is a loaded gun.
Wire usage alerts at 50% and 90%. You want a phone buzz at midnight, not a graph shaped like a cliff at breakfast.
Treat weird usage as a leak until proven otherwise. Rotate the key and disable the endpoint first; debug the code second. Containment beats curiosity.
“Small overnight job” is a smell. Anything unattended that touches a metered API gets a kill switch, a cap, and an alert — or it doesn’t get deployed.
