The pager went off at 6:14 a.m. The morning sync hadn’t run.
I knew that script. I wrote that script. It was good. It pulled data, massaged it, pushed it where it needed to go. It worked flawlessly every single time I ran it.
That was the problem. It only worked when I ran it.
The scene
Here’s the embarrassing truth about that automation: it lived inside an SSH session. I’d connect to the box at sync-host.internal.example, fire it off, watch it succeed, and disconnect feeling like a wizard.
The moment my terminal closed, the kernel reaped it. SIGHUP, lights out. No process, no cron-shaped safety net, nothing.
On the Windows box it was worse. I’d “scheduled” it — except every run popped a console window that stole focus from whoever was using the machine. People started killing the window because it interrupted them. Can’t blame them.
So three different failure modes, one script:
┌──────────────────────────────────────────────┐
│ "It works when I run it manually" │
├──────────────────────────────────────────────┤
│ close SSH session ──► SIGHUP, process dies│
│ server reboots ──► nothing comes back │
│ scheduled on win ──► window steals focus │
└──────────────────────────────────────────────┘
│
▼
NOT actually deployed
The investigation
I tailed the journal. Nothing — because nothing was logging anywhere. The script wrote to stdout, and stdout was attached to a terminal that no longer existed.
I checked the process list. Empty. I checked for a cron entry. None — I’d never made one, because “I’ll just run it” had quietly become the deployment strategy.
That’s the aha, and it’s a dumb one: manual execution was masquerading as deployment. The script was perfect and completely undeployed at the same time. Every successful manual run was me hand-holding a service that had no manager.
A long-running, scheduled, headless job has three needs I’d ignored: survive a logout, survive a reboot, and write its logs somewhere a human can read them later. A terminal gives you none of those.
The fix
Give it a service manager. On Linux, a systemd unit. On Windows, NSSM. Both get auto-restart, log redirection, and start-on-boot — and both run headless.
Linux — /etc/systemd/system/morning-sync.service:
[Unit]
Description=Morning data sync
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=svc-sync
WorkingDirectory=/opt/morning-sync
ExecStart=/opt/morning-sync/.venv/bin/python /opt/morning-sync/sync.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/morning-sync/out.log
StandardError=append:/var/log/morning-sync/err.log
[Install]
WantedBy=multi-user.target
sudo mkdir -p /var/log/morning-sync
sudo systemctl daemon-reload
sudo systemctl enable --now morning-sync.service
systemctl status morning-sync.service
journalctl -u morning-sync.service -f
Windows — same job, NSSM does the babysitting and runs it with no visible window:
nssm install MorningSync "C:\apps\morning-sync\.venv\Scripts\python.exe" "C:\apps\morning-sync\sync.py"
nssm set MorningSync AppDirectory "C:\apps\morning-sync"
nssm set MorningSync AppStdout "C:\logs\morning-sync\out.log"
nssm set MorningSync AppStderr "C:\logs\morning-sync\err.log"
nssm set MorningSync AppExit Default Restart
nssm set MorningSync Start SERVICE_AUTO_START
nssm start MorningSync
Now I close my laptop and it keeps running. The box reboots at 3 a.m. for patches and the sync comes back on its own. The logs sit in a file instead of evaporating with my SSH session.
Why it happened
Because the script worked. That’s it. When something works on the first manual run, the brain files it under “done” and quietly skips the boring part — the part where a supervisor owns the process, restarts it when it dies, and persists across boots.
A terminal is not a runtime. Cron-with-a-popup is not headless. “Done” is not “deployed.”
Takeaways
- “It works when I run it manually” is not deployed. A successful manual run proves the logic, not the operation.
- Long-running automation belongs in a service manager — systemd on Linux, NSSM on Windows — not a terminal you forgot to close.
- Always set auto-restart and boot persistence.
Restart=on-failureplusenable --now(orSERVICE_AUTO_START) means reboots and crashes self-heal. - Redirect stdout/stderr to a file. Logs that live inside an SSH session die with the session, and you’ll be debugging blind at 6 a.m.
- Run it headless. Background jobs that pop windows get killed by annoyed users. No UI, no focus theft, no problem.