When Your Agent's Alarm Clock Breaks

It’s 10 PM on a Monday. My phone buzzes. Morning briefing. Two news digests. All at once.

The system had been running fine for weeks. No deploys, no config changes, no restarts. The gateway process was the same PID it had been all day. Nothing changed — and then everything fired nine hours early.

The symptom

I run scheduled jobs on a home server: a morning briefing at 7 AM, news digests staggered through the morning, inbox monitoring every 15 minutes. Standard cron-style scheduling through the framework.
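Concretely, the schedule looked something like this. (A purely illustrative sketch — the job names, exact times, and config shape are mine, not the framework's real format; only the 7 AM briefing, staggered digests, and 15-minute inbox monitor come from the actual setup.)

```python
# Illustrative sketch of the schedule, not the framework's real config format.
SCHEDULE = {
    "morning-briefing": {"cron": "0 7 * * *",    "tz": "America/Chicago"},  # 7 AM daily
    "news-digest-am":   {"cron": "0 9 * * *",    "tz": "America/Chicago"},  # staggered
    "news-digest-noon": {"cron": "0 11 * * *",   "tz": "America/Chicago"},  # (times assumed)
    "inbox-monitor":    {"cron": "*/15 * * * *", "tz": "America/Chicago"},  # every 15 min
}
```

The important property is that every daily job is timezone-aware — scheduled in local wall-clock time, not UTC. That's the feature that breaks later.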

At 10 PM Central, the morning briefing and two news digests all fired simultaneously. Nine hours early. No other jobs misfired. The inbox monitor, which runs every 15 minutes, kept working normally the whole time.

My first instinct was to restart the gateway from Telegram. That stopped the immediate spam — but triggered a separate known bug where the scheduler fires all pending jobs on restart. So now I got the morning briefing again, plus everything else in the queue. Double whammy.

The investigation

SSH’d into the server and started pulling logs.

First thing to establish: did the process crash and restart? If it did, that’s one class of bug. If it didn’t, that’s a much more interesting class of bug.

The systemd journal showed a continuous PID from the last intentional restart through the misfiring event. No crash, no restart, no interruption. The scheduler decided — from a healthy, continuously running process — that it was time to run the morning jobs.

The nextRunAtMs values on each job were correct. They pointed to 7 AM the next morning. The scheduler had the right target time. It fired them anyway.
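That's what makes this strange: in epoch-milliseconds terms, the check that should have gated these jobs is trivial. A minimal sketch — the field name comes from the job records, but the timestamps are illustrative, not pulled from the actual logs:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def is_due(next_run_at_ms: int, now_ms: int) -> bool:
    # The gate the evidence says should have held:
    # fire only once the stored target time has passed.
    return now_ms >= next_run_at_ms

CENTRAL = ZoneInfo("America/Chicago")
ms = lambda dt: int(dt.timestamp() * 1000)

# nextRunAtMs pointing at 7 AM the next morning, evaluated at 10 PM
# the night before (2025 dates chosen purely for illustration).
next_run_at = ms(datetime(2025, 3, 11, 7, 0, tzinfo=CENTRAL))
now = ms(datetime(2025, 3, 10, 22, 0, tzinfo=CENTRAL))

print(is_due(next_run_at, now))  # False — nothing is due, yet the jobs fired
```

Whatever path the scheduler took, it wasn't this simple comparison — or something upstream handed it the wrong "now".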

The root cause

The misfiring happened at exactly midnight UTC. That’s 6 PM Central during standard time, but we’d just gone through the spring-forward DST transition two days earlier. Now midnight UTC is 7 PM Central — close to the 10 PM firing time once you account for evaluation delays and the specific cron cycle.
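You can see the offset shift directly with stdlib `zoneinfo`. The specific year is an assumption for illustration — in 2025 the US spring-forward fell on Sunday, March 9, which matches the "two days before a Monday incident" shape:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")

# Midnight UTC, before and after a spring-forward (2025 dates illustrative;
# the transition was Sunday, March 9).
before = datetime(2025, 3, 8, 0, 0, tzinfo=timezone.utc).astimezone(CENTRAL)
after = datetime(2025, 3, 11, 0, 0, tzinfo=timezone.utc).astimezone(CENTRAL)

print(before.strftime("%H:%M %Z"))  # 18:00 CST — midnight UTC is 6 PM Central
print(after.strftime("%H:%M %Z"))   # 19:00 CDT — after spring-forward, 7 PM
```

Same UTC instant, different local hour — exactly the kind of discontinuity a timezone-aware cron evaluator has to get right at UTC date boundaries.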

The pattern: the scheduler woke up for a legitimately scheduled job (the inbox monitor, which runs every 15 minutes). During that evaluation pass, it checked the timezone-aware morning jobs and incorrectly concluded they were due.

The nine-hour offset doesn’t map to a clean timezone difference. It’s not a simple “forgot to account for DST” bug. It’s subtler — something in how the scheduler re-evaluates timezone-aware cron expressions at UTC date boundaries, specifically after a DST transition has changed the offset between UTC and local time.

The key evidence: it happened at the first cron evaluation after midnight UTC, on the first UTC date rollover since the DST change took effect. The scheduler handles regular daily scheduling fine. It handles the 15-minute jobs fine. But the intersection of timezone-aware daily schedules and UTC date boundaries has an edge case that nobody tested.

The bonus discovery

While investigating, I found something unrelated but arguably worse: a workspace reference file had grown past the platform’s 20,000-character bootstrap injection limit. Everything past 20K was silently truncated. The system had been running with incomplete documentation for an unknown period — no error, no warning, no indication that half its reference material was missing.

Trimmed it from 22K to 14K without losing anything functional. Just removed redundant examples and verbose descriptions. Immediately started seeing correct behavior for things that had been quietly broken.

Silent truncation with no error is a special kind of dangerous. The system looked healthy. Everything was functioning. It just couldn’t see part of its own config.

The fix

Converted all cron schedules from America/Chicago timezone-aware expressions to Etc/UTC with manually computed UTC times. UTC doesn’t have daylight saving time. There are no DST discontinuities, no spring-forward, no fall-back. The entire class of bugs disappears.

The trade-off: twice a year, when DST changes, I need to manually adjust the UTC times so the jobs fire at the right local hour. That’s annoying. But “annoying twice a year” beats “everything fires at 10 PM for reasons I need a forensic investigation to understand.”
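The twice-a-year adjustment is a single offset lookup. A sketch of the conversion, with placeholder dates standing in for "some day in standard time" and "some day in daylight time":

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_hour_for(local_hour: int, on: datetime) -> int:
    """UTC hour at which a job must run to fire at `local_hour` in Chicago."""
    local = on.replace(hour=local_hour, tzinfo=ZoneInfo("America/Chicago"))
    return local.astimezone(ZoneInfo("Etc/UTC")).hour

# 7 AM Central expressed in UTC, winter vs. summer (2025 dates illustrative):
print(utc_hour_for(7, datetime(2025, 1, 15)))  # 13 → schedule "0 13 * * *" under CST
print(utc_hour_for(7, datetime(2025, 6, 15)))  # 12 → schedule "0 12 * * *" under CDT
```

Two numbers to swap twice a year, in exchange for a scheduler that never has to reason about DST at all.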

Filed a GitHub issue with the full evidence chain. The community confirmed the UTC workaround within hours. No maintainer response yet on the underlying bug.

The lesson

If you’re running scheduled jobs through a framework that supports timezone-aware cron expressions — test them across DST boundaries before you trust them. Specifically, test what happens at the first UTC date rollover after a DST transition. That’s the edge case.
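The test has a simple shape. `next_run_after` below is a reference implementation standing in for whatever next-run API your framework exposes — the point is the assertion: the first midnight-UTC rollover after a spring-forward must not move the target.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")

def next_run_after(now_utc: datetime, local_hour: int = 7) -> datetime:
    """Reference behavior: next local 7 AM strictly after `now_utc`."""
    local = now_utc.astimezone(CENTRAL)
    candidate = local.replace(hour=local_hour, minute=0, second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)  # wall-clock day: stays 7 AM local across DST
    return candidate.astimezone(timezone.utc)

# Straddle the first midnight UTC after the 2025 spring-forward (March 9;
# dates illustrative) and check the computed target doesn't move.
before = next_run_after(datetime(2025, 3, 10, 23, 59, tzinfo=timezone.utc))
after = next_run_after(datetime(2025, 3, 11, 0, 1, tzinfo=timezone.utc))
assert before == after  # the UTC date rollover must not change the next run
```

Swap the reference function for your framework's scheduler and run the same pair of calls. If the two results differ, you've found the edge case before it finds you at 10 PM.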

UTC scheduling is boring. Boring is what you want from the thing that decides when your systems wake up.