Prefetch Changed Everything
I run an AI assistant called Triss on OpenClaw. One of her main jobs is email triage — she monitors multiple M365 tenants and Gmail accounts on a cron schedule, decides what’s worth my attention, and delivers a summary to Telegram. VIP contacts always surface. Financial alerts always surface. Security items always surface. Routine noise gets suppressed.
The triage logic worked well from the start. The data pipeline was a mess.
The first version
The original setup let the model handle everything: data retrieval, parsing, triage, and delivery. An OpenClaw cron job fired every 30 minutes, the model called mcporter (an MCP bridge to Microsoft Graph) to fetch emails, parsed the responses, applied triage rules, and sent the results to Telegram.
It worked — technically. But mcporter was returning full HTML email bodies. Every email came back with headers, styling, quoted reply chains, and “Caution: This email originated from outside the organization” banners. The model was ingesting all of it.
The numbers told the story: 1.45 million tokens across 11 LLM calls per monitoring cycle. For checking email.
And the JSON parsing broke constantly. The HTML bodies contained characters that corrupted the JSON structure. So I started patching: a robust_json_load wrapper, regex fallback parsing, body truncation at 500 characters. Classic symptom-chasing. Every fix worked until it didn’t, and then I’d add another layer of duct tape.
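The duct-tape layer looked roughly like this — a reconstruction for illustration, not my original code; `robust_json_load` is the name mentioned above, and the specific fallbacks are representative:

```python
import json
import re

def robust_json_load(raw: str):
    """Try strict JSON first, then increasingly desperate fallbacks."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fallback 1: strip control characters that HTML bodies smuggle in,
    # which strict JSON parsing rejects inside strings.
    cleaned = re.sub(r"[\x00-\x1f]", " ", raw)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Fallback 2: regex out the largest {...} blob and hope for the best.
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("unparseable payload")
```

Every layer here treats a symptom. None of it addresses why raw HTML was in the pipeline in the first place.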
The fix wasn’t better parsing
The problem wasn’t that the model was bad at parsing messy data. The problem was that the model was touching the data at all.
I moved all data collection into a Python prefetch script that runs on system cron before the model ever wakes up. The model never calls an API. It never parses raw HTML. It reads a clean JSON file and triages.
The script does four things:
1. Requests only the fields it needs. Instead of fetching full email objects, mcporter gets called with --select to request only subject, from, receivedDateTime, isRead, and bodyPreview. That last one is important — bodyPreview is a plain-text excerpt that Microsoft provides specifically so you don’t have to parse the full body. It’s right there in the API and I was ignoring it.
2. Returns clean JSON. The --output text flag tells mcporter to return plain JSON instead of wrapping it in an MCP protocol envelope. No more envelope-stripping step.
3. Filters by time window. The script calculates a lookback window — 1 hour for the regular 30-minute monitor, 12 hours for the morning briefing that needs to catch overnight email. Only recent messages come back. No stale data from three days ago showing up because the API returned “most recent 10” without a date filter.
4. Writes compact JSON to a temp file. The output lands at /tmp/openclaw/email-monitor-pending.json. When the OpenClaw cron job fires 2 minutes later, the model reads this file. That’s all it reads.
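The four steps above fit in a short script. This is a minimal sketch, not my production code: only `--select`, `--output text`, the field names, and the output path come from the description above — the `call` subcommand and the tool name `m365.list-messages` are placeholders for whatever your mcporter config exposes.

```python
#!/usr/bin/env python3
"""Prefetch: collect email into a clean JSON state file. Zero AI tokens."""
import json
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path

OUT = Path("/tmp/openclaw/email-monitor-pending.json")
FIELDS = "subject,from,receivedDateTime,isRead,bodyPreview"

def fetch_messages():
    # --select keeps the payload to five plain-text fields (no HTML body);
    # --output text returns bare JSON with no MCP protocol envelope.
    # Subcommand and tool name below are placeholders for your setup.
    raw = subprocess.run(
        ["mcporter", "call", "m365.list-messages",
         "--select", FIELDS, "--output", "text"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(raw)

def within_window(messages, lookback_hours, now=None):
    """Drop anything older than the lookback (1 h monitor, 12 h briefing)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=lookback_hours)
    return [
        m for m in messages
        if datetime.fromisoformat(m["receivedDateTime"].replace("Z", "+00:00")) >= cutoff
    ]

def main(lookback_hours=1):
    # Invoked by system cron at :05; the briefing run passes lookback_hours=12.
    recent = within_window(fetch_messages(), lookback_hours)
    OUT.parent.mkdir(parents=True, exist_ok=True)
    OUT.write_text(json.dumps(recent, separators=(",", ":")))  # compact JSON
```

The point isn't the specific code — it's that everything above runs without a single model call, and the model only ever sees the file `main` writes.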
The cron schedule
This is the part people skip, and it matters. The prefetch and the model run can't fire at the same time, and neither should collide with other jobs.
:05 — prefetch script runs (Python, zero AI tokens)
:07 — OpenClaw cron fires, model reads the JSON and triages
:33 — next news digest (staggered away from email cycle)
The 2-minute gap between prefetch and model isn’t arbitrary. It’s enough time for the script to finish and write the file, with margin for slow API responses. I learned the hard way that cron jobs competing for the same API in the same minute window creates a feedback loop that burns tokens and money.
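One cheap way to make the 2-minute assumption explicit is a freshness check on the model side before triaging — a sketch of a guard I'd recommend, with an arbitrary 10-minute threshold that isn't anything OpenClaw enforces:

```python
import json
import time
from pathlib import Path

PENDING = Path("/tmp/openclaw/email-monitor-pending.json")
MAX_AGE_SECONDS = 10 * 60  # older than this means prefetch failed; skip the cycle

def load_pending(path=PENDING):
    """Return the prefetched messages, or None if the file is missing or stale."""
    if not path.exists():
        return None
    if time.time() - path.stat().st_mtime > MAX_AGE_SECONDS:
        return None
    return json.loads(path.read_text())
```

If this returns None, the right move is to skip the run, not to fall back to live API calls — falling back reintroduces exactly the collision problem the stagger exists to prevent.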
Before and after
| Metric | Before (model-does-everything) | After (prefetch) |
|---|---|---|
| Tokens per cycle | ~1,450,000 | ~14,000 |
| LLM calls per cycle | 11 | 1 |
| JSON parsing errors | Frequent | Zero |
| HTML body processing | Every email, every run | Never |
| Token reduction | — | ~99% |
The model went from building its own context from raw API calls to reasoning over a pre-built state file. Same triage quality. Fraction of the cost.
What I’d tell someone building this
If you’re building an AI agent that runs on a schedule and processes external data, separate the collection layer from the reasoning layer. It’s the same principle as not letting your web app query the database without an abstraction layer — except with AI, the penalty for skipping it is measured in dollars and token burn instead of just latency.
The model is good at reasoning. Let it do that. Don’t make it also be your data pipeline.
I didn’t arrive at this from a design principles document. I got there by watching my agent choke on raw HTML email bodies at 7 AM while I was trying to figure out why the morning briefing was showing emails from three days ago.