Homelab on RevoluGame

Designing Single-Purpose Agents Instead of One Big Automation Script

Wed, 17 Jun 2026 10:00:00 +0100

“Agent” has become one of those words that means everything and nothing this year. Before it was a hype term, I’d already ended up with a small flock of them in my homelab. Not because I was chasing a trend, but because I kept hitting the same wall every time I tried to write One Big Script: it grew a dozen unrelated responsibilities, and a bug in one of them risked taking down all of them.

So instead, every recurring chore in my homelab is its own small, independently-scheduled program. There turned out to be more of them than I expected once I actually sat down and counted.

A note for muggles: the repo behind all this is named hogwarts, and every agent gets sorted to match. Once you start naming services after wizards, it turns out you owe each one an in-character job description, whether it asked for one or not.

The standing watch. Four observers poll continuously and report into one correlator every five minutes. This is the layer that exists so I find out about a problem before it becomes a 3am page instead of after:

Argus Filch watches running Docker containers for restart loops, failed healthchecks, and containers that just quietly vanish.
Astronomy Tower polls Prometheus for firing alerts, down scrape targets, and recording rules that stopped working without telling anyone.
Marauder’s Map scans the UniFi network for offline devices, WAN failover events, and firewall rules that drifted open.
Mad-Eye’s Watch tracks TLS certificate expiry across configured endpoints. Constant vigilance: a warning at 30 days, a critical at 7.
The Headmaster is the one role on this list that isn’t single-purpose by design. Its entire job is reading what the other four decided was worth reporting and correlating that into one status, surfaced as an incident only when it’s actually worth one.
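
To make the Mad-Eye's Watch thresholds concrete, here is a minimal sketch of the severity mapping. The 30-day and 7-day limits come from the description above. The function name, the inclusive boundaries, and the decision to treat an already-expired certificate as critical are assumptions for illustration, not the agent's actual code.

```python
from datetime import datetime, timedelta, timezone

WARN_DAYS = 30      # from the post: warning at 30 days
CRITICAL_DAYS = 7   # from the post: critical at 7 days


def classify_expiry(not_after: datetime, now: datetime) -> str:
    """Map a certificate's expiry time to a severity label.

    Boundaries are inclusive here (exactly 30 days left is a warning);
    that choice is an assumption, the post only names the two thresholds.
    """
    days_left = (not_after - now).days
    if days_left <= CRITICAL_DAYS:
        return "critical"  # also covers certificates that already expired
    if days_left <= WARN_DAYS:
        return "warning"
    return "ok"


now = datetime(2026, 6, 17, tzinfo=timezone.utc)
for days in (45, 30, 7, -1):
    print(days, classify_expiry(now + timedelta(days=days), now))
```

Passing `now` in explicitly keeps the check deterministic, so it can be tested without waiting for a real certificate to age.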

The daily and weekly chores. These run on their own cron schedules and never talk to each other directly:

Molly’s Cupboard reviews the Home Assistant entity list weekly: unavailable entities, missing or duplicate names, disabled automations. (Molly Weasley: keeps the household running, judges your clutter lovingly.)
Rita’s Desk is the RSS morning digest: feeds in, previous day’s articles out, ranked against persistent tag scores I vote on. Deterministic by design, no LLM in the loop. (Never met a headline she wouldn’t print, but at least she always sources it.)
Kreacher’s Kitchen plans the week’s meals from my recipe library and a couple of trusted cooking sites. (Grumbles the entire time, still gets dinner on the table.)
The Library picks a tech topic every night, gathers sources, and writes a 5-minute digest plus a 15-20 minute deep dive. (Lives in the package manifest as research-digest, but it spends every night in the Restricted Section, so the Library it is.)
Madam Pince’s Catalogue lists every running container and cross-checks it against the service directories in the infra repo, flagging any container that has no matching documentation. (A very particular librarian: every book gets catalogued, or it gets confiscated.)
Dobby’s Rounds is the homelab’s free elf: weekly housekeeping that prunes old snapshots, reports, and state files before they pile up.
O.W.L.s is the daily infrastructure audit: config drift, open ports, compliance. Read-only and deliberately paranoid about it. (Ordinary Wizarding Level exams: thorough, exhausting, and not interested in your excuses.)
Auror Office is the daily cross-domain security digest, correlating O.W.L.s’ findings with auth logs, Docker posture, and the network observers above into one report. (No badge, but it does go looking for dark wizards. I’ve written about how this one and O.W.L.s work together in more detail elsewhere.)
…and others, including media management, recommendations, and a handful more in the same spirit. Small enough that listing every one of them would be its own blog post.
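
The "deterministic by design" ranking in Rita's Desk can be sketched in a few lines. This is an illustration under assumptions: the article shape (`title`, `tags`), the additive scoring, and the title tiebreak are all hypothetical, and the real agent may weigh things differently.

```python
def rank_articles(articles: list[dict], tag_scores: dict[str, int]) -> list[dict]:
    """Order articles by the sum of their tags' persistent scores.

    Ties are broken by title so the same input always yields the same
    output, which is what makes the digest reproducible.
    """
    def score(article: dict) -> int:
        return sum(tag_scores.get(tag, 0) for tag in article["tags"])

    return sorted(articles, key=lambda a: (-score(a), a["title"]))


# Hypothetical vote history: positive means "more of this", negative "less".
tag_scores = {"homelab": 3, "security": 2, "crypto": -4}
articles = [
    {"title": "Zigbee mesh tips", "tags": ["homelab"]},
    {"title": "New CVE roundup", "tags": ["security", "homelab"]},
    {"title": "Coin news", "tags": ["crypto"]},
    {"title": "Backup habits", "tags": ["homelab"]},
]
for article in rank_articles(articles, tag_scores):
    print(article["title"])
```

Unknown tags score zero rather than raising, so a new feed with unfamiliar tags lands in the middle of the digest instead of breaking the run.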

Thirteen-plus names, just as many jobs. Outside of the two correlators built specifically to know about everyone else (the Headmaster and the Auror Office), not one of them needs to care that the rest exist.

The three conventions that make this work

None of these agents are individually clever. What makes the flock manageable is that they all obey the same three small contracts:

1. One artifact format. Every agent writes its result as JSON (and often a companion Markdown note) to its own outbox/latest/ path. A “latest” pointer plus a timestamped archive, every time. No agent reads another agent’s outbox directly. If something needs cross-referencing, that’s a different, explicitly-correlating agent’s job, not an implicit dependency.
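
A minimal sketch of what the "latest pointer plus timestamped archive" contract could look like. The `archive/` directory, the `report.json` filename, and the timestamp format are assumptions for illustration; only the `outbox/latest/` path comes from the convention itself.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_artifact(outbox: Path, report: dict, now: datetime) -> Path:
    """Write a timestamped archive copy, then refresh outbox/latest/."""
    archive_dir = outbox / "archive"   # hypothetical layout
    latest_dir = outbox / "latest"
    archive_dir.mkdir(parents=True, exist_ok=True)
    latest_dir.mkdir(parents=True, exist_ok=True)

    payload = json.dumps(report, indent=2, sort_keys=True)
    archived = archive_dir / f"{now.strftime('%Y%m%dT%H%M%SZ')}.json"
    archived.write_text(payload, encoding="utf-8")

    # Write to a temp name and rename, so a reader never sees a half-written file.
    tmp = latest_dir / "report.json.tmp"
    tmp.write_text(payload, encoding="utf-8")
    os.replace(tmp, latest_dir / "report.json")
    return archived


with tempfile.TemporaryDirectory() as root:
    outbox = Path(root) / "mad-eye" / "outbox"
    when = datetime(2026, 6, 17, 9, 0, tzinfo=timezone.utc)
    print(write_artifact(outbox, {"status": "ok"}, when).name)
```

The rename step matters more than it looks: the dashboard reads `latest/` on its own schedule, and an atomic swap means it gets either the old report or the new one, never a truncated mix.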

2. One notification channel. Every agent that needs to tell me something pushes through the same ntfy topic convention, with a deep link back into wherever the full detail lives. I don’t maintain five different alerting integrations; I maintain one, and every agent is a thin client of it.
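
As a sketch of the "thin client" idea: ntfy accepts a plain HTTP POST to a topic URL, with `Title`, `Priority`, and `Click` headers for the headline, urgency, and tap-through link. The server URL, topic name, and link below are placeholders, and the helper only builds the request so it can be inspected without a network.

```python
import urllib.request


def build_ntfy_request(base_url: str, topic: str, title: str,
                       message: str, link: str,
                       priority: str = "default") -> urllib.request.Request:
    """Build (but do not send) a publish request for an ntfy topic."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        method="POST",
        headers={
            "Title": title,        # keep header values ASCII/latin-1 safe
            "Priority": priority,  # ntfy accepts e.g. low, default, high
            "Click": link,         # opened when the notification is tapped
        },
    )


# Placeholder server, topic, and deep link; none of these come from the post.
req = build_ntfy_request(
    "https://ntfy.example.internal", "homelab-mad-eye",
    title="TLS certificate expiring",
    message="1 endpoint is inside the 7-day window",
    link="https://dashboard.example.internal/agents/mad-eye",
    priority="high",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```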

3. One aggregation point. A single dashboard reads everyone’s outbox/latest/ and renders it. It doesn’t collect anything itself. It has no Docker access, no Home Assistant credentials, no API keys. It’s a pure read layer over JSON files other things produced. That’s the only place in the whole system that’s allowed to know all the agents exist.
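
The read side of that contract is small enough to sketch. This assumes a hypothetical layout of `<root>/<agent>/outbox/latest/report.json`; the filename and directory nesting are illustrative, not the dashboard's actual code.

```python
import json
import tempfile
from pathlib import Path


def load_latest(agents_root: Path) -> dict[str, dict]:
    """Collect every agent's latest report. Pure read, no collection."""
    statuses = {}
    for report in sorted(agents_root.glob("*/outbox/latest/report.json")):
        agent = report.parents[2].name  # <agent>/outbox/latest/report.json
        try:
            statuses[agent] = json.loads(report.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            # One unreadable outbox must not take down the whole dashboard.
            statuses[agent] = {"status": "unreadable", "error": str(exc)}
    return statuses


with tempfile.TemporaryDirectory() as root:
    latest = Path(root) / "rita" / "outbox" / "latest"
    latest.mkdir(parents=True)
    (latest / "report.json").write_text('{"status": "ok"}', encoding="utf-8")
    print(load_latest(Path(root)))
```

Because the function only needs a directory of JSON files, the dashboard can be tested against a folder of fixtures with no credentials at all.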

That’s the entire integration surface. Three conventions, and I can add another agent tomorrow without touching any of the existing ones.

Why decompose instead of consolidate

The obvious objection: aren’t a dozen-plus small things more to maintain than one big thing? In my experience, no, for the same reason a set of small services usually beats a monolith at work.

A bug in Peeves’ Trakt pagination (Peeves being one of the media-management agents) cannot break Molly’s Home Assistant checks, because they don’t share a process, a deploy, or a schedule. I can test each one in complete isolation with a fixture file instead of live credentials. I can hand any single agent’s directory to a contributor - human or an AI coding agent - and they have everything they need to understand and change it, without first having to load all the others into their head. And when I retire one (Peeves only matters because I still have a media server; that won’t be true forever), deleting it is deleting a directory, not untangling a shared module.

This is the same lesson as service boundaries and team topologies at any reasonably-sized engineering org: the interface between components should be small, explicit, and boring, and almost all of the design effort should go into keeping it that way. Not into making any individual component clever. The cleverness, if there is any, belongs inside one agent’s narrow walls, where it can’t leak.

The boring plumbing is the point

None of the agents above is doing anything technically hard. RSS parsing, a REST API client, a cron job - this is all stuff any of us could write in an afternoon. The actual design work was deciding, up front, that “outbox JSON + one notification channel + one dashboard” would be the entire contract between them, and then refusing to let any agent reach around it.

That discipline is cheap when you only have one agent. It’s the only thing that keeps five (or fifteen) from turning back into the One Big Script I was trying to avoid in the first place.

Backing Up the One Credential That Can't Be Wrong

Mon, 15 Jun 2026 10:00:00 +0100

Most things in my homelab can fail and I shrug. A container restarts, a dashboard is stale for an hour, a media file gets deleted by mistake: annoying, recoverable, fine. The password vault is not in that category. If it’s wrong, or gone, or merely unreachable at the wrong moment, I lose access to everything else at once. It’s the one piece of infrastructure that earns the extra paranoia.

So instead of “back it up somewhere,” I sat down and asked the question I’d ask at work for any single point of failure: which specific failure does each copy need to survive? That question is what actually shaped the design. Not “more backups.”

Three copies, three different failure modes

The vault lives day-to-day in Dashlane. Around it, a script keeps two more copies current:

A dated, offline KeePass .kdbx archive. The script exports the vault, converts it to KeePass format, and syncs the file to the NAS over rsync. This file needs nothing else to be true: no Dashlane account, no Vaultwarden instance, no network access. KeePassXC plus a master passphrase is the entire recovery path.
A live Vaultwarden mirror. The script wipes and re-imports a self-hosted Vaultwarden instance on every run, so it never drifts and never accumulates stale duplicates. Unlike the .kdbx, this one behaves like a normal vault app day-to-day. Useful if Dashlane itself is the thing that’s unavailable.
The original, in Dashlane. Still the primary, still the one I actually use day to day.

Each of these answers a different question. If Dashlane has an outage or I get locked out of the account, Vaultwarden keeps working normally. If my entire home network and every service on it is down, or I’m on a borrowed machine with nothing installed, the .kdbx file plus a passphrase I’ve memorized is the whole recovery procedure. No SSH keys, no app, no account, no network. If the NAS itself dies, the live Vaultwarden mirror (running elsewhere) and Dashlane are both still fine.

That’s the test I’d apply to any redundancy claim: two backups that die from the same root cause aren’t two backups, they’re one backup with extra steps. These three don’t share a single point of failure with each other.
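
That test can be written down as a small check. The dependency sets below are a deliberately simplified reading of the three copies (one dependency each) and are an assumption for illustration; a real threat model would list more, such as power, the master passphrase, or the machine running the script.

```python
def surviving(copies: dict[str, set[str]], failed: set[str]) -> list[str]:
    """Copies that are still usable when everything in `failed` is down."""
    return sorted(name for name, deps in copies.items() if not (deps & failed))


def single_points_of_failure(copies: dict[str, set[str]]) -> list[str]:
    """Dependencies whose failure alone would take out every copy."""
    all_deps = set().union(*copies.values())
    return sorted(dep for dep in all_deps if not surviving(copies, {dep}))


# Simplified, hypothetical dependency map for the three copies.
copies = {
    "dashlane": {"dashlane_service"},
    "vaultwarden_mirror": {"vaultwarden_host"},
    "kdbx_archive": {"nas"},
}
print(single_points_of_failure(copies))   # no shared root cause
print(surviving(copies, {"nas"}))         # what is left if the NAS dies

# The anti-pattern: two "backups" that both live on the same NAS.
theater = {"backup_a": {"nas"}, "backup_b": {"nas"}}
print(single_points_of_failure(theater))
```

The value of writing it out is less the code than the list: being forced to name what each copy depends on is usually where a shared dependency shows up.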

The part that matters more than the architecture

Diagrams of “three copies in three places” are easy to draw and easy to get wrong in the implementation details, and the details are where a vault backup script can quietly turn into a liability instead of a safety net.

The script exports the raw vault to CSV in plaintext before converting it. There’s no way around that: the conversion tool needs plaintext input. So the entire design constraint became: that plaintext must never outlive the script’s own execution. It’s written to a private temp directory, and a shell trap deletes it on every exit path. Success, failure, or interruption, not just the happy path. The master passphrase that protects the .kdbx is treated as its own tier-zero secret: it lives outside the repo, outside the backup, memorized or written down somewhere physically safe, because it’s the one credential that unlocks the credential store.

It’s a small thing, a trap ... EXIT instead of cleanup code only at the bottom of the script. But it’s exactly the kind of detail that’s invisible when it works and catastrophic when it doesn’t. I’d rather the script crash and still clean up after itself than ship a feature and leave a plaintext export sitting in a temp folder because someone hit Ctrl-C at the wrong moment.
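
The same "cleanup on every exit path" idea, sketched in Python rather than shell. To be clear, this is an analogue of the pattern and not the actual backup script, which uses a shell trap. `export` and `convert` are hypothetical callables standing in for the real export and conversion steps.

```python
import tempfile
from pathlib import Path


def run_backup(export, convert):
    """Keep the plaintext export inside a temp dir that is always removed.

    TemporaryDirectory creates the directory with owner-only permissions
    and deletes it when the block exits, whether by return, exception, or
    KeyboardInterrupt (Ctrl-C). It cannot help with SIGKILL, and SIGTERM
    needs a signal handler that raises, much like adding TERM to a trap.
    """
    with tempfile.TemporaryDirectory(prefix="vault-export-") as tmp:
        plaintext = Path(tmp) / "export.csv"
        export(plaintext)
        return convert(plaintext)


seen = []

def fake_export(path: Path) -> None:
    path.write_text("title,secret\nexample,not-a-real-secret\n", encoding="utf-8")
    seen.append(path)

def failing_convert(path: Path) -> None:
    raise RuntimeError("conversion failed")

try:
    run_backup(fake_export, failing_convert)
except RuntimeError:
    pass
print(seen[0].exists())  # the plaintext is gone even though convert failed
```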

Threat-model your one credential that can’t be wrong

Every system has at least one of these. The credential, the key, the record that everything else assumes is correct and available. For most of us it’s a password vault; for some it might be a signing key or a recovery seed.

The exercise worth doing isn’t “add more backups.” It’s listing the ways that one thing could become unavailable or wrong, and checking that your copies don’t all fail for the same reason. If they do, you’ve built redundancy theater, not redundancy.

Running a Personal SOC: Bringing Production Security Practices Home

Fri, 12 Jun 2026 10:00:00 +0100

At work, nobody questions why we have logging, alerting, and a daily look at what changed overnight. At home, the same network runs a NAS, a media stack, Home Assistant, and a handful of containers. And for years my only “security monitoring” was noticing something was broken.

So I built myself a small, read-only security operations setup for the homelab: a daily audit script and a cross-domain digest agent that correlates it with everything else running on the network. Nothing here is novel security research. The interesting part is which production habits turned out to be worth carrying home, and which ones I deliberately left at the office.

Two layers, not one

The setup is split into two pieces with different jobs.

The daily audit is the boring, deterministic layer. Once a day it collects, locally and read-only:

listening sockets, flagging anything bound to a wildcard interface
Docker/container posture: privileged containers, dangerous bind mounts, host network mode, dangerous capabilities
systemd service drift against an expected allowlist
a local secrets/config hygiene scan (path, line, and pattern only - never the matched value)
cached apt list --upgradable, optionally enriched with a trivy fs scan
whether the monitoring artifacts and timers it depends on are actually present

It writes a deterministic Markdown digest and a JSON report to a local outbox. That’s it. No remediation, no service restarts, no firewall changes. The rule I gave it was “assume breach, trust no single signal, change nothing during observation.”
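
The "path, line, and pattern only" rule for the secrets scan is easy to get subtly wrong, so here is a minimal sketch of it. The three patterns and their names are illustrative assumptions, not the audit's real rule set.

```python
import re

# Hypothetical rules; a real scan would carry a longer, tuned list.
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def scan_text(path: str, text: str) -> list[dict]:
    """Report where a pattern matched, never what it matched."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                # Deliberately no match text and no line content in the finding.
                findings.append({"path": path, "line": lineno, "pattern": name})
    return findings


sample = "user = admin\npassword = hunter2\n"
print(scan_text("app.env", sample))
```

Keeping the matched value out of the finding means the report itself never becomes a second place where the secret is stored.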

The SOC agent is the correlation layer on top. It runs once a day and pulls together SSH auth logs (brute-force detection, unexpected successful logins, elevated sudo activity), Docker security posture, UniFi network signals (unknown MAC addresses, active IPS/DPI/flood/scan/rogue alarms), and re-surfaces anything security-relevant the daily audit already found. Everything gets a severity from ok up through critical, and the result is written as an Obsidian note with YAML frontmatter. So a year of these becomes a searchable, taggable incident timeline instead of a folder of text files nobody opens.
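
A rough sketch of the note-writing step, to show why frontmatter makes the archive searchable. The post only says severities run from ok up through critical, so the intermediate levels, the frontmatter keys, the `soc/<domain>` tag scheme, and the finding shape are all assumptions.

```python
SEVERITIES = ["ok", "info", "warning", "critical"]  # middle levels assumed


def overall_severity(findings: list[dict]) -> str:
    """The note's severity is the worst severity among its findings."""
    return max((f["severity"] for f in findings),
               key=SEVERITIES.index, default="ok")


def render_note(date: str, findings: list[dict]) -> str:
    """Render a Markdown note with YAML frontmatter for Obsidian."""
    tags = ["soc"] + sorted({f"soc/{f['domain']}" for f in findings})
    lines = ["---", f"date: {date}",
             f"severity: {overall_severity(findings)}", "tags:"]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", f"# SOC digest {date}", ""]
    lines += [f"- [{f['severity']}] {f['title']}" for f in findings]
    return "\n".join(lines) + "\n"


findings = [
    {"domain": "ssh", "severity": "warning",
     "title": "Repeated failed logins from one address"},
    {"domain": "docker", "severity": "critical",
     "title": "Privileged container detected"},
]
print(render_note("2026-06-12", findings))
```

With severity and tags in the frontmatter, "every critical day involving Docker" becomes a metadata query instead of a full-text search.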

Where the LLM fits - and where it doesn’t

Both agents can optionally hand their findings to a local Ollama model to write a short narrative summary on top of the facts. This is the part I was most careful about, because it’s the part most people get backwards.

The model never sees raw logs, full inventories, or matched secrets. It only sees compact finding titles, IDs, severities, and evidence keys. It doesn’t decide what’s a finding; the deterministic analyzers do that. And if Ollama is unreachable or returns something unusable, the deterministic digest ships as-is. The LLM is a narrator, never the source of truth.

That’s the same boundary I’d want on a production alerting pipeline: detection logic stays deterministic and testable, and generative summarization sits strictly downstream of it, never upstream.
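
That boundary can be sketched without touching any model API. Here `summarize` is a hypothetical injected callable standing in for the Ollama call, and the field names are assumptions; the point is the shape of the data flow, not the client code.

```python
ALLOWED_FIELDS = ("id", "title", "severity", "evidence_keys")


def compact(findings: list[dict]) -> list[dict]:
    """Strip findings down to the fields the model is allowed to see."""
    return [{k: f[k] for k in ALLOWED_FIELDS if k in f} for f in findings]


def build_digest(findings: list[dict], deterministic_digest: str,
                 summarize=None) -> str:
    """Prepend a narrative if one is available; otherwise ship the facts."""
    if summarize is None:
        return deterministic_digest
    try:
        narrative = summarize(compact(findings))
    except Exception:
        # Model unreachable or misbehaving: the digest ships as-is.
        return deterministic_digest
    if not isinstance(narrative, str) or not narrative.strip():
        return deterministic_digest
    return f"{narrative.strip()}\n\n{deterministic_digest}"


def unreachable(_payload):
    raise ConnectionError("model endpoint is down")

findings = [{"id": "F-1", "title": "Port published on all interfaces",
             "severity": "warning", "evidence_keys": ["port"],
             "raw_log": "never forwarded to the model"}]
print(build_digest(findings, "1 warning, 0 critical", summarize=unreachable))
```

The deterministic digest is always the tail of the output, so a bad summary can add noise but can never remove a finding.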

The part that’s just operational discipline

A few choices here have nothing to do with security theory and everything to do with habits from running things in production:

Output paths are permissioned, not just “private by convention.” Reports get 0600, runtime directories 0700.
Stale data is treated as a finding, not silently trusted. Upstream reports older than a configured threshold are flagged rather than quietly re-surfaced as current.
Notifications are a tap-through, not the whole story. A push notification carries the headline and a link into the actual note - useful at a glance, but the record of truth lives in the note, not in a chat history that scrolls away.
Everything is replayable offline. Both agents accept a fixture file in place of live collection, so I can test a new analyzer rule against a known input before it ever touches my real network.
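
Two of these habits fit in a short sketch: permissioned output and stale-data-as-a-finding. This assumes a POSIX system; the function names, the finding shape, and the threshold value are illustrative, not taken from either agent.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path


def write_private(path: Path, text: str) -> None:
    """Write a report as 0600 inside a 0700 directory (POSIX only)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.chmod(path, 0o600)  # also tightens a file that already existed


def stale_finding(name: str, generated_at: datetime, now: datetime,
                  max_age: timedelta):
    """Return a finding if an upstream report is too old, else None."""
    age = now - generated_at
    if age <= max_age:
        return None
    hours = int(age.total_seconds() // 3600)
    return {"id": f"stale-{name}", "severity": "warning",
            "title": f"{name} report is {hours}h old"}


now = datetime(2026, 6, 12, 10, 0, tzinfo=timezone.utc)
print(stale_finding("daily-audit", now - timedelta(hours=50), now,
                    max_age=timedelta(hours=36)))

with tempfile.TemporaryDirectory() as root:
    report = Path(root) / "runtime" / "report.json"
    write_private(report, "{}")
    print(oct(os.stat(report).st_mode & 0o777))
```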

What it actually buys me

Mostly, it buys me the same thing it buys at work: I notice drift before it becomes an incident, instead of after. A container that quietly picked up a privileged flag, a port that got published wider than intended, an unfamiliar MAC address on the network. These are exactly the kind of small, boring facts that are easy to miss and easy to detect deterministically.

It’s not a real SOC. There’s no 24/7 coverage, no incident response retainer, and the threat model is “don’t get owned by something dumb,” not “defend against a motivated attacker.” But the muscle is the same one I use at work: write the check once, make it boring and deterministic, let a model help you read the output, and keep a record you can actually search six months later.

If you’re already doing this professionally, the homelab version costs you an evening and pays you back the first time you catch something you would have otherwise missed entirely.