"It works" is where the risk starts. Most automations don't fail loudly; they fail quietly, three weeks after handoff, on the client's busiest day. These are the 10 checks we run before any workflow touches a client's business. All of them, free, right here.
The failure it prevents: Two invoices, two welcome emails, two CRM records.
The same event will eventually arrive twice: webhooks retry, users double-click, syncs overlap. Build a dedupe key from the event's natural identity (order ID, email + date), never from the delivery attempt, and claim it before doing anything with side effects. The claim has to be recorded atomically; a check that only reads still lets two overlapping runs both proceed.
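An added example of this check, not code from the page: a dedupe key built from the business identity of an event, claimed through an INSERT on a primary key so the claim is atomic, and released if the side effect never happened. The names (IdempotencyStore, handle_order_event) and the sample webhook deliveries are assumptions constructed for illustration.

```python
# Added example (not the page's code): idempotent handling of a retried
# webhook. IdempotencyStore, handle_order_event and the sample events are
# assumptions constructed for illustration.
import hashlib
import sqlite3
from typing import Any, Callable


def dedupe_key(event: dict[str, Any]) -> str:
    """Key from the business identity of the event. Delivery metadata
    (delivery_id, received_at) is deliberately excluded: it changes on
    every retry, so including it would defeat the check."""
    email = event["customer_email"].strip().lower()
    identity = f"{event['type']}|{event['order_id']}|{email}"
    return hashlib.sha256(identity.encode()).hexdigest()


class IdempotencyStore:
    """Claims are INSERTs on a PRIMARY KEY, so claiming is atomic: two
    workers handling the same retried event cannot both succeed."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS processed ("
            "key TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def claim(self, key: str) -> bool:
        try:
            self.db.execute(
                "INSERT INTO processed (key, result) VALUES (?, 'in_progress')",
                (key,),
            )
            self.db.commit()
            return True
        except sqlite3.IntegrityError:
            return False  # seen before: finished, or still in progress

    def finish(self, key: str, result: str) -> None:
        self.db.execute("UPDATE processed SET result = ? WHERE key = ?", (result, key))
        self.db.commit()

    def release(self, key: str) -> None:
        # The side effect did not happen, so a later retry may try again.
        self.db.execute("DELETE FROM processed WHERE key = ?", (key,))
        self.db.commit()


def handle_order_event(
    event: dict[str, Any],
    store: IdempotencyStore,
    send_invoice: Callable[[dict[str, Any]], str],
) -> str:
    key = dedupe_key(event)
    if not store.claim(key):
        return "duplicate"  # no side effect, no second invoice
    try:
        invoice_id = send_invoice(event)  # the only side effect; runs once per key
    except Exception:
        store.release(key)
        raise
    store.finish(key, invoice_id)
    return "processed"


def demo() -> tuple[list[str], list[str]]:
    """Constructed scenario: one 'order.paid' webhook delivered twice with
    different delivery ids and a differently cased email, plus one genuinely
    new order. Returns (outcomes, invoices actually sent)."""
    store = IdempotencyStore()
    sent: list[str] = []

    def send_invoice(e: dict[str, Any]) -> str:
        sent.append(e["order_id"])
        return f"inv-{e['order_id']}"

    first = {"type": "order.paid", "order_id": "A1",
             "customer_email": "Ann@Example.com", "delivery_id": "d-1"}
    retry = dict(first, delivery_id="d-2", customer_email="ann@example.com ")
    other = dict(first, order_id="A2", delivery_id="d-3")
    outcomes = [handle_order_event(e, store, send_invoice) for e in (first, retry, other)]
    return outcomes, sent


if __name__ == "__main__":
    print(demo())  # (['processed', 'duplicate', 'processed'], ['A1', 'A2'])
```

The in-memory SQLite database stands in for whatever shared store the workflow already has; the property that matters is that the claim is a single atomic write, not a read followed by a write.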
The failure it prevents: The customer who never got a reply, and nobody knows.
Route failed runs, with the input that failed, to a visible place: a "needs attention" board, a sheet, a channel. The volume tells you the system's health; the contents tell you who to call back.
The failure it prevents: A breakage discovered by the client instead of by you.
Logging is not alerting. When something breaks, a specific person gets a message in a channel they already check. Reserve alerts for failures that need a decision, or everything alerts and nothing does.
The failure it prevents: The wrong message sent in the client's name.
Emails, quotes and replies stop in a review queue until a person approves. This is not a limitation; it's the feature that lets a cautious client say yes. Widen the gate later, once trust is earned in both directions.
The failure it prevents: One flaky API call killing an entire run.
Every external call fails sometimes: rate limits, timeouts, deploys. Retry with increasing delays and some jitter, honour the wait the API asks for, and only after a bounded number of attempts route the run to the dead-letter path. Retry only calls that are safe to repeat: a retried call that is not idempotent is the other way duplicate invoices get sent.
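An added example covering this check together with the dead-letter board and the alerting rule above; it is not code from the page. Transient failures are retried with capped exponential backoff and full jitter, never shorter than a Retry-After hint; permanent failures and exhausted retries land in a dead-letter list that keeps the failing input, and only the ones that need a human decision are surfaced as alerts. The exception classes, Runner, DeadLetter and the scenario are assumptions constructed for illustration.

```python
# Added example (not the page's code): bounded retries with exponential
# backoff and jitter that honour a Retry-After hint, then a dead-letter
# record that keeps the failing input and says whether a person must decide.
# Transient, RateLimited, Permanent, Runner, DeadLetter and the scenario
# are assumptions constructed for illustration.
import random
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


class Transient(Exception):
    """Timeouts, 5xx, connection resets: the call may succeed if repeated."""


class RateLimited(Transient):
    """429 carrying the wait the API asked for."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


class Permanent(Exception):
    """Bad credentials, rejected payload: repeating the call cannot help."""


@dataclass
class DeadLetter:
    step: str
    payload: Any          # kept so the run can be replayed after the fix
    error: str
    attempts: int
    needs_decision: bool  # True: tell a person now. False: shows on the board.


@dataclass
class Runner:
    max_attempts: int = 5
    base_delay: float = 0.5
    max_delay: float = 30.0
    sleep: Callable[[float], None] = time.sleep  # injectable so tests don't wait
    dead_letters: list[DeadLetter] = field(default_factory=list)

    def backoff(self, attempt: int, hint: Optional[float]) -> float:
        """Full jitter: uniform in [0, base * 2**attempt], capped at max_delay,
        but never shorter than the wait the API explicitly asked for."""
        ceiling = min(self.max_delay, self.base_delay * 2 ** attempt)
        return max(random.uniform(0, ceiling), hint or 0.0)

    def call(self, step: str, fn: Callable[[Any], Any], payload: Any) -> Optional[Any]:
        """Only wrap calls that are safe to repeat (or guarded by an
        idempotency key); retrying a non-idempotent call makes duplicates."""
        for attempt in range(1, self.max_attempts + 1):
            try:
                return fn(payload)
            except Permanent as exc:
                self._dead(step, payload, exc, attempt, needs_decision=True)
                return None
            except Transient as exc:
                if attempt == self.max_attempts:
                    self._dead(step, payload, exc, attempt, needs_decision=False)
                    return None
                self.sleep(self.backoff(attempt - 1, getattr(exc, "retry_after", None)))
        return None

    def _dead(self, step: str, payload: Any, exc: Exception,
              attempts: int, needs_decision: bool) -> None:
        self.dead_letters.append(DeadLetter(step, payload, repr(exc), attempts, needs_decision))

    def alerts(self) -> list[DeadLetter]:
        """What actually messages a person: failures that need a decision.
        Everything else sits on the board, where its volume is the health signal."""
        return [d for d in self.dead_letters if d.needs_decision]


def demo() -> dict[str, Any]:
    """Constructed scenario: a call that is rate limited, then times out, then
    succeeds; one rejected payload; one endpoint that never recovers."""
    delays: list[float] = []
    runner = Runner(sleep=delays.append)
    calls = {"n": 0}

    def flaky(payload: str) -> str:
        calls["n"] += 1
        if calls["n"] == 1:
            raise RateLimited(2.0)
        if calls["n"] == 2:
            raise Transient("timeout")
        return f"ok:{payload}"

    def rejected(payload: str) -> str:
        raise Permanent("422 missing field 'email'")

    def down(payload: str) -> str:
        raise Transient("503")

    results = [
        runner.call("send_email", flaky, "lead-1"),
        runner.call("send_email", rejected, "lead-2"),
        runner.call("crm_update", down, "lead-3"),
    ]
    return {
        "results": results,
        "delays": delays,
        "dead": [(d.step, d.payload, d.attempts, d.needs_decision) for d in runner.dead_letters],
        "alerts": [d.payload for d in runner.alerts()],
    }
```

The split between "needs a decision" and "on the board" is a policy choice; the point is that it is made once, in code, rather than by whoever happens to read the channel.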
The failure it prevents: Shipping the sad path untested.
Test with the data you'll actually receive: empty fields, names with emoji, phone numbers with spaces, dates in the wrong format, a postcode where the city should be. Real-world forms produce all of it in week one.
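An added example of this check, not code from the page: a normaliser for a form submission, run over the kinds of rows the check lists (empty fields, a name with emoji and combining characters, phone numbers with spaces, dates in another format, a postcode in the city box), flagging what it cannot repair rather than guessing. The field names, the UK-postcode rule and the sample rows are assumptions constructed for illustration.

```python
# Added example (not the page's code): normalise a form submission and
# exercise it with the inputs the check describes. The field names, the
# UK-postcode rule and the sample rows are constructed for illustration.
import re
import unicodedata
from datetime import datetime
from typing import Any, Optional

UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%d %b %Y")


def normalise_phone(raw: Optional[str], default_cc: str = "+44") -> Optional[str]:
    """Strip spaces, dashes and brackets; return a +country form or None."""
    if not raw:
        return None
    digits = re.sub(r"[^\d+]", "", raw)
    if digits.startswith("+") and len(digits) > 1:
        return digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    if digits.startswith("0") and len(digits) > 1:
        return default_cc + digits[1:]
    return None  # don't guess; flag it for a person


def parse_date(raw: Optional[str]):
    """Try the formats real forms produce; None if none of them fit."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime((raw or "").strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalise_submission(row: dict[str, Any]) -> dict[str, Any]:
    problems: list[str] = []
    # NFC so "e" + combining acute and the precomposed "é" compare equal.
    name = unicodedata.normalize("NFC", (row.get("name") or "").strip())
    city = (row.get("city") or "").strip()
    postcode = (row.get("postcode") or "").strip().upper()
    if not name:
        problems.append("name missing")
    # A postcode typed into the city box: move it, but say the city is gone.
    if not postcode and UK_POSTCODE.match(city.upper()):
        postcode, city = city.upper(), ""
        problems.append("city missing")
    phone = normalise_phone(row.get("phone"))
    if row.get("phone") and phone is None:
        problems.append("phone unparseable")
    when = parse_date(row.get("date"))
    if when is None:
        problems.append("date unparseable")
    return {"name": name, "phone": phone, "city": city, "postcode": postcode,
            "date": when.isoformat() if when else None, "problems": problems}


# Constructed sample rows: what a real form sends in week one.
SAMPLE_ROWS = [
    {"name": "Ann Lee", "phone": "07700 900123", "city": "London",
     "postcode": "SW1A 1AA", "date": "2024-03-05"},
    {"name": "Jose\u0301 \U0001F389", "phone": "+44 7700 900 123",
     "city": "SW1A 1AA", "postcode": "", "date": "05/03/2024"},
    {"name": "  ", "phone": "call me", "city": "", "postcode": "",
     "date": "next tuesday"},
]


def run_samples() -> list[dict[str, Any]]:
    return [normalise_submission(r) for r in SAMPLE_ROWS]


if __name__ == "__main__":
    for result in run_samples():
        print(result)
```

The rows double as the fixture for a test: anything that changes the normaliser has to keep all three outcomes, including the third row, where the right answer is "flag it", not a guessed phone number or date.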
The failure it prevents: A client who feels they don't control their own system.
One switch, clearly labelled, that pauses everything safely without calling you. It almost never gets pressed. Its existence is the point.
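An added example covering this check and the review queue above; it is not code from the page. Every side effect goes through one choke point: the pause flag is checked first, actions of kinds that require review wait for an approval, everything else runs, and resuming replays what was parked through the same checks. Widening the gate is a one-line change to the set of reviewed kinds. Gate, Action and the scenario are assumptions constructed for illustration.

```python
# Added example (not the page's code): one choke point that every side
# effect passes through. It checks the pause flag first, holds actions of
# reviewed kinds until someone approves, and runs the rest. Gate, Action
# and the scenario are assumptions constructed for illustration.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Action:
    kind: str                             # e.g. "email.send", "crm.update"
    payload: dict[str, Any]
    run: Callable[[dict[str, Any]], Any]  # the side effect itself
    approved: bool = False


@dataclass
class Gate:
    review_required: set[str]  # widen the gate later by removing kinds here
    paused: bool = False
    pending: dict[int, Action] = field(default_factory=dict)
    held: list[Action] = field(default_factory=list)  # parked while paused
    audit: list[tuple[str, str]] = field(default_factory=list)
    next_id: int = 1

    def submit(self, action: Action) -> str:
        if self.paused:
            self.held.append(action)  # nothing runs while the switch is off
            return "held:paused"
        if action.kind in self.review_required and not action.approved:
            ticket = self.next_id
            self.next_id += 1
            self.pending[ticket] = action
            self.audit.append((action.kind, f"pending:{ticket}"))
            return f"pending:{ticket}"
        action.run(action.payload)
        self.audit.append((action.kind, "executed"))
        return "executed"

    def approve(self, ticket: int, reviewer: str) -> str:
        action = self.pending.pop(ticket)
        action.approved = True
        self.audit.append((action.kind, f"approved by {reviewer}"))
        return self.submit(action)  # still respects the pause

    def reject(self, ticket: int, reviewer: str) -> str:
        action = self.pending.pop(ticket)
        self.audit.append((action.kind, f"rejected by {reviewer}"))
        return "rejected"

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> list[str]:
        """Replay what was parked, in order, through the same checks."""
        self.paused = False
        parked, self.held = self.held, []
        return [self.submit(a) for a in parked]


def demo() -> tuple[list[str], list[tuple[str, Any]]]:
    """Constructed scenario: a CRM update runs straight through, an email
    waits for review, the client hits the switch, an approval lands while
    paused, then everything resumes in order."""
    sent: list[tuple[str, Any]] = []
    gate = Gate(review_required={"email.send", "quote.send"})

    def send_email(p: dict[str, Any]) -> None:
        sent.append(("email", p["to"]))

    def update_crm(p: dict[str, Any]) -> None:
        sent.append(("crm", p["id"]))

    outcomes = [gate.submit(Action("crm.update", {"id": 7}, update_crm))]
    outcomes.append(gate.submit(Action("email.send", {"to": "a@example.com"}, send_email)))
    gate.pause()
    outcomes.append(gate.submit(Action("crm.update", {"id": 8}, update_crm)))
    outcomes.append(gate.approve(1, "dana"))
    outcomes.extend(gate.resume())
    return outcomes, sent


if __name__ == "__main__":
    print(demo())
```

Because approval re-enters submit, an approval given while the system is paused cannot slip a message out; it waits with everything else until the switch is turned back on.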
The failure it prevents: Reminders at 3am and follow-ups on the client's Sunday.
"Send at 9am" means nothing until you say whose 9am. Store the client's timezone as an IANA zone name (Europe/London, not a fixed offset, which is wrong for half the year wherever clocks change) and run schedules on the client's business day, not your server's UTC. Invisible in testing, permanent in production.
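An added example of this check, not code from the page: the next "9am on a business day" for a client, computed from an IANA zone name and handed back as a UTC instant, run for three clients on the weekend the UK switches to BST, next to where a naive "tomorrow 09:00 UTC" would have landed on each client's clock. next_send_time, the Monday-to-Friday rule and the sample clients are assumptions constructed for illustration.

```python
# Added example (not the page's code): turn "9am on the client's business
# day" into a UTC instant from an IANA zone name. next_send_time, the
# Monday-to-Friday rule and the sample clients are assumptions.
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

BUSINESS_DAYS = range(0, 5)  # Monday=0 .. Friday=4


def next_send_time(now_utc: datetime, tz_name: str,
                   send_at: time = time(9, 0)) -> datetime:
    """Next occurrence of send_at on a business day in the client's zone,
    returned in UTC for the scheduler. The wall-clock time is rebuilt with
    datetime.combine on each date, so the UTC offset follows daylight-saving
    changes instead of drifting by an hour after the clocks move."""
    tz = ZoneInfo(tz_name)
    local_now = now_utc.astimezone(tz)
    day = local_now.date()
    while True:
        candidate = datetime.combine(day, send_at, tzinfo=tz)
        if candidate > local_now and candidate.weekday() in BUSINESS_DAYS:
            return candidate.astimezone(timezone.utc)
        day += timedelta(days=1)


def demo() -> dict[str, dict[str, str]]:
    """Constructed scenario: a run at 12:00 UTC on Friday 29 March 2024 (the
    UK moves to BST on the 31st) for three clients, plus where a naive
    'tomorrow 09:00 in server UTC' would land on each client's clock."""
    now = datetime(2024, 3, 29, 12, 0, tzinfo=timezone.utc)
    naive = datetime(2024, 3, 30, 9, 0, tzinfo=timezone.utc)
    out: dict[str, dict[str, str]] = {}
    for zone in ("Europe/London", "America/Los_Angeles", "Australia/Sydney"):
        send = next_send_time(now, zone)
        out[zone] = {
            "send_utc": send.isoformat(),
            "send_local": send.astimezone(ZoneInfo(zone)).strftime("%a %H:%M %Z"),
            "naive_local": naive.astimezone(ZoneInfo(zone)).strftime("%a %H:%M %Z"),
        }
    return out


if __name__ == "__main__":
    for zone, v in demo().items():
        print(f"{zone:20} send {v['send_utc']} = {v['send_local']}; "
              f"naive 09:00Z would be {v['naive_local']}")
```

For the London client the Monday send lands at 08:00 UTC, not 09:00, because the clocks moved over the weekend; for Sydney it is Sunday 22:00 UTC; the naive version would have mailed Los Angeles at 2am on a Saturday.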
The failure it prevents: An afternoon of archaeology when a lead slips through.
Every run should answer: what ran, on what input, what changed. If your logs can't answer those in two minutes, they're decoration.
The failure it prevents: The automation nobody remembers how it works.
Every workflow needs a person who owns it: who gets the weekly health summary, who knows what it does, who decides when it changes. An automation without an owner is technical debt with a delay on it. This is the check everyone skips.