I Have Trust Issues With My AI. Canary Comments Help.

I have trust issues.

Even though I have a very detailed harness for my co-working AI, with clear instructions, a style guide, and dos and don'ts, it tends to go off the rails regularly. Whether the context is nearly full or was just compacted, these WTF moments pop up often:

> Me: Find the default price for a variant

> Agent:
  val variant = product.variants.head
  val price = variant.prices.find(_.isDefault).get
  val amount = price.value.centAmount
  // Return the amount
  amount

> Me: ... .head? .get? Are you serious? WTF?
>     Check your CLAUDE.md.

AI will go off the rails; that is just a given for me. AI also likes to write comments, so canary comments are my current approach to catching it as early as possible and course-correcting.

The canary: // why:

I enforce one visible rule for every comment, human or AI:

  • Every comment must start with // why:
  • If there is no real why, the comment should not exist

This gives me a fast drift check during coding and review. If a comment appears without the prefix, I assume the instructions are being ignored and trust is gone for that change!

// why: headOption + find safely propagate None, no unsafe access
// ARE YOU HAPPY NOW?
for {
  variant <- product.variants.headOption
  price   <- variant.prices.find(_.isDefault)
} yield price.value.centAmount

This is how I instruct it in my CLAUDE.md:

## Comment Convention (canary)

Every comment added by Claude must use a `why:` prefix.
This is a drift canary: if comments appear without it,
the agent has drifted from conventions.

- Only add comments where the intent is not obvious from the code
- Every comment must start with `// why:`
- No decorative comments, section dividers, or docstrings on internal methods
- If you cannot articulate a "why", the comment should not exist

In the spirit of enhancing my work and staying in the flow, I use AI to code alongside me, not autonomously. The comments let me spot immediately when something is off. It is visible at first glance! Whenever I review a change and see a comment that does not follow the pattern, it is clear the instructions are not being followed, which implies all the other instructions might be off as well.

That enables me to stop the AI and bring it back on track or take over myself.

Because the comments ask for a why, the rule has another benefit: it pushes the AI to ask itself why again, which can trigger additional reasoning. When forced to write // why: ..., the agent sometimes catches its own mistake mid-comment. The prefix turns every comment into a micro-reflection checkpoint.

In a shared repository with many contributors, each with their own setup, this convention works as a baseline check. If the // why: prefix lives in a shared CONTRIBUTING.md, any contributor, human or AI, is held to the same standard. You see instantly in review whether conventions are respected. If the comments are off, assume everything else might be ignored too, until this first step is fixed.

Limitations and ideas

If the AI does not write any comments, well, you won't catch it going off the rails. It might also just mechanically apply the prefix while ignoring all the other instructions. Still, the rest of the instructions live right next to this one, so as long as the why-comment rule is in context, the other rules probably are too.

The // why: prefix is one canary. You could go further and require a short comment in a defined format for every new method or class. That would make drift even more obvious, though you might end up with a bunch of unneeded comments. It could even be just --- line separators or any other pattern... as long as it immediately catches the eye.
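
To illustrate, a stricter per-method format could look like this sketch. The what/why header and the tiny domain model are made up for illustration, not a format I actually enforce:

  // why: stand-in domain model just for this sketch
  case class Money(centAmount: Long)
  case class Price(isDefault: Boolean, value: Money)
  case class Variant(prices: List[Price])
  case class Product(variants: List[Variant])

  // what: default price of the first variant, if any
  // why:  callers need an amount before a variant is explicitly picked
  def defaultCentAmount(product: Product): Option[Long] =
    for {
      variant <- product.variants.headOption
      price   <- variant.prices.find(_.isDefault)
    } yield price.value.centAmount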

Linting: Yes, you could also lint for all of your rules, but the nice part of the canary is that it is immediately visible during active coding.
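
If you do want a hard gate on top, the lint can be a small script. Here is a minimal sketch in Scala, assuming sources live under src; it only checks full-line comments, and WhyLint is just a name I picked for this example:

  // why: fail the build when a comment drifts from the convention
  import java.nio.file.{Files, Paths}
  import scala.jdk.CollectionConverters._

  object WhyLint {
    // why: full-line comments only, deliberately simple for a canary
    private val Comment = raw"\s*//(.*)".r

    def main(args: Array[String]): Unit = {
      val violations = Files.walk(Paths.get("src")).iterator().asScala
        .filter(_.toString.endsWith(".scala"))
        .flatMap { file =>
          Files.readAllLines(file).asScala.zipWithIndex.collect {
            case (Comment(body), idx) if !body.trim.startsWith("why:") =>
              s"$file:${idx + 1}: comment without // why: prefix"
          }
        }
        .toList

      violations.foreach(println)
      if (violations.nonEmpty) sys.exit(1)
    }
  }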

Try it

Add two lines to your CLAUDE.md / AGENTS.md / CONTRIBUTING.md:

Every comment must use a `// why:` prefix.
If you cannot articulate a "why", the comment should not exist.

Does this increase trust? No. If anything, I trust AI less now. This canary just tells me exactly when I should not trust it at all.