Opinion May 11, 2026 · 7 min read

AGENTS.md is a sign on the wall. Agents don't read signs.


Is AGENTS.md enough to keep your coding agent honest? Some teams swear by it. Some stack a CLAUDE.md and a .cursorrules on top. Some write nothing at all and hope the model remembers house style from the last session. After running aislop against 25 real projects, here is what we landed on. The markdown file is a nice idea. It is not a guardrail. It is a sign on the wall, and agents do not read signs.

AGENTS.md, CLAUDE.md, .cursorrules, the system field of whatever model you are calling. A team writes down its conventions, the agents that pass through the repo read the file, everyone agrees on house style, the PRs get cleaner. That is the dream. Then you look at last week's diff and the agent ignored three of the four rules. Not because the file is broken. Because AGENTS.md is a norm, not a guardrail. The agent is told not to. It still does. There is no consequence.

The failure mode

Almost every AI-assisted codebase we scanned in the 25-project run had something that looked like AGENTS.md. Markdown files at the root. Pinned style notes in the prompt. .cursorrules with explicit "do not" lists. Not one of those repos was free of the things its own files told the agent not to do. `as any` casts. 200-line functions. Swallowed exceptions. Console leftovers. Narrative JSDoc above one-line functions.
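To make those patterns concrete, here is a small sketch of what they tend to look like in TypeScript, followed by one possible cleaned-up version. The `User` type and both function names are hypothetical, invented for illustration. They are not taken from any of the scanned repos.

```typescript
interface User {
  id: number;
  name: string;
}

/**
 * This function parses a user. It takes a raw string and returns the
 * parsed user. (A narrative comment: it only restates the code below.)
 */
function parseUserSloppy(raw: string): User {
  try {
    console.log("parsing user", raw); // console leftover
    return JSON.parse(raw) as any; // `as any` cast hides the real shape
  } catch {
    return undefined as any; // swallowed exception: the caller never learns why
  }
}

// One possible cleanup: validate the shape and let failures surface.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  // Widening to an unknown-valued record only; each field is checked below.
  const record = value as Record<string, unknown>;
  return typeof record["id"] === "number" && typeof record["name"] === "string";
}

function parseUser(raw: string): User {
  const parsed: unknown = JSON.parse(raw); // a SyntaxError propagates to the caller
  if (!isUser(parsed)) {
    throw new Error("parseUser: input is not a User");
  }
  return parsed;
}
```

The sloppy version type-checks and returns `undefined` on bad input while claiming to return a `User`. The cleaned version fails loudly, which is the behavior the written rules were asking for.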

The pattern was the same every time. The agent had read the file at some point, complied for a few turns, drifted, and the team had no mechanism to catch the drift. aislop scan caught it on first run. The rules in the markdown file matched the rules the linter ran. The difference was that the linter ran on every PR and the markdown file was consulted only when somebody remembered.

The four-layer model

A standard sticks when four layers are present. None of them is optional. None of them is sufficient alone.

1. WHAT. The written standard.

AGENTS.md. Describes the what and the why. Humans read it during onboarding. Agents read it when a tool puts it in their context. It is documentation. It does not enforce.

2. HOW. The machine-enforceable rules.

The what, automated. In aislop's case that is 50+ rules and checks across six engines. Format, lint, code-quality, ai-slop, security, architecture. Rule IDs like ai-slop/narrative-comment, ai-slop/swallowed-exception, ai-slop/unsafe-type-assertion, complexity/function-too-long, security/vulnerable-dependency. If a rule can be automated, it must be. The rest stays in the markdown file.
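What "automated" means can be shown with a toy check. The sketch below is not how aislop implements its rules. It is a deliberately naive line scanner, with a hypothetical `Finding` shape and `scanSource` function, that reuses two of the rule IDs named above to show a written rule turning into something a machine can run.

```typescript
interface Finding {
  ruleId: string;
  line: number; // 1-based line number
  text: string; // the offending line, trimmed
}

// Naive, regex-based checks. A real engine would parse the code instead.
function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    // Flags any line containing an `as any` cast.
    if (/\bas any\b/.test(text)) {
      findings.push({
        ruleId: "ai-slop/unsafe-type-assertion",
        line: index + 1,
        text: text.trim(),
      });
    }
    // Flags an empty catch block, but only when it sits on a single line.
    if (/catch\s*(\([^)]*\))?\s*\{\s*\}/.test(text)) {
      findings.push({
        ruleId: "ai-slop/swallowed-exception",
        line: index + 1,
        text: text.trim(),
      });
    }
  });
  return findings;
}
```

Even a check this crude runs the same way on every commit, which is the property the markdown file lacks.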

3. GATE. The quality threshold.

aislop ci. The enforcement layer. A PR that drops the score below threshold cannot merge. No human has to remember to look. The gate looks. One small workflow.
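The gate logic itself is small. A minimal sketch, assuming a numeric score and a numeric threshold: `GateResult` and `evaluateGate` are hypothetical names, and this is not aislop's implementation. The only idea it carries is that a score below the threshold becomes a non-zero exit code, which is what turns a PR red.

```typescript
interface GateResult {
  passed: boolean;
  exitCode: 0 | 1; // non-zero fails the CI job
  message: string;
}

function evaluateGate(score: number, threshold: number): GateResult {
  // A score equal to the threshold still passes; only a drop below it fails.
  const passed = score >= threshold;
  return {
    passed,
    exitCode: passed ? 0 : 1,
    message: passed
      ? `score ${score} meets threshold ${threshold}`
      : `score ${score} is below threshold ${threshold}`,
  };
}
```

A CI step would exit with `exitCode`, and a branch protection rule that requires the check is what disables the merge button.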

4. LOOP. The agent handoff.

aislop fix --claude, --codex, --cursor, and 11 more. When the gate flags issues the deterministic fix cannot safely resolve, the agent gets them. File paths, rule IDs, the violating code, the architectural reason behind the rule. The same agents that produced the slop, with structured input, cleaning it up.
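The four pieces of structured input listed above map naturally onto a record. The sketch below assumes a hypothetical `HandoffItem` shape and `renderHandoff` function. The field names and the text layout are invented for illustration and are not aislop's actual handoff format.

```typescript
interface HandoffItem {
  filePath: string; // where the violation lives
  ruleId: string; // which rule flagged it
  violatingCode: string; // the code that tripped the rule
  reason: string; // the architectural reason behind the rule
}

// Renders the items as a plain-text brief an agent can act on.
function renderHandoff(items: HandoffItem[]): string {
  const sections = items.map((item, index) =>
    [
      `Issue ${index + 1}: ${item.ruleId}`,
      `File: ${item.filePath}`,
      `Code: ${item.violatingCode}`,
      `Why the rule exists: ${item.reason}`,
    ].join("\n"),
  );
  return sections.join("\n\n");
}
```

The point of the structure is that the agent gets the same facts a reviewer would write in a comment, without anyone having to write the comment.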

All four together. AGENTS.md without rules is documentation nobody enforces. Rules without a gate is a linter nobody runs. A gate without a loop is a wall with no door, and the door is how you fix what the wall blocked.

Told not to vs. cannot

The four-layer model is doing one thing. Closing the gap between "the agent is told not to do X" and "the agent cannot do X without it being caught and reverted."

"Told not to" lives in AGENTS.md. The agent reads the rule, internalizes it for a few turns, drifts. There is no consequence to drift because nothing checks. The PR ships with the rule violated. The team notices weeks later in a retro and adds the rule back to AGENTS.md. The cycle continues.

"Cannot" lives in CI. The agent writes the violation. The pipeline runs aislop ci. The score drops. The PR is red. The merge button is disabled. The agent, or the human, has to fix it before anything ships. The rule is not just told. It is enforced.

Case study: this site

This marketing site uses all four layers. Worth walking through.

Layer	Tool	What it does
WHAT	`AGENTS.md`	At the repo root. Describes the voice, the routing convention (drafts get an underscore prefix so Astro does not route them), the size targets for components, the don'ts.
HOW	`aislop`	Runs against the source. The size rules catch oversized page files. The narrative-comment rule catches JSDoc the agent felt obliged to write above page components. The unused-import and dead-pattern rules catch the leftovers from agent-driven refactors.
GATE	`aislop ci`	CI runs on every PR. Score has to hold or move forward. The threshold is set just below the current score so any regression fails the build.
LOOP	`aislop fix --claude`	Handles the residuals. The size violations the deterministic fix will not auto-split, the typed-shorthand cases that need a human-shaped refactor, the orphan files knip can find but will not safely delete.

Every layer earns its place. The markdown file alone would not have stopped the agent shipping a 600-line page. The rules alone would not fail the PR. The gate alone would not fix what it blocked. The loop alone has nothing to fix because nothing was flagged. All four, in sequence, is the only configuration that ships clean.
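The GATE row's threshold, set just below the current score, works as a ratchet. A minimal sketch under two assumptions that are ours, not the tool's: scores are whole numbers, and the margin is smaller than one point. `ratchetThreshold` and `holdsTheLine` are hypothetical helper names.

```typescript
// Assumption: scores are integers, so a margin of 0.5 sits strictly
// between the current score and the next lower possible score.
function ratchetThreshold(currentScore: number, margin: number = 0.5): number {
  return Math.max(0, currentScore - margin);
}

// Holding the score or improving it passes; any drop fails.
function holdsTheLine(newScore: number, threshold: number): boolean {
  return newScore >= threshold;
}

const threshold = ratchetThreshold(87); // 86.5
const sameScore = holdsTheLine(87, threshold); // true: the score held
const regression = holdsTheLine(86, threshold); // false: a one-point drop fails
```

After a PR raises the score, the threshold is recomputed from the new score, so the floor only moves up.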

What 0.5 proved

During the 0.5 rehaul itself, every layer caught something. AGENTS.md guided the agent on commit-message style and the "own the destructive fix" principle. The 40+ rules ran as a self-scan against the engine source on every commit. The CI gate held the line at 100/100, so any regression would have failed the release. The handoff loop fed Claude the residuals from the narrative-comment sweep: the 70 comments mentioned in the other essays.

Pull any layer and the release would have shipped messier. The markdown file alone could not keep the agent from writing JSDoc paragraphs. The rules alone could not have made the rehaul fail. We would have shipped 70 comments and noticed two weeks later. The gate alone would not have given the agent any way to repair what it blocked. The loop alone had no input until the gate produced the failure list.

The four-layer rule

If your team's standards live only in AGENTS.md, you have a wish list. If they live in rules without a gate, you have a linter no one runs. If they live in a gate without a loop, you have a wall with no door. Write the file. Encode the rules. Commit the gate. Wire the handoff. AGENTS.md is not enough. You have to hold your agent accountable. With all four in place, the agent operates inside something that will catch what it gets wrong, and the human at the end of the loop reviews what passed, not what slipped.

Wire the gate

    - uses: scanaislop/aislop@v0.10.1
      with:
        version: latest

One workflow and the agent is no longer just told not to. It cannot. Star the repo if you want the next release in your feed.
