Release May 28, 2026 · 4 min read

aislop v0.9.4. SlopCodeBench called it verbosity. We turned it into rules.

A March 2026 paper measured how coding agents degrade over long-horizon iterative tasks. It defined verbosity as redundant or duplicated code and tracked it as a quality signal. We read the paper, picked out the four most repeatable Python patterns it surfaces, and shipped them as rules.

aislop has always been an empirical project. Most rules come from a real codebase where an agent shipped something no human would have written. v0.9.3 was the receipts release: 70 open-source repos, 38% noise reduction, no rule disabled.

v0.9.4 is different. This week we read a paper.

SlopCodeBench

SlopCodeBench (SCBench for short, arXiv 2603.24755, March 2026) measures something most coding-agent benchmarks ignore: what happens when an agent has to extend its own prior code repeatedly under changing specifications. The benchmark has 20 problems and 93 checkpoints. The test suite is hidden. Only observable behavior at a CLI or API boundary is specified. Internal structure is left to the agent.

Two trajectory-level quality signals are tracked: verbosity (the fraction of redundant or duplicated code) and structural erosion (the share of complexity mass concentrated in high-complexity functions). The findings are sobering: agents pass individual test cases at high rates, but strict solve rates that include regression collapse to 0.5% by the final checkpoint.

Translation: agents can solve a problem at checkpoint one, but by checkpoint nine they've buried it in plumbing nobody asked for. That plumbing has shapes. We can match those shapes deterministically. So we did.
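To make "matching shapes deterministically" concrete, here is a minimal sketch of a syntax-tree matcher for one such shape, a `for` loop over `range(len(...))`, written with Python's standard `ast` module. It is an illustration only: this post does not describe how aislop's matchers are implemented, and `find_range_len_loops` and `SAMPLE` are hypothetical names.

```python
import ast


def find_range_len_loops(source: str) -> list[int]:
    """Return the line numbers of `for ... in range(len(x))` loops."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.For):
            continue
        outer = node.iter
        # The loop must iterate over a call to range(...) with one argument.
        if not (
            isinstance(outer, ast.Call)
            and isinstance(outer.func, ast.Name)
            and outer.func.id == "range"
            and len(outer.args) == 1
        ):
            continue
        inner = outer.args[0]
        # That single argument must itself be a call to len(...).
        if (
            isinstance(inner, ast.Call)
            and isinstance(inner.func, ast.Name)
            and inner.func.id == "len"
            and len(inner.args) == 1
        ):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = """\
for i in range(len(users)):
    out.append(users[i].name)

for user in users:
    out.append(user.name)
"""

print(find_range_len_loops(SAMPLE))
```

Because the match is on the parsed tree rather than on text, the same input always produces the same findings, and whitespace or comments do not change the result.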

Four new Python rules

Each rule has paired tests covering the pattern and the legitimate alternative (so the negative case stays protected against regression). 35 new tests landed in the Python suite; the full test count is now 842, all passing.

ai-slop/python-range-len-loop info

Flags for i in range(len(items)). The Pythonic alternative is enumerate(items) or direct iteration. SCBench surfaces this as a recurring agent shortcut: the model writes a C-style index loop where the language has a one-token primitive.

# flagged
for i in range(len(users)):
    out.append(users[i].name)

# clean
for user in users:
    out.append(user.name)
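When the loop body really does need the index, `enumerate` covers that case without indexing back into the list. A small sketch, with a stand-in `User` class and sample data that are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str


users = [User("ada"), User("grace")]

# enumerate yields (index, item) pairs, so users[i] is never needed.
labels = [f"{i}: {user.name}" for i, user in enumerate(users)]

# start=1 gives human-facing numbering without manual arithmetic.
numbered = [f"{i}. {user.name}" for i, user in enumerate(users, start=1)]
```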

ai-slop/python-chained-dict-get warning

Flags .get(..., {}).get(...) chains. The empty-dict fallback hides missing-data cases and turns brittle as schemas evolve. Help text points to boundary normalization or a typed object.

# flagged
name = payload.get("user", {}).get("profile", {}).get("name", "")

# clean — normalize at the boundary
user = User.parse(payload)
name = user.profile.name
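`User.parse` in the clean version is not defined in this post. One way it might look, as a sketch using standard-library dataclasses: the field names follow the flagged snippet, while the choice of which fields are required and which get defaults is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Profile:
    name: str = ""


@dataclass(frozen=True)
class User:
    profile: Profile

    @classmethod
    def parse(cls, payload: dict[str, Any]) -> "User":
        # Assumption: "user" and "profile" are required, "name" is optional.
        # Missing required data fails here, once, instead of turning into {}.
        raw_user = payload.get("user")
        if raw_user is None:
            raise ValueError("payload is missing 'user'")
        raw_profile = raw_user.get("profile")
        if raw_profile is None:
            raise ValueError("user is missing 'profile'")
        return cls(profile=Profile(name=raw_profile.get("name", "")))


user = User.parse({"user": {"profile": {"name": "Ada"}}})
print(user.profile.name)
```

The defaults still exist, but they live in one place at the boundary, so a schema change shows up as one failing parse rather than as empty strings scattered through the code.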

ai-slop/python-repetitive-dispatch warning

Flags four or more if x == "..." / elif x == "..." branches on the same selector. The clean version is a dispatch table or handler map.

# flagged
if event == "created":
    handle_create()
elif event == "updated":
    handle_update()
elif event == "deleted":
    handle_delete()
elif event == "archived":
    handle_archive()

# clean — dispatch table
HANDLERS = {
    "created": handle_create,
    "updated": handle_update,
    "deleted": handle_delete,
    "archived": handle_archive,
}

HANDLERS[event]()
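A dispatch table changes what happens on an unrecognized selector: the if/elif ladder above falls through silently, while a plain dictionary lookup raises `KeyError`. A sketch of a wrapper that makes that choice explicit. The stub handlers and the `dispatch` function are hypothetical, added only so the example runs on its own.

```python
from typing import Callable


# Stub handlers for illustration; real handlers would do actual work.
def handle_create() -> str:
    return "create"


def handle_update() -> str:
    return "update"


def handle_delete() -> str:
    return "delete"


def handle_archive() -> str:
    return "archive"


HANDLERS: dict[str, Callable[[], str]] = {
    "created": handle_create,
    "updated": handle_update,
    "deleted": handle_delete,
    "archived": handle_archive,
}


def dispatch(event: str) -> str:
    """Look up the handler for an event and call it."""
    try:
        handler = HANDLERS[event]
    except KeyError:
        # Unknown events are reported by name instead of silently ignored.
        raise ValueError(f"unknown event: {event!r}") from None
    return handler()
```

If ignoring unknown events is the intended behavior, `HANDLERS.get(event)` followed by a `None` check preserves it while keeping the table.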

ai-slop/python-isinstance-ladder warning

Flags four or more chained isinstance(...) branches on the same value. Recommends a handler map or normalized representation. Same shape as the dispatch ladder; same fix.
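A sketch of the kind of shape this rule describes, next to a handler map keyed by type. The function names and the rendering logic are made up for illustration, and whether aislop flags this exact snippet is an assumption.

```python
from typing import Callable


# Ladder shape: four chained isinstance branches on the same value.
def render_ladder(value: object) -> str:
    if isinstance(value, bool):
        return "yes" if value else "no"
    elif isinstance(value, int):
        return str(value)
    elif isinstance(value, float):
        return f"{value:.2f}"
    elif isinstance(value, str):
        return value
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Handler map: one entry per supported type.
RENDERERS: dict[type, Callable[[object], str]] = {
    bool: lambda v: "yes" if v else "no",
    int: str,
    float: lambda v: f"{v:.2f}",
    str: lambda v: v,
}


def render(value: object) -> str:
    try:
        renderer = RENDERERS[type(value)]
    except KeyError:
        raise TypeError(f"unsupported type: {type(value).__name__}") from None
    return renderer(value)
```

One design difference to keep in mind: the map matches on the exact type, so subclasses of a registered type are not handled, whereas `isinstance` accepts them. The standard library's `functools.singledispatch` is an alternative that respects inheritance.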

Why we cared about this paper

Most coding-agent benchmarks score one-shot completion: can the agent solve the problem from a fresh prompt? That measures the wrong thing. Real engineering is a sequence: you write code, then you change it, then you change it again, and the question is whether the codebase stays maintainable along the way.

SCBench is the first benchmark we've seen that takes trajectory seriously. It puts numbers on the thing we keep saying: agents make you 10x faster and 10x messier at the same time, and the cost shows up downstream.

Verbosity, in particular, is exactly the slop class aislop was built for. The four rules in this release map cleanly to the patterns the paper's Python track surfaces. If you've been wondering whether the rules in this CLI are arbitrary or empirical: now they're both.

Also in this release

A one-line star prompt at the end of scan output. Running npx aislop scan now ends with ★ Found this useful? Star us at github.com/scanaislop/aislop. Muted styling, single line. Suppressed in JSON output, in aislop ci, and for any hook caller that passes printBrand: false.
GitHub Discussions is open. Two structured templates ready to use: [FP] for false-positive reports (rule name, snippet, reasoning, version) and [Rule] for rule requests (pattern, what should pass, suggested name, language). Most rules in this repo came from community comments; the Discussions surface makes that easier.
README leads with the verb. The headline now reads "Catch the slop AI coding agents leave in your code" instead of the previous category-language version. The lead names the agents (Claude Code, Cursor, Codex, OpenCode) and the patterns they leave behind.

What's next

SCBench tracks two signals. We shipped rules for the first one (verbosity). The second one (structural erosion: complexity mass concentrated in high-complexity functions) is planned for the next release. The shape is clear: function-too-large, file-too-large, deep-nesting, and a complexity-density metric to catch the case where one function does ten things while its siblings do nothing.

Run the new release:

npx aislop@latest scan

If you find a false positive or want a rule we haven't shipped yet, the Discussions are the right place. Most of what's in this release was suggested by someone there or on Reddit.

Star the AI Slop CLI on GitHub if you want the next release in your feed.
