Guide May 30, 2026 · 14 min read

AI Slop: How to Detect and Prevent Low-Quality AI Code

AI slop is not always broken code. It can compile, pass tests, and look professional, then fail later under conditions the model never considered. After scanning real projects and agent-written changes, here is how to detect the repeatable patterns and keep them out of main.

TL;DR.

AI slop is code that compiles, passes tests, and still degrades your codebase because it is structurally shallow, repeated at machine scale.
It is harder to catch than normal tech debt because it looks finished, spreads uniformly, and passes the checks teams trust.
Most of it falls into a small set of named patterns: swallowed errors, comments that lie, type-system escape hatches, hallucinated imports, dead code, generic names, and complexity inflation.
Deterministic rules catch the patterns you can define. LLM review catches logic that needs intent. Use deterministic rules for enforceable gates, and use LLM review where judgment is useful.
Prevention is a loop, not a cleanup: scan at the hook, gate in CI, track the trend. Run npx aislop scan to see where you stand.

What AI slop actually is

In the content world, "AI slop" means mass-produced, low-effort text generated to capture search traffic. In code, the definition is sharper: AI slop is AI-generated code that compiles, passes tests, and still carries shallow structure that can fail later.

The crucial part is that it is often not broken in the obvious way. A single trivial comment is harmless. A single empty catch block is a small risk. The problem is volume. Agent-written changes can repeat the same small shortcuts across many files. At that scale, style noise becomes comprehension drag, debugging tax, and production risk.

You know it when you read it. A pull request where everything technically works, but following the logic feels like wading through filler. That feeling is the signal. The rest of this guide turns that feeling into specific, named, detectable patterns.

Why the volume changed everything

Merriam-Webster named slop a defining word of 2025. Search interest in "AI slop" went from niche slang to a broader engineering concern inside a year. The term crossed over because the experience is becoming familiar: teams shipping with Claude Code, Cursor, Codex, or other agents are generating more code, faster, than their review process was designed to absorb.

The volume is the story. When a human wrote every line, review scaled with output because both were bounded by the same person typing. Agents break that coupling. One engineer can now open a hundred-file change in an afternoon. The code arrives faster than anyone can read it, and it arrives polished enough that skimming feels safe. It is not.

Across 25+ real projects we scanned, the same handful of patterns showed up again and again, independent of language, framework, or which agent wrote the code. The receipts are in our 25-project findings post. The patterns are remarkably consistent, which is exactly why they are detectable.

Why slop slips past the checks you already run

Traditional tech debt has a tell. A human took a shortcut, knew it was a shortcut, and usually left a scar: a TODO, a hack comment, a function everyone on the team quietly avoids. You can find it because someone chose it.

AI slop has none of those tells, for three reasons.

It looks finished. Agents write fluent, idiomatic-looking code with confident naming and tidy formatting. The surface signals your brain uses to flag "this was rushed" are exactly the signals an agent gets right. The shallowness is underneath.

It is uniform. Human debt is local and idiosyncratic. AI slop is systemic. The same swallowed-exception habit, the same generic data and result names, the same redundant try/catch appear everywhere, which makes any single instance look normal because it matches everything around it.

It passes your checks. Tests are green because the code does the happy path. The type checker is satisfied because an as any told it to be. The linter is quiet because trivial comments and shallow structure are not lint errors. Many quality gates were designed for human-paced output, and agent output changes the load profile.

This is why manual review should not be the only answer. You cannot ask reviewers to read a hundred files with the same care they gave to ten, every day. The catch should be mechanical for the patterns that can be mechanical, so reviewers spend their attention on the things only a human can judge.

The patterns, and the rules that catch them

This is where vague talk about "code smell" has to become specific. Below are the families of AI slop we see most often, each tied to the deterministic rules that flag it. Named patterns are the difference between "this feels off" and "line 42, swallowed-exception, fix it." You can browse the full set on the rules reference.

Comments that lie. Picture // Initialize the database connection sitting directly above initDB(). The comment says nothing the code does not, and worse, it drifts. The function gets renamed six months later, the comment does not, and now it actively misleads. Caught by trivial-comment, narrative-comment, and meta-comment. We wrote a whole piece on why this is the most under-rated slop signal: stop letting your agent write comments.

Swallowed errors. An empty catch block, or a catch whose only body is console.log(err). The failure is consumed and the caller never learns it happened. This can become a production incident when the system keeps running while quietly doing the wrong thing. Caught by swallowed-exception, silent-recovery, redundant-try-catch, and in Python by python-bare-except and python-broad-except. We told the story of one such bug in the swallowed exception that broke production.
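As a minimal sketch of the difference, assume a hypothetical config loader (the names toConfig, loadConfigSwallowed, and loadConfigStrict are illustrative and not from any real library). The first version consumes the failure and returns a plausible default. The second adds context and rethrows so the caller can react.

```typescript
// Hypothetical example: Config and these helpers are illustrative only.
interface Config {
  retries: number;
}

// Validate the parsed value instead of trusting its shape.
function toConfig(value: unknown): Config {
  if (typeof value === "object" && value !== null) {
    const retries = (value as Record<string, unknown>)["retries"];
    if (typeof retries === "number") return { retries };
  }
  throw new Error("missing numeric 'retries' field");
}

// Slop: the catch hides the failure and returns a default, so the caller
// cannot tell a valid config from a corrupt one.
function loadConfigSwallowed(raw: string): Config {
  try {
    return toConfig(JSON.parse(raw));
  } catch {
    return { retries: 3 };
  }
}

// Better: keep the failure visible and attach context for whoever debugs it.
function loadConfigStrict(raw: string): Config {
  try {
    return toConfig(JSON.parse(raw));
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`invalid config: ${reason}`);
  }
}
```

Given corrupt input, the swallowed version quietly reports three retries while the strict version throws, which is the gap between a silent misconfiguration and an error someone can trace.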

Type-system escape hatches. as any, as unknown as T, a @ts-ignore dropped in to make an error disappear. The code satisfied the compiler by removing part of the guarantee the type system was there to provide. Caught by unsafe-type-assertion, double-type-assertion, ts-directive, and redundant-type-coercion.
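To illustrate what the escape hatch gives up, here is a small sketch using a hypothetical User shape (the type and function names are assumptions made for this example). The unsafe version compiles but checks nothing at runtime. The guarded version narrows unknown to User only after verifying the fields.

```typescript
// Hypothetical type for illustration.
interface User {
  id: string;
  email: string;
}

// Slop: the assertion silences the compiler. If payload has no email,
// this returns undefined while claiming to return a string.
function emailUnsafe(payload: unknown): string {
  return (payload as any).email;
}

// Better: a type guard proves the shape before the type is trusted.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  // Widening to a record of unknown values is safe after the object check.
  const record = value as Record<string, unknown>;
  return typeof record["id"] === "string" && typeof record["email"] === "string";
}

function emailChecked(payload: unknown): string {
  if (!isUser(payload)) {
    throw new TypeError("payload is not a User");
  }
  return payload.email;
}
```

The guarded version costs a few more lines, but the failure happens at the boundary where the bad data arrived instead of somewhere downstream.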

Hallucinated imports. An import of a package that does not exist. Sometimes it is a harmless typo the build catches. Sometimes it is a name an attacker has already registered, which makes it a supply-chain vector, not a bug. This is the one slop pattern we treat as an error rather than a warning. Caught by hallucinated-import.
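One way to approximate this check, sketched under loud assumptions: compare the package names that appear in import statements against the dependencies a project declares. This is not a description of how the hallucinated-import rule is implemented. The function names are hypothetical, the regex only sees import ... from "x" forms, and a real tool would likely use a parser and also consult the package registry.

```typescript
// Return the package name for a module specifier, or null when the
// specifier is a relative path, an absolute path, or a node: built-in.
function packageName(specifier: string): string | null {
  if (specifier.length === 0) return null;
  if (specifier.charAt(0) === "." || specifier.charAt(0) === "/") return null;
  if (specifier.indexOf("node:") === 0) return null;
  const parts = specifier.split("/");
  // Scoped packages keep two segments, for example @scope/name.
  if (specifier.charAt(0) === "@") return parts.slice(0, 2).join("/");
  return parts[0] ?? null;
}

// List imported packages that are missing from the declared dependencies.
function undeclaredImports(source: string, declared: ReadonlySet<string>): string[] {
  const importPattern = /\bfrom\s+["']([^"']+)["']/g;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier === undefined) continue;
    const name = packageName(specifier);
    if (name !== null && !declared.has(name) && missing.indexOf(name) === -1) {
      missing.push(name);
    }
  }
  return missing;
}
```

In practice the declared set would come from the dependency fields of package.json. An import that resolves to nothing declared is worth a hard stop before anyone runs an install command on the suggested name.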

Dead code and duplication. Unreachable branches, variables assigned but never read, the same utility re-implemented in four files with four slightly different shapes and no canonical version. Agents generate locally and forget globally, so duplication is the default. Caught by unreachable-code, duplicate-import, unused-import, and the knip/* family for unused files, exports, and dependencies. More on this in your agent is leaving dead code behind.

Generic names and stubs. data, data2, result, temp, empty functions that pretend to do work, thin wrappers that forward a call and add nothing, and TODO stubs left where real logic should be. Each makes the next reader slower. Caught by generic-naming, empty-function, thin-wrapper, todo-stub, and rust-todo-stub.

Debug leftovers and hardcoded values. A stray console.log or print shipped to production, an ID or URL pasted inline instead of read from config, a secret hardcoded in source. These are useful signals that generated code needs another pass. Caught by console-leftover, python-print-debug, hardcoded-id, hardcoded-url, and the error-level security/hardcoded-secret.

Complexity inflation. Functions that grow to hundreds of lines, files that become dumping grounds, nesting four levels deep, parameter lists no caller can hold in their head. Agents do not feel the pain of a long function, so they keep adding to it. Caught by function-too-long, file-too-large, deep-nesting, and too-many-params. We cover where to set the thresholds in function size limits for AI code.
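A rough sketch of how a nesting check can be made mechanical, assuming brace depth as a stand-in for nesting. The function name is hypothetical and the approach is deliberately naive: it counts every brace, including object literals and braces inside strings or comments, which a real rule would handle with a parser.

```typescript
// Return the deepest level of brace nesting found in the source text.
function maxBraceDepth(source: string): number {
  let depth = 0;
  let deepest = 0;
  for (let i = 0; i < source.length; i += 1) {
    const ch = source.charAt(i);
    if (ch === "{") {
      depth += 1;
      if (depth > deepest) deepest = depth;
    } else if (ch === "}") {
      // Clamp at zero so unbalanced input cannot go negative.
      depth = Math.max(0, depth - 1);
    }
  }
  return deepest;
}
```

A gate would compare the result against whatever limit the team has configured. The number itself matters less than applying the same limit to every change.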

Security shortcuts. String interpolation in a SQL query, a shell command built from user input, eval on untrusted data, innerHTML with a value the model never sanitized, a dependency with a known CVE. The agent generates the most direct path to the result, which is rarely the safest one. Caught at error level by security/sql-injection, security/shell-injection, security/eval, security/innerhtml, and security/vulnerable-dependency.
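For the SQL case, the contrast can be sketched without a database. The function names and the users table are hypothetical. The $1 placeholder and the text and values object follow the convention used by PostgreSQL drivers such as node-postgres. Other drivers use ? or named parameters, so treat the exact syntax as an assumption.

```typescript
// Query shape in the style of PostgreSQL drivers: constant text, separate values.
interface ParameterizedQuery {
  text: string;
  values: readonly string[];
}

// Slop: user input becomes part of the SQL text itself.
function findUserUnsafe(email: string): string {
  return `SELECT id FROM users WHERE email = '${email}'`;
}

// Better: the SQL text never changes and the input travels as a bound value.
function findUserSafe(email: string): ParameterizedQuery {
  return {
    text: "SELECT id FROM users WHERE email = $1",
    values: [email],
  };
}
```

With an input like ' OR '1'='1, the unsafe version produces a query whose WHERE clause has been rewritten by the caller, while the safe version still sends the same SQL text and treats the input as data.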

Language-specific tells. Slop is not only a JavaScript problem. Python mutable default arguments, range(len(x)) loops, and isinstance ladders. Rust .unwrap() outside tests. Go library code that panics instead of returning an error. Each language has its own dialect of shortcut, and each has rules to match: python-mutable-default, python-range-len-loop, python-isinstance-ladder, rust-non-test-unwrap, and go-library-panic.

What a scan actually looks like

Abstract patterns are easier to act on once you have seen them land. A scan does not return prose. It returns a score and a list of findings, each pinned to a file, a line, and a named rule. That precision is the whole point: it is the difference between a reviewer writing "this feels noisy" and a gate saying exactly what to change.

$ npx aislop scan

src/auth/session.ts
  14:3   warning  swallowed-exception    empty catch swallows the failure
  22:9   warning  unsafe-type-assertion  as-any discards the checked type
  41:1   warning  trivial-comment        comment restates the next line

src/db/pool.ts
  8:12   error    hallucinated-import    'pg-promise-x' is not a known package

score  78/100   ·   1 error   3 warnings   ·   4 files

Every line is concrete: rule, file, line, and severity. Errors can block the gate, warnings can count against the score, and an agent reading this output knows what to fix before the next commit. That consistency is what lets you put it in front of a merge.
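Because each finding line has a fixed shape, it is straightforward to turn into structured data. The sketch below assumes only the text layout shown in the sample output above (line:column, severity, rule, message). The Finding type and parseFinding function are hypothetical, and the scanner may offer a structured output format that would make this unnecessary.

```typescript
type Severity = "error" | "warning";

// Hypothetical record for one finding line from the sample output.
interface Finding {
  line: number;
  column: number;
  severity: Severity;
  rule: string;
  message: string;
}

// Parse a line such as "14:3   warning  swallowed-exception    empty catch ...".
// Returns null for lines that are not findings, such as file paths.
function parseFinding(text: string): Finding | null {
  const match = /^\s*(\d+):(\d+)\s+(error|warning)\s+(\S+)\s+(.+)$/.exec(text);
  if (match === null) return null;
  const [, line, column, severity, rule, message] = match;
  if (line === undefined || column === undefined || rule === undefined || message === undefined) {
    return null;
  }
  return {
    line: Number(line),
    column: Number(column),
    severity: severity === "error" ? "error" : "warning",
    rule,
    message: message.trim(),
  };
}
```

Once findings are records instead of text, counting errors, grouping by rule, or feeding them back to an agent becomes a few lines of ordinary code.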

Three ways to detect it

There are three approaches to detection, and the right answer is not to pick one but to put each where it belongs.

LLM-based detection. Ask a model to review the diff for slop. It can catch context-dependent issues a fixed rule cannot, like a function that is technically fine but solves the wrong problem. The weakness is consistency: the same code can receive different feedback on a later run, because the verdict is sampled from a distribution. Treat it as advisory review unless your team accepts that variability.

Deterministic rule-based detection. Scan the code against a fixed set of patterns. The verdict is identical every run, it needs no network, and it is fast enough to sit in a pre-commit hook. The limit is scope: a rule only catches what you can define precisely, so deterministic scanning will not find a subtle logic bug that depends on business intent.

Hybrid, which is what we recommend. Deterministic rules run the quality gate, because a gate has to be consistent to be fair. LLM review runs alongside as advisory, catching intent-level issues and leaving suggestions. We go deeper on this split in deterministic versus LLM review.

This is also why a general-purpose static analyzer may not be enough on its own. Tools built for broad static analysis carry thousands of rules for bugs, security, and maintainability, but they are not always tuned for the shallow patterns that show up in agent-assisted changes. We unpack that gap in where SonarQube stops, and AI-slop rules start.
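To make the deterministic side concrete, here is a toy rule that flags empty catch blocks. It is a sketch, not the scanner's implementation: the function name and RuleHit type are hypothetical, and a regex cannot tell code from comments or string literals the way a parser-based rule can. What it does show is the property that matters for a gate: the same input always produces the same findings.

```typescript
// Hypothetical finding shape for this sketch.
interface RuleHit {
  rule: string;
  line: number;
}

// Flag catch blocks whose body is empty, with or without a bound error name.
function findEmptyCatches(source: string): RuleHit[] {
  const pattern = /catch\s*(\([^)]*\))?\s*\{\s*\}/g;
  const hits: RuleHit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Line number is 1-based: count the newlines before the match.
    const line = source.slice(0, match.index).split("\n").length;
    hits.push({ rule: "swallowed-exception", line });
  }
  return hits;
}
```

Run it twice on the same file and the output is identical, with no model, no network, and no sampling involved. The trade-off is that it knows nothing about whether the surrounding logic is right.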

The prevention workflow that holds

Detection is half the job. The other half is putting it where slop is created instead of where it is discovered. A practical workflow has three layers, and none of them depends on manual cleanup after the fact.

Layer 1: the agent hook. Run the scanner as the agent writes, not after. Findings surface in-editor, and the agent can fix them while the context is still loaded. This is usually the lowest-friction place to catch a repeatable issue. Setup is in the hooks guide.

Layer 2: the CI gate. Every pull request is scanned, and a score below your configured threshold can block the merge. The gate is not there to be perfect, it is there to be consistent, so the same standard applies to every PR whether a human or an agent wrote it. Setup is covered in the CI guide.

Layer 3: the trend. Track the score over time, per repo and per team. A single score is a snapshot; a trend shows whether the workflow is improving. The goal is to make the cleaner path the easier path.

The order matters. Catching slop at the hook is worth more than catching it in CI, and catching it in CI is worth more than catching it in review, because the cost of a fix rises at every step the problem survives. The goal is to move the catch as early as it will go.
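The gate and trend layers both reduce to small, testable functions. The sketch below is illustrative only: ScanSummary, GateDecision, evaluateGate, and scoreTrend are hypothetical names, the threshold is whatever your team configures, and the trend is a plain least-squares slope over a series of past scores rather than anything the scanner is known to compute.

```typescript
// Hypothetical shapes: these are not the scanner's real types.
interface ScanSummary {
  score: number; // 0 to 100
  errors: number;
  warnings: number;
}

interface GateDecision {
  pass: boolean;
  reasons: string[];
}

// Layer 2: any error-level finding or a score below the threshold fails the gate.
function evaluateGate(summary: ScanSummary, minScore: number): GateDecision {
  const reasons: string[] = [];
  if (summary.errors > 0) {
    reasons.push(`${summary.errors} error-level finding(s)`);
  }
  if (summary.score < minScore) {
    reasons.push(`score ${summary.score} is below threshold ${minScore}`);
  }
  return { pass: reasons.length === 0, reasons };
}

// Layer 3: least-squares slope of the score history, in points per scan.
// Positive means the score is improving over time.
function scoreTrend(scores: readonly number[]): number {
  const n = scores.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = scores.reduce((sum, s) => sum + s, 0) / n;
  let numerator = 0;
  let denominator = 0;
  scores.forEach((score, i) => {
    numerator += (i - meanX) * (score - meanY);
    denominator += (i - meanX) * (i - meanX);
  });
  return numerator / denominator;
}
```

Fed the summary from the sample scan earlier (score 78 with one error), evaluateGate fails at any threshold because of the error-level finding. Because the decision is a pure function of the scan result, the same PR gets the same verdict every time it is checked.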

What it costs to let slop ride

Slop does not stay the same size. Every merge that ships a little shallow code makes the next change harder to reason about. The next agent prompt then produces shallower output because it matches the surrounding code, which lowers the bar again. The loop accelerates. We wrote about how it reaches production in when the AI slop loop breaks production.

The cost lands in three places. Engineers spend more time reading and less time building, because comprehension is the bottleneck and slop attacks comprehension directly. Incidents get harder to trace when swallowed errors and missing observability hide what happened. Review quality also drops when people spend their attention on repeatable cleanup instead of design and behavior.

The fix is not heroics. It is a gate that holds the line for defined patterns, so the loop is easier to control before review.

Frequently asked questions

What is AI slop?

AI slop is AI-generated code that compiles, passes tests, and looks professional but is structurally shallow. It is usually not broken in any obvious way. It is slightly worse than what a careful human would write, repeated at machine scale, so the cost shows up as comprehension drag, silent failures, and rework rather than red CI.

How is AI slop different from normal tech debt?

Normal tech debt is usually a known shortcut a human chose under time pressure. AI slop is unchosen and uniform. It arrives looking finished, it spreads across every file an agent touches, and it passes the checks teams rely on, so it stays invisible until it reaches production or until reviewers burn out trying to catch it by hand.

Can you detect AI slop automatically?

Yes, for the patterns you can define precisely. Swallowed exceptions, trivial comments, unsafe type assertions, hallucinated imports, dead code, generic naming, and oversized functions are all detectable with deterministic rules that produce the same verdict every run. Logic-level problems that depend on intent still need a reviewer or an LLM pass.

Is deterministic or LLM-based detection better?

Use both, for different jobs. Deterministic rules belong in the quality gate because they are consistent enough to enforce repeatable standards. LLM review is useful for logic-level suggestions that require understanding intent, but it is probabilistic, so avoid making it the only merge blocker.

How do I stop my AI agent from writing slop?

Put the rules where the agent works. Run a scanner as a pre-commit or agent hook so issues surface in-editor and can be fixed before they are committed, then enforce a score threshold in CI when the team is ready. The value is a shorter, clearer feedback loop.

Does an AI slop scanner replace my linter or SonarQube?

No, it complements them. Linters catch wrong code and style. General static-analysis tools cover bugs, security, and maintainability, but may not focus on AI-specific patterns like trivial comments and swallowed exceptions. An AI slop scanner targets a third category: correct-but-shallow code repeated at scale.

The bottom line

AI slop is not a content problem that wandered into your repo. It is a code-quality problem with a specific failure profile, and it benefits from tooling built for that profile. Linters catch wrong code. LLM reviewers catch logic bugs. AI slop detection catches a third thing: code that is correct but shallow, repeated across agent-assisted changes.

A deterministic gate helps because it is consistent, because it can run before review, and because it gives the team a repeatable baseline. Run npx aislop scan on your repo to see where you stand.
