Guide · 14 min read

AI Slop: How to Detect and Prevent Low-Quality AI Code

AI slop is not broken code. It compiles, passes tests, and looks professional, then fails in production under conditions the model never considered. After scanning tens of thousands of agent-written files across 25+ real projects, here is how to detect it and how to keep it out of main.

TL;DR.

  • AI slop is code that compiles, passes tests, and still degrades your codebase because it is structurally shallow, repeated at machine scale.
  • It is harder to catch than normal tech debt because it looks finished, spreads uniformly, and passes the checks teams trust.
  • Most of it falls into a small set of named patterns: swallowed errors, comments that lie, type-system escape hatches, hallucinated imports, dead code, generic names, and complexity inflation.
  • Deterministic rules catch the patterns you can define. LLM review catches logic that needs intent. Use deterministic rules for the gate, because you cannot block a merge on a probability.
  • Prevention is a loop, not a cleanup: scan at the hook, gate in CI, track the trend. Run npx aislop scan to see where you stand.

What AI slop actually is

In the content world, "AI slop" means mass-produced, low-effort text generated to capture search traffic. In code, the definition is sharper and more dangerous: AI slop is AI-generated code that compiles, passes tests, and still fails in production because it is structurally shallow.

The crucial part is that it is not broken. A single trivial comment is harmless. A single empty catch block is a small risk. The problem is volume. An agent does not produce one trivial comment; it produces them in every file it touches. It does not swallow one exception; it adopts swallowing as a habit. Ten thousand small degradations across a codebase stop being style and become a comprehension crisis, a debugging tax, and a production risk profile.

You know it when you read it. A pull request where everything technically works, but following the logic feels like wading through filler. That feeling is the signal. The rest of this guide turns that feeling into specific, named, detectable patterns.

The numbers behind the problem

Merriam-Webster named slop a defining word of 2025. Search interest in "AI slop" went from niche slang to tens of thousands of monthly searches inside a year. The term crossed over because the experience is now universal: anyone shipping with Claude Code, Cursor, Codex, or any other agent is generating more code, faster, than any review process was designed to absorb.

The volume is the story. When a human wrote every line, review scaled with output because both were bounded by the same person typing. Agents break that coupling. One engineer can now open a hundred-file change in an afternoon. The code arrives faster than anyone can read it, and it arrives polished enough that skimming feels safe. It is not.

Across 25+ real projects we scanned, the same handful of patterns showed up again and again, independent of language, framework, or which agent wrote the code. The receipts are in our 25-project findings post. The patterns are remarkably consistent, which is exactly why they are detectable.

Why AI slop is harder to catch than normal tech debt

Traditional tech debt has a tell. A human took a shortcut, knew it was a shortcut, and usually left a scar: a TODO, a hack comment, a function everyone on the team quietly avoids. You can find it because someone chose it.

AI slop has none of those tells, for three reasons.

It looks finished. Agents write fluent, idiomatic-looking code with confident naming and tidy formatting. The surface signals your brain uses to flag "this was rushed" are exactly the signals an agent gets right. The shallowness is underneath.

It is uniform. Human debt is local and idiosyncratic. AI slop is systemic. The same swallowed-exception habit, the same generic names like "data" and "result", and the same redundant try/catch appear everywhere, which makes any single instance look normal because it matches everything around it.

It passes your checks. Tests are green because the code does the happy path. The type checker is satisfied because an "as any" cast told it to be. The linter is quiet because trivial comments and shallow structure are not lint errors. Every gate a team relies on was built for a world where a human wrote the code, and that world is gone.

This is why manual review does not scale as the answer. You cannot ask reviewers to read a hundred files with the same care they gave to ten, every day, forever. The catch has to be mechanical for the patterns that can be mechanical, so reviewers spend their attention on the things only a human can judge.

The patterns, and the rules that catch them

This is where vague talk about "code smell" has to become specific. Below are the families of AI slop we see most often, each tied to the deterministic rules that flag it. Named patterns are the difference between "this feels off" and "line 42, swallowed-exception, fix it." You can browse the full set on the rules reference.

Comments that lie. Picture a comment reading "// Initialize the database connection" sitting directly above a call to initDB(). It says nothing the code does not, and worse, it drifts. The function gets renamed six months later, the comment does not, and now it actively misleads. Caught by trivial-comment, narrative-comment, and meta-comment. We wrote a whole piece on why this is the most underrated slop signal: stop letting your agent write comments.

Swallowed errors. An empty catch block, or a catch whose only body is console.log(err). The failure is consumed and the caller never learns it happened. This is the pattern most likely to turn into a 2am incident, because the system keeps running while quietly doing the wrong thing. Caught by swallowed-exception, silent-recovery, redundant-try-catch, and in Python by python-bare-except and python-broad-except. We told the story of one such bug in the swallowed exception that broke production.
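As a minimal sketch of the difference, here is the swallowed version next to one possible fix in TypeScript. The function names and the fallback port are hypothetical, chosen only for illustration.

```typescript
// Hypothetical example: names and the default value are illustrative only.

// Slop: the failure is consumed and the caller receives a plausible default.
function parsePortSwallowed(raw: string): number {
  try {
    return parsePortStrict(raw);
  } catch {
    return 3000; // the caller never learns the input was invalid
  }
}

// Better: validate the input and fail loudly with context the caller can act on.
function parsePortStrict(raw: string): number {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${JSON.stringify(raw)}`);
  }
  return port;
}
```

With the swallowed version, a typo in configuration quietly becomes port 3000. With the strict version, the bad input surfaces at the call site, where it can be logged, retried, or allowed to stop startup.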

Type-system escape hatches. as any, as unknown as T, a @ts-ignore dropped in to make an error disappear. The agent reached for the fastest way to satisfy the compiler, which throws away the guarantee the type system was there to provide. Caught by unsafe-type-assertion, double-type-assertion, ts-directive, and redundant-type-coercion.
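One way to keep the guarantee, sketched under assumptions: the User shape and all function names below are hypothetical. The idea is to hold parsed data as unknown until a runtime check has proven its shape, instead of casting the problem away.

```typescript
// Hypothetical shape, for illustration only.
interface User {
  id: number;
  email: string;
}

// Slop: the cast silences the compiler and checks nothing at runtime.
function parseUserUnsafe(json: string): User {
  return JSON.parse(json) as any;
}

// Type guard: proves the shape at runtime so the compiler can trust it.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  // Narrowing to a generic record is safe here: we know it is a non-null object.
  const candidate = value as Record<string, unknown>;
  return typeof candidate["id"] === "number" && typeof candidate["email"] === "string";
}

// Better: the value stays `unknown` until the guard has accepted it.
function parseUser(json: string): User {
  const value: unknown = JSON.parse(json);
  if (!isUser(value)) {
    throw new Error("payload is not a User");
  }
  return value;
}
```

The unsafe version happily returns an object whose id is a string; the guarded version rejects it at the boundary, which is where a malformed payload is cheapest to diagnose.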

Hallucinated imports. An import of a package that does not exist. Sometimes it is a harmless typo the build catches. Sometimes it is a name an attacker has already registered, which makes it a supply-chain vector, not a bug. This is the one slop pattern we treat as an error rather than a warning. Caught by hallucinated-import.
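To make the idea concrete, here is a rough sketch of the kind of check involved. It is not how any particular scanner implements its rule. It rests on two simplifying assumptions: a regular expression stands in for a real parser, and "declared in package.json" stands in for "exists". The function names and the short built-in list are hypothetical.

```typescript
// Rough sketch only: a regex stands in for a real parser, and "declared"
// stands in for "exists". Both are simplifying assumptions.

// Reduce "pkg/sub/path" to "pkg" and "@scope/pkg/sub" to "@scope/pkg".
function packageName(specifier: string): string {
  const parts = specifier.split("/");
  const take = specifier.startsWith("@") ? 2 : 1;
  return parts.slice(0, take).join("/");
}

function findUndeclaredImports(
  source: string,
  declared: ReadonlySet<string>,
  builtins: ReadonlySet<string> = new Set(["fs", "path", "os", "crypto", "http"]),
): string[] {
  // Matches: from "x", import "x", import("x"), require("x").
  const pattern = /\b(?:from\s*|import\s*\(?\s*|require\s*\(\s*)["']([^"']+)["']/g;
  const suspects: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1] ?? "";
    // Relative paths, absolute paths, and node: built-ins are not registry packages.
    if (/^(\.|\/|node:)/.test(specifier)) continue;
    const name = packageName(specifier);
    const known = declared.has(name) || builtins.has(name);
    if (!known && suspects.indexOf(name) === -1) suspects.push(name);
  }
  return suspects;
}
```

Given a file that imports pg and pg-promise-x with only pg declared, the function returns the second name. A production rule would also need to consult the registry itself, since a package can be declared and still be a name an attacker registered.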

Dead code and duplication. Unreachable branches, variables assigned but never read, the same utility re-implemented in four files with four slightly different shapes and no canonical version. Agents generate locally and forget globally, so duplication is the default. Caught by unreachable-code, duplicate-import, unused-import, and the knip/* family for unused files, exports, and dependencies. More on this in your agent is leaving dead code behind.

Generic names and stubs. data, data2, result, temp, empty functions that pretend to do work, thin wrappers that forward a call and add nothing, and TODO stubs left where real logic should be. Each makes the next reader slower. Caught by generic-naming, empty-function, thin-wrapper, todo-stub, and rust-todo-stub.
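A deliberately small sketch of what a generic-naming check might look like. The name list uses only the examples from the paragraph above; a real rule would presumably use a longer list and a parser rather than a regular expression. The function name is hypothetical, and the sketch ignores parameters and destructuring.

```typescript
// Names taken from the examples above; a real rule would likely use a longer list.
const GENERIC_NAMES: ReadonlySet<string> = new Set(["data", "result", "temp"]);

// Returns declared variable names that are generic, in source order.
// Only const/let/var declarations are inspected in this sketch.
function findGenericNames(source: string): string[] {
  const declaration = /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g;
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = declaration.exec(source)) !== null) {
    const name = match[1] ?? "";
    // Strip a numeric suffix so "data2" is treated like "data".
    const base = name.replace(/\d+$/, "");
    if (GENERIC_NAMES.has(base)) hits.push(name);
  }
  return hits;
}
```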

Debug leftovers and hardcoded values. A stray console.log or print shipped to production, an ID or URL pasted inline instead of read from config, a secret hardcoded in source. The first is the single most reliable signal of unreviewed agent code. Caught by console-leftover, python-print-debug, hardcoded-id, hardcoded-url, and the error-level security/hardcoded-secret.

Complexity inflation. Functions that grow to hundreds of lines, files that become dumping grounds, nesting four levels deep, parameter lists no caller can hold in their head. Agents do not feel the pain of a long function, so they keep adding to it. Caught by function-too-long, file-too-large, deep-nesting, and too-many-params. We cover where to set the thresholds in function size limits for AI code.
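Deep nesting is the easiest of these to show in miniature. The sketch below uses a hypothetical Order type and function names; it contrasts a four-level version with a guard-clause version that behaves the same way.

```typescript
// Hypothetical domain type, for illustration only.
interface Order {
  paid: boolean;
  items: string[];
  address?: string;
}

// Inflated: four levels of nesting to reach one line of real work.
function shipNested(order: Order | null): string {
  if (order) {
    if (order.paid) {
      if (order.items.length > 0) {
        if (order.address) {
          return `shipping ${order.items.length} item(s) to ${order.address}`;
        } else {
          return "missing address";
        }
      } else {
        return "empty order";
      }
    } else {
      return "unpaid";
    }
  } else {
    return "no order";
  }
}

// Flattened with guard clauses: same behavior, one level deep.
function shipFlat(order: Order | null): string {
  if (!order) return "no order";
  if (!order.paid) return "unpaid";
  if (order.items.length === 0) return "empty order";
  if (!order.address) return "missing address";
  return `shipping ${order.items.length} item(s) to ${order.address}`;
}
```

Both functions return the same result for every input. The flat one simply lets a reader discard each failure case before reaching the line that matters.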

Security shortcuts. String interpolation in a SQL query, a shell command built from user input, eval on untrusted data, innerHTML with a value the model never sanitized, a dependency with a known CVE. The agent generates the most direct path to the result, which is rarely the safest one. Caught at error level by security/sql-injection, security/shell-injection, security/eval, security/innerhtml, and security/vulnerable-dependency.
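For the SQL case, the contrast can be sketched without a database. The Query shape and function names below are hypothetical; the $1 placeholder follows the PostgreSQL convention, and other drivers use different markers such as a question mark.

```typescript
// A query as many drivers accept it: SQL text plus separately bound values.
// This shape is an illustrative assumption, not a specific driver's API.
interface Query {
  text: string;
  values: unknown[];
}

// Slop: user input is spliced directly into the SQL text.
function findUserUnsafe(email: string): Query {
  return { text: `SELECT * FROM users WHERE email = '${email}'`, values: [] };
}

// Better: the SQL text is constant and the input travels as a bound parameter.
function findUserSafe(email: string): Query {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}
```

Pass a hostile string such as x' OR '1'='1 to both. The unsafe version produces SQL whose WHERE clause is always true, while the safe version produces the same text for every input and leaves the value for the driver to bind.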

Language-specific tells. Slop is not only a JavaScript problem. Python mutable default arguments, range(len(x)) loops, and isinstance ladders. Rust .unwrap() outside tests. Go library code that panics instead of returning an error. Each language has its own dialect of shortcut, and each has rules to match: python-mutable-default, python-range-len-loop, python-isinstance-ladder, rust-non-test-unwrap, and go-library-panic.

What a scan actually looks like

Abstract patterns are easier to act on once you have seen them land. A scan does not return prose; it returns a score and a list of findings, each pinned to a file, a line, and a named rule. That precision is the whole point: it is the difference between a reviewer writing "this feels noisy" and a gate saying exactly what to change.

$ npx aislop scan

src/auth/session.ts
  14:3   warning  swallowed-exception    empty catch swallows the failure
  22:9   warning  unsafe-type-assertion  as-any discards the checked type
  41:1   warning  trivial-comment        comment restates the next line

src/db/pool.ts
  8:12   error    hallucinated-import    'pg-promise-x' is not a known package

score  78/100   ·   1 error   3 warnings   ·   4 files

Every line is a fact, not an opinion. The error blocks the gate, the warnings count against the score, and an agent reading this output knows precisely what to fix before the next commit. No two runs disagree. That consistency is what lets you put it in front of a merge.
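Because each finding has a fixed shape, it is straightforward to turn into data. The parser below is a sketch under one loud assumption: the layout is inferred solely from the sample output above, and the real CLI's format may differ. The Finding type and function name are hypothetical.

```typescript
// Assumption: the layout is inferred from the sample output shown above.
// The real CLI's format may differ, so treat this as illustrative only.
interface Finding {
  file: string;
  line: number;
  column: number;
  severity: string;
  rule: string;
  message: string;
}

function parseFindings(report: string): Finding[] {
  // Indented "line:col  severity  rule  message" rows.
  const findingLine = /^\s+(\d+):(\d+)\s+(error|warning)\s+(\S+)\s+(.+)$/;
  const findings: Finding[] = [];
  let currentFile = "";
  for (const raw of report.split("\n")) {
    const match = findingLine.exec(raw);
    if (match) {
      const [, line = "", column = "", severity = "", rule = "", message = ""] = match;
      findings.push({
        file: currentFile,
        line: Number(line),
        column: Number(column),
        severity,
        rule,
        message: message.trim(),
      });
    } else if (/^\S+\.\w+$/.test(raw.trim())) {
      // A lone path-like token starts a new file group.
      currentFile = raw.trim();
    }
  }
  return findings;
}
```

Once findings are structured this way, counting errors, grouping by rule, or feeding them back to an agent is ordinary data handling rather than text interpretation.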

Three ways to detect it

There are three approaches to detection, and the right answer is not to pick one but to put each where it belongs.

LLM-based detection. Ask a model to review the diff for slop. It can catch context-dependent issues a fixed rule cannot, like a function that is technically fine but solves the wrong problem. The weakness is consistency: the same code can pass one run and fail the next, because the verdict is sampled from a distribution. You cannot block a merge on a result that changes when you click re-run.

Deterministic rule-based detection. Scan the code against a fixed set of patterns. The verdict is identical every run, it needs no network, and it is fast enough to sit in a pre-commit hook. The limit is scope: a rule only catches what you can define precisely, so deterministic scanning will not find a subtle logic bug that depends on business intent.

Hybrid, which is what we recommend. Deterministic rules run the quality gate, because a gate has to be consistent to be fair. LLM review runs alongside as advisory, catching the intent-level issues and leaving suggestions, never blocking on a probability. We go deeper on this split in deterministic versus LLM review.
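The split can be sketched as a single decision function. Everything here is an assumption for illustration: the type names, the function name, and the default threshold of 80 are hypothetical. The point it shows is structural: only deterministic inputs can fail the gate, and LLM output is carried along as advice.

```typescript
// Hypothetical types and threshold, for illustration only.
interface Finding {
  rule: string;
  severity: "error" | "warning";
}

interface GateResult {
  pass: boolean;
  reasons: string[];   // deterministic reasons the gate failed
  advisory: string[];  // LLM suggestions, shown to the author but never blocking
}

function evaluateGate(
  score: number,
  findings: Finding[],
  llmSuggestions: string[],
  minScore = 80,
): GateResult {
  const reasons: string[] = [];
  // Any error-level finding blocks, regardless of score.
  for (const finding of findings) {
    if (finding.severity === "error") reasons.push(`error: ${finding.rule}`);
  }
  // Warnings only matter through their effect on the score.
  if (score < minScore) {
    reasons.push(`score ${score} is below threshold ${minScore}`);
  }
  // LLM output is passed through untouched: it can inform, but cannot fail the gate.
  return { pass: reasons.length === 0, reasons, advisory: llmSuggestions };
}
```

Re-running this function with the same score and findings always gives the same verdict, no matter what the LLM pass says, which is the property a merge gate needs.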

This is also why a general-purpose static analyzer is not enough on its own. Tools built for human-written code carry thousands of rules for problems humans create, and almost none for the patterns that only appear when an agent writes thousands of files. We unpack that gap in why SonarQube misses AI slop.

The prevention workflow that holds

Detection is half the job. The other half is putting it where slop is created instead of where it is discovered. The teams who actually keep main clean run the same three layers, and none of them is manual cleanup after the fact.

Layer 1: the agent hook. Run the scanner as the agent writes, not after. Findings surface in-editor, the agent fixes them on the same turn, and the slop never reaches a commit. This is the cheapest possible place to catch a problem, because the context is still loaded and nobody has reviewed anything yet. Setup is in the hooks guide.

Layer 2: the CI gate. Every pull request is scanned, and a score below your threshold blocks the merge. No silent override, no "we will clean it up later." The gate is not there to be perfect, it is there to be consistent, so the same standard applies to every PR whether a human or an agent wrote it. It takes about two minutes to wire up, covered in the CI guide.

Layer 3: the trend. Track the score over time, per repo and per team. A single score is a snapshot; a trend is a habit. Teams that enforce a threshold watch their scores converge upward within a few sprints, because the loop is tight enough that writing clean code becomes the path of least resistance instead of an act of discipline.
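One simple way to turn a score history into a trend, offered as a sketch rather than a prescribed method: compare the average of the most recent scans with the average of the scans just before them. The function names and the default window of three scans are hypothetical choices.

```typescript
// Average of a non-empty list of numbers.
function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Difference between the mean of the latest `window` scores and the mean of
// the `window` scores before them. Positive means the codebase is improving.
// Returns 0 when there is not enough history to compare two full windows.
function scoreTrend(scores: number[], window = 3): number {
  if (window < 1 || scores.length < window * 2) return 0;
  const recent = scores.slice(-window);
  const previous = scores.slice(-window * 2, -window);
  return mean(recent) - mean(previous);
}
```

For a history of 70, 72, 74, 80, 82, 84 the result is 10: the last three scans average 82 against 72 for the three before. Averaging over a window keeps one unusually good or bad scan from reading as a change in habit.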

The order matters. Catching slop at the hook is worth more than catching it in CI, and catching it in CI is worth more than catching it in review, because the cost of a fix rises at every step the problem survives. The goal is to move the catch as early as it will go.

The compounding cost of doing nothing

Slop does not stay the same size. Every merge that ships a little shallow code makes the next change harder to reason about, which makes the next agent prompt produce shallower output because it is matching the surrounding code, which lowers the bar again. The loop accelerates. We wrote about how it reaches production in when the AI slop loop breaks production.

The cost lands in three places. Engineers spend more time reading and less time building, because comprehension is the bottleneck and slop attacks comprehension directly. Incidents get stranger and harder to trace, because swallowed errors and missing observability mean failures happen in the dark. And the best people leave, because nobody who can build well wants to spend their days untangling output nobody read.

The fix is not heroics. It is a gate that holds the line automatically, so the loop never gets to accelerate in the first place. You do not clean up slop at scale by reading harder. You stop it at the door.

Frequently asked questions

What is AI slop?

AI slop is AI-generated code that compiles, passes tests, and looks professional but is structurally shallow. It is not broken. It is slightly worse than what a careful human would write, repeated at machine scale, so the cost shows up as comprehension drag, silent failures, and rework rather than red CI.

How is AI slop different from normal tech debt?

Normal tech debt is usually a known shortcut a human chose under time pressure. AI slop is unchosen and uniform. It arrives looking finished, it spreads across every file an agent touches, and it passes the checks teams rely on, so it stays invisible until it reaches production or until reviewers burn out trying to catch it by hand.

Can you detect AI slop automatically?

Yes, for the patterns you can define precisely. Swallowed exceptions, trivial comments, unsafe type assertions, hallucinated imports, dead code, generic naming, and oversized functions are all detectable with deterministic rules that produce the same verdict every run. Logic-level problems that depend on intent still need a reviewer or an LLM pass.

Is deterministic or LLM-based detection better?

Use both, for different jobs. Deterministic rules belong in the quality gate because they are consistent and can block a merge. LLM review is useful for logic-level suggestions that require understanding intent, but it is probabilistic, so the same code can pass one run and fail the next. You cannot gate CI on a coin flip.

How do I stop my AI agent from writing slop?

Put the rules where the agent works. Run a scanner as a pre-commit or agent hook so issues surface in-editor and get fixed before they are committed, then enforce a score threshold in CI so anything below the bar cannot merge. Teams that enforce a gate see scores climb within a few sprints because the feedback loop is immediate.

Does an AI slop scanner replace my linter or SonarQube?

No, it complements them. Linters catch wrong code and style. Tools built for human code miss AI-specific patterns like trivial comments and swallowed exceptions because nobody wrote thousands of them by hand before agents existed. An AI slop scanner targets a third category: correct-but-shallow code repeated at scale.

The bottom line

AI slop is not a content problem that wandered into your repo. It is a code-quality problem with a specific failure profile, and it needs tooling built for that profile. Linters catch wrong code. LLM reviewers catch logic bugs. AI slop detection catches the third thing: code that is correct but shallow, repeated at the scale only an agent can produce.

The teams that solve it run a deterministic gate in CI. Not because it is perfect, but because it is consistent, because it blocks before review instead of commenting after, and because it breaks the loop before the loop can accelerate. Run npx aislop scan on your repo. Ten seconds, and you will see exactly where you stand.