New aislop v0.9.4: four new Python rules from the SlopCodeBench paper, plus a CLI star prompt and GitHub Discussions. Read more →

We scan real code, turn repeated failures into rules, and publish the method.

scanaislop should not wait for private customer data to learn what AI agents break. The research program runs aislop against open-source repositories, benchmark tasks, and agent-generated code, then turns repeatable patterns into deterministic checks. Those checks are the public evidence base for a simple question every team now faces: is the code your agents ship safe to merge?

Current thesis

The market is moving from "try the model and see" to systematic governance. As agents write more production code, the durable wedge is reproducible evidence: named patterns, pinned scans, score deltas, and rules that get stricter as the public corpus grows. Evidence is what turns a quality gate into a control layer.

Scan, prove, govern.

Research is the middle of one loop. The CLI scans on every keystroke, public research proves which patterns AI agents break, and the enterprise platform turns that evidence into policy. This page is the proof layer in the open.

01 — Scan

The deterministic gate developers run on every keystroke and every PR. No LLM at runtime, sub-second, reproducible. Same code in, same score out.
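To make "same code in, same score out" concrete, here is a minimal sketch of a deterministic scan. Everything in it is an assumption for illustration: the two toy rules, the rule names, and the scoring formula are hypothetical and are not aislop's real checks or scoring. The point is only the shape: the result is a pure function of the source text, with no clock, randomness, or network involved.

```python
import re

# Hypothetical rules for illustration only; these are not aislop's real checks.
TOY_RULES = {
    "bare-except": re.compile(r"^\s*except\s*:", re.MULTILINE),
    "todo-comment": re.compile(r"#\s*TODO\b"),
}


def scan(source: str) -> dict:
    """Return a score and findings that depend only on the input text."""
    # Sort findings so the output order never depends on iteration order.
    findings = sorted(
        (rule, match.start())
        for rule, pattern in TOY_RULES.items()
        for match in pattern.finditer(source)
    )
    # Toy scoring (an assumption): start at 100, subtract 10 per finding, floor at 0.
    score = max(0, 100 - 10 * len(findings))
    return {"score": score, "findings": findings}


sample = "try:\n    run()\nexcept:\n    pass  # TODO handle\n"

# Reproducibility check: scanning the same text twice gives an identical result.
assert scan(sample) == scan(sample)
```

Because the function has no hidden inputs, the reproducibility check at the end holds for any source text, which is the property a gate needs before its scores can be compared across runs.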

02 — Prove

Every rule traces to real code, a pinned scan, or a benchmark. The public corpus is the trust layer, and the rules get stricter as it grows. You are reading it now.

03 — Govern

Rule provenance, agent attribution, and policy with expiry. The same evidence becomes a control layer that proves which agents, repos, and rules put risk into the codebase.

How every research run should work.

The point is not to manufacture a flattering chart. The point is to make detector quality legible: what was scanned, what failed, what was noise, and which rule changed because of it.

  1. Choose a transparent cohort: popular open-source repos, agent-generated projects, or benchmark fixtures.
  2. Pin the commit SHA, aislop version, Node version, config, and enabled engines before scanning.
  3. Run the same command across every repo and store the raw JSON before writing the post.
  4. Sample top findings by rule, classify true positives and false positives, then patch the detector with regression tests.
  5. Publish the method, the limits, and the rule changes instead of only publishing the headline number.
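The pinning and classification steps above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the actual research tooling: the `RunManifest` class, its field names, and the `precision_by_rule` helper are hypothetical, and nothing here reflects aislop's real JSON output or CLI flags.

```python
from __future__ import annotations

import hashlib
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    """Everything pinned before a scan. Field names are illustrative."""

    repo: str
    commit_sha: str
    aislop_version: str
    node_version: str
    config: dict
    engines: tuple[str, ...]

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so identical pins
        # always produce the same hash.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def precision_by_rule(labels) -> dict:
    """Compute per-rule precision from a manually reviewed sample.

    labels: iterable of (rule_id, is_true_positive) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # rule -> [true positives, total reviewed]
    for rule_id, is_true_positive in labels:
        counts[rule_id][0] += int(is_true_positive)
        counts[rule_id][1] += 1
    return {rule: tp / total for rule, (tp, total) in counts.items()}
```

Storing the manifest fingerprint next to the raw JSON ties each published number to the exact inputs that produced it, and a low per-rule precision on the reviewed sample is the signal that a detector needs a patch and a regression test.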

Own the AI-code-quality dataset.

The next phase is a repeatable public cadence: scan trending projects, scan agent outputs, publish findings, and convert the repeatable classes into new rules.

Monthly GitHub Trending scan: top public repositories by language, scored with the latest CLI.

Agent-output benchmark: same feature prompt across Claude Code, Cursor, Codex, OpenCode, Gemini, and Aider, scored deterministically.

Rule provenance pages: every AI-slop rule linked to the real code pattern, benchmark signal, or public scan that justified it.

Enterprise governance research: risk baselines and agent attribution by repo age, language, team size, and AI-agent adoption pattern.

Nominate a repository for the next scan.

Public repositories are the training ground. If a rule is noisy on real code, we fix the rule. If a pattern repeats across projects, we name it and ship a detector.

Suggest a scan →