Research

We scan real code, turn repeated failures into rules, and publish the method.

scanaislop should not wait for private customer data to learn what AI agents break. The research program runs aislop against open-source repositories, benchmark tasks, and agent-generated code, then turns repeatable patterns into deterministic checks. Those checks are the public evidence base for a simple question every team now faces: is the code your agents ship safe to merge?

Current thesis

The market is moving from "try the model and see" to systematic governance. As agents write more production code, the durable wedge is reproducible evidence: named patterns, pinned scans, score deltas, and rules that get stricter as the public corpus grows. Evidence is what turns a quality gate into a control layer.

The loop

Scan, prove, govern.

Research is the middle of one loop. The CLI scans on every keystroke, public research proves which failure patterns recur in agent-written code, and the enterprise platform turns that evidence into policy. This page is the proof layer in the open.

01 — Scan

The deterministic gate developers run on every keystroke and every PR. No LLM at runtime, sub-second, reproducible. Same code in, same score out.

02 — Prove

Every rule traces to real code, a pinned scan, or a benchmark. The public corpus is the trust layer, and the rules get stricter as it grows. You are reading it now.

03 — Govern

Rule provenance, agent attribution, and policy with expiry. The same evidence becomes a control layer that proves which agents, repos, and rules put risk into the codebase.
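
One way to make "same code in, same score out" checkable is to reduce a scan's findings to a single digest and compare it across runs. The sketch below is illustrative only: the `findings_digest` helper and the finding fields (`rule`, `file`, `line`) are hypothetical and do not describe aislop's actual output schema.

```python
import hashlib
import json


def findings_digest(findings):
    """Return a SHA-256 digest that ignores finding order and key order.

    `findings` is assumed to be a list of JSON-serializable dicts
    (hypothetical shape, not aislop's real schema).
    """
    # Serialize each finding canonically, then sort so order does not matter.
    canonical = sorted(
        json.dumps(finding, sort_keys=True, separators=(",", ":"))
        for finding in findings
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


# Two runs over the same code should produce the same digest,
# even if the findings are emitted in a different order.
run_a = [
    {"rule": "swallowed-exception", "file": "app.py", "line": 12},
    {"rule": "unused-code", "file": "util.py", "line": 40},
]
run_b = [
    {"line": 40, "file": "util.py", "rule": "unused-code"},
    {"file": "app.py", "line": 12, "rule": "swallowed-exception"},
]
same = findings_digest(run_a) == findings_digest(run_b)
```

If the digests differ for identical input, something non-deterministic has entered the scan, such as an unpinned version or an order-dependent rule.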

Published work

Receipts before claims.

These are not generic thought pieces. Each report ties a public code sample, benchmark signal, or rule-tuning run to a concrete detector change.

70 open-source repos

Rule precision across seven ecosystems

A controlled scan of popular TypeScript, Python, Go, Rust, Ruby, PHP, and Java projects. The result: 38% fewer noisy findings, no disabled rules, and regression tests for the language conventions we learned.

Read report →

4 benchmark-derived rules

SlopCodeBench to shipped detectors

SlopCodeBench (SCBench) measured verbosity and structural erosion in long-horizon coding tasks. We converted the repeatable Python verbosity signals into deterministic rules with positive and negative tests.

Read report →

25 early public scans

First-run AI-slop findings

The first public scan batch showed the same patterns across unrelated projects: narrative comments, unsafe type escapes, swallowed exceptions, unused code, and fixer assumptions that break pipelines.

Read report →
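
To show what a deterministic check for one of these patterns can look like, here is a minimal sketch of a swallowed-exception detector built on Python's standard `ast` module. It is not aislop's implementation: the function name and the narrow definition used here (an `except` block whose body is only `pass`) are assumptions chosen for illustration.

```python
import ast


def find_swallowed_exceptions(source):
    """Return line numbers of `except` handlers whose body is only `pass`.

    Deliberately narrow: handlers that log, re-raise, or return are not flagged.
    """
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                hits.append(node.lineno)
    return sorted(hits)


# Positive case: the error is silently discarded.
POSITIVE = "try:\n    risky()\nexcept Exception:\n    pass\n"

# Negative case: the handler re-raises, so nothing is swallowed.
NEGATIVE = "try:\n    risky()\nexcept ValueError:\n    raise\n"

positive_hits = find_swallowed_exceptions(POSITIVE)
negative_hits = find_swallowed_exceptions(NEGATIVE)
```

The positive and negative samples double as regression fixtures: the rule must flag the first and stay silent on the second, with no model call involved.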

Protocol

How every research run should work.

The point is not to manufacture a flattering chart. The point is to make detector quality legible: what was scanned, what failed, what was noise, and which rule changed because of it.

01 Choose a transparent cohort: popular open-source repos, agent-generated projects, or benchmark fixtures.
02 Pin the commit SHA, aislop version, Node version, config, and enabled engines before scanning.
03 Run the same command across every repo and store the raw JSON before writing the post.
04 Sample top findings by rule, classify true positives and false positives, then patch the detector with regression tests.
05 Publish the method, the limits, and the rule changes instead of only publishing the headline number.
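
Steps 02 and 04 can be sketched in a few lines. The example below assumes a hypothetical `RunManifest` record for the pinned inputs and a hypothetical `precision_by_rule` helper for the hand-labeled sample; neither is part of the aislop CLI, and all values shown are placeholders rather than real scan data.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    """Everything pinned before a scan (step 02). Field names are illustrative."""
    repo: str
    commit_sha: str
    aislop_version: str
    node_version: str
    config_path: str
    engines: tuple


def precision_by_rule(labeled):
    """Compute per-rule precision from a hand-labeled sample (step 04).

    `labeled` is an iterable of (rule_id, is_true_positive) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # rule_id -> [true positives, total]
    for rule_id, is_true_positive in labeled:
        counts[rule_id][1] += 1
        if is_true_positive:
            counts[rule_id][0] += 1
    return {rule: tp / total for rule, (tp, total) in counts.items()}


# Placeholder values only; store the manifest next to the raw scan JSON.
manifest = RunManifest(
    repo="example/repo",
    commit_sha="0123abc",
    aislop_version="0.0.0",
    node_version="20.0.0",
    config_path="aislop.config.json",
    engines=("lint", "ai-slop"),
)
manifest_json = json.dumps(asdict(manifest), sort_keys=True)

# A made-up labeled sample: two findings for one rule, one for another.
sample = [("narrative-comment", True), ("narrative-comment", False), ("unused-code", True)]
precision = precision_by_rule(sample)
```

A rule with low precision in the sample is a candidate for a detector patch plus a regression test, not for being disabled.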

Next runs

Own the AI-code-quality dataset.

The next phase is a repeatable public cadence: scan trending projects, scan agent outputs, publish findings, and convert the repeatable classes into new rules.

Monthly GitHub Trending scan: top public repositories by language, scored with the latest CLI.

Agent-output benchmark: same feature prompt across Claude Code, Cursor, Codex, OpenCode, Gemini, and Aider, scored deterministically.

Rule provenance pages: every AI-slop rule linked to the real code pattern, benchmark signal, or public scan that justified it.

Enterprise governance research: risk baselines and agent attribution by repo age, language, team size, and AI-agent adoption pattern.

Nominate a repository for the next scan.

Public repositories are the training ground. If a rule is noisy on real code, we fix the rule. If a pattern repeats across projects, we name it and ship a detector.

Suggest a scan →