We scanned gstack. The score was brutal, but the useful part was the verdict.
A 3/100 score is easy to screenshot. The harder question is whether the findings are real. We used gstack as a live calibration run and changed aislop so confirmed defects, security patterns, style debt, and AI-slop indicators are no longer collapsed into one blunt verdict.
A strict quality gate has one job: make the next engineering decision easier. If it only gives you a scary number, it fails that job.
We recently ran aislop against gstack, a public JavaScript/TypeScript browser automation project. The raw result was harsh: 3/100, with 978 findings across 221 files.
That number is not useless. But it is also not enough. A missing dependency is not the same kind of evidence as an aggressive comment rule. A conservative innerHTML warning is not the same thing as proven exploitable XSS. So we used this scan to answer the question every team asks after a strict tool reports a low score:
What is genuinely broken, what is reviewable risk, and what is code hygiene debt?
The scan
The current scan output looked like this:

    3 / 100  Critical
    8 errors · 965 warnings · 5 info · 568 fixable
    221 files · 5 engines

    Verdict mix:
      866 style/policy
      103 AI-slop indicators
      7 conservative security
      2 confirmed defects

    2 high-confidence, 976 medium-confidence

That last block is the important change. Instead of asking people to infer severity from a flat list, aislop now separates the shape of the evidence.
Finding 1: an out-of-scope variable
The first confirmed defect was a normal JavaScript scope bug. A value is declared inside a try block, then used later by supervisor code after that block has ended.

    try {
      const serverEnv = {
        BROWSE_HEADED: '1',
        BROWSE_PORT: '34567',
        BROWSE_SIDEBAR_CHAT: '1',
      };
      const newState = await startServer(serverEnv);
    } catch (err) {
      process.exit(1);
    }
    // later, outside the try block
    const respawned = await startServer(serverEnv);

This is not style. This is not taste. The variable is not in scope where it is used. That is a high-confidence defect.
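One way to close this scope gap is to pass the environment explicitly into the code path that respawns the server. The sketch below is illustrative only: `startServer` is a stub, and `ServerEnv`, `ServerState`, and `startThenRespawn` are hypothetical names, since the excerpt does not show the project's real signatures.

```typescript
// Hypothetical stand-ins: the real startServer signature and return type
// are not shown in the excerpt.
type ServerEnv = Record<string, string>;
interface ServerState { port: number }

async function startServer(env: ServerEnv): Promise<ServerState> {
  return { port: Number(env.BROWSE_PORT) };
}

// Declared once, outside any try block.
const serverEnv: ServerEnv = {
  BROWSE_HEADED: '1',
  BROWSE_PORT: '34567',
  BROWSE_SIDEBAR_CHAT: '1',
};

// The environment is a parameter, so it is in scope for both the first
// start and the later respawn.
async function startThenRespawn(env: ServerEnv): Promise<ServerState[]> {
  const states: ServerState[] = [];
  try {
    states.push(await startServer(env));
  } catch (err) {
    // The original calls process.exit(1) here; this sketch rethrows instead.
    throw new Error(`initial start failed: ${String(err)}`);
  }
  // Later, outside the try block: env is still visible here.
  states.push(await startServer(env));
  return states;
}
```

Hoisting the `const` above the `try` would work just as well. Passing the object as a parameter simply makes the dependency visible at the call site.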
Finding 2: a lazy import with no declared package
The second confirmed defect was a dependency contract issue. The code lazy-loads sharp on the screenshot path:

    export async function guardScreenshotBuffer(input: Buffer) {
      const sharpModule = await import("sharp");
      const sharp = sharpModule.default ?? sharpModule;
      const image = sharp(input);
      // ...
    }

But sharp is not declared in dependencies, devDependencies, or optionalDependencies. Lazy imports are fine. Hidden runtime dependencies are not. If that path runs on a machine without sharp, it fails at runtime.
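The check behind this kind of finding can be sketched as a lookup of the imported package name in the three manifest sections named above. This is a minimal illustration, not aislop's actual implementation. `PackageManifest`, `packageNameOf`, and `isDeclared` are hypothetical names.

```typescript
// The package.json sections that count as a declaration for this check.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}

// Reduce an import specifier to its package name:
// "sharp/lib/x" -> "sharp", "@scope/pkg/sub" -> "@scope/pkg".
function packageNameOf(specifier: string): string {
  const segments = specifier.startsWith('@') ? 2 : 1;
  return specifier.split('/').slice(0, segments).join('/');
}

// True when the package behind the specifier appears in any section.
function isDeclared(manifest: PackageManifest, specifier: string): boolean {
  const name = packageNameOf(specifier);
  const sections = [
    manifest.dependencies,
    manifest.devDependencies,
    manifest.optionalDependencies,
  ];
  return sections.some(
    (section) =>
      section !== undefined &&
      Object.prototype.hasOwnProperty.call(section, name),
  );
}
```

Given a manifest that lists no entry for sharp in any of the three sections, `isDeclared(manifest, 'sharp')` returns `false`, which is the condition the finding reports.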
The security findings were different
aislop also reported six innerHTML errors. One example:

    const argsText = entry.args ? entry.args.join(' ') : '';
    div.innerHTML = `
      <div class="entry-header">
        <span class="entry-command">${escapeHtml(entry.command || entry.type)}</span>
      </div>
      ${argsText ? `<div class="entry-args">${escapeHtml(argsText)}</div>` : ''}
    `;

This is worth surfacing. It is also not the same as proving XSS. The snippet uses escaping for dynamic values, but it still builds DOM through a string template. That means a scanner should call it a conservative security pattern, not pretend it has completed a full dataflow proof.
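The excerpt calls `escapeHtml` without showing its definition, so whether the template is safe depends on what that helper actually escapes. As a point of reference, a typical helper of this kind might look like the sketch below. This is an assumed implementation, not the project's code.

```typescript
// Assumed implementation: the excerpt calls escapeHtml but does not show it.
const HTML_ESCAPES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Replace each HTML-significant character in a single pass, so text that is
// already escaped is not rewritten halfway through the replacement.
function escapeHtml(value: string): string {
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

A reviewer looking at one of these sites would want to confirm that the real helper covers at least these five characters and that every interpolated value passes through it. Setting `textContent` on created nodes removes the need for that audit.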
That distinction matters. Teams ignore tools that overstate certainty. They trust tools that tell them what kind of review is needed.
What the low score really meant
The project still has a lot of code quality debt under aislop's rules. The top buckets were:
| Bucket | Count | How to read it |
|---|---|---|
| Style and policy | 866 | Narrative comments, empty blocks, unused variables, function size, duplicate blocks. |
| AI-slop indicators | 103 | Patterns commonly left by generated or rushed code, but still reviewable. |
| Conservative security | 7 | Security-sensitive APIs that deserve review, not automatic exploit claims. |
| Confirmed defects | 2 | High-confidence issues that should be fixed before relying on those paths. |
So the honest verdict is not "978 bugs." The honest verdict is: two proven defects, several security-sensitive patterns to review, and a large amount of hygiene debt that would make future maintenance harder.
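The four buckets and the confidence split can be pictured as a small tally over classified findings. The sketch below is an illustration of that idea under assumed type names (`Verdict`, `Finding`, `verdictMix`, `confidenceOf`). It does not reflect aislop's real output schema.

```typescript
// Hypothetical type names; aislop's actual JSON schema is not shown in the article.
type Verdict =
  | 'confirmed-defect'
  | 'conservative-security'
  | 'style-policy'
  | 'ai-slop-indicator';
type Confidence = 'high' | 'medium';

interface Finding {
  rule: string;
  verdict: Verdict;
}

// Only confirmed defects are treated as high confidence; every other
// bucket is a review prompt, not a proof.
function confidenceOf(verdict: Verdict): Confidence {
  return verdict === 'confirmed-defect' ? 'high' : 'medium';
}

// Count findings per verdict bucket.
function verdictMix(findings: Finding[]): Record<Verdict, number> {
  const mix: Record<Verdict, number> = {
    'confirmed-defect': 0,
    'conservative-security': 0,
    'style-policy': 0,
    'ai-slop-indicator': 0,
  };
  for (const finding of findings) {
    mix[finding.verdict] += 1;
  }
  return mix;
}
```

Applied to the gstack numbers, 866 + 103 + 7 + 2 gives the 978 total, and the 2 confirmed defects are the only high-confidence findings, which matches the 2 high / 976 medium split in the scan output.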
What we improved in aislop
This was not a one-off exception for one repo. We made generic changes:
- verdict mix: JSON, terminal, and MCP output now classify findings as confirmed defects, conservative security, style/policy, or AI-slop indicators.
- confidence: Confirmed defects are high confidence. Style, policy, and heuristic findings are shown as medium confidence instead of being dressed up as proof.
- runtime globals: Chrome extension globals, Supabase function globals, Bun, and Deno are handled by scoped runtime context instead of broad project-wide exemptions.
- import precision: Remote and virtual specifiers such as https:, jsr:, and npm: are no longer treated like missing npm dependencies.
- security precision: Placeholder secrets, generated SQL placeholder lists, safe non-shell spawn arrays, and obviously escaped DOM fragments are handled more carefully.
- score pressure: Repeated hits from one rule family are capped so one noisy pattern cannot dominate the entire score.
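The import precision change can be illustrated with a prefix check that runs before any missing-dependency rule. This is a minimal sketch assuming only the three prefixes named in the list. The names `NON_NPM_PREFIXES` and `isRemoteOrVirtualSpecifier` are hypothetical, and the real rule may cover more cases.

```typescript
// Prefixes named in the article; the real rule may recognize others.
const NON_NPM_PREFIXES = ['https:', 'jsr:', 'npm:'];

// True when the specifier is resolved by the runtime (URL or registry
// prefix) and so should not be looked up in package.json.
function isRemoteOrVirtualSpecifier(specifier: string): boolean {
  return NON_NPM_PREFIXES.some((prefix) => specifier.startsWith(prefix));
}
```

Under this sketch, `jsr:@std/path` and `npm:zod` are skipped by the missing-dependency rule, while a bare specifier such as `sharp` still goes through the manifest check.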
The goal was not to make the number nicer. The goal was to make the number explainable.
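To make the score pressure idea concrete, here is one possible shape for a per-family cap. The article does not describe aislop's actual formula, so the function name, the linear weight, and the cap value below are all assumptions chosen for illustration.

```typescript
// Assumed scoring shape: each hit costs `weight`, but a single rule family
// can contribute at most `capPerFamily` to the total penalty.
function cappedPenalty(
  countsByFamily: Record<string, number>,
  weight: number,
  capPerFamily: number,
): number {
  let total = 0;
  for (const family of Object.keys(countsByFamily)) {
    const hits = countsByFamily[family] ?? 0;
    total += Math.min(hits * weight, capPerFamily);
  }
  return total;
}

// Hypothetical counts, not taken from the gstack scan.
const exampleCounts = { 'narrative-comment': 500, 'unused-variable': 3 };
const examplePenalty = cappedPenalty(exampleCounts, 1, 10);
```

With these made-up inputs the noisy family is limited to 10 and the small one contributes 3, so the penalty is 13 where an uncapped sum would be 503. The breakdown stays readable because no single family can swamp the others.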
The cleanup order we would use
If this were our codebase, we would not start with the 866 style findings. We would sequence the work by evidence strength and blast radius:
1. Move serverEnv to the outer scope used by the supervisor, or pass an explicit environment object into the respawn path.
2. Add sharp as a declared dependency or make the screenshot resizing path degrade explicitly when the optional package is missing.
3. Review the six innerHTML sites and replace the highest-risk templates with DOM node construction where user-controlled fields enter the string.
4. Run the mechanical fixer for the 568 fixable findings, then review the diff instead of editing every narrative comment or unused import by hand.
5. Tackle the remaining debt by module: empty blocks, unused variables, long functions, duplicate blocks, and type assertions. Those are maintainability issues, not release blockers by themselves.
That is the point of the new verdict. It turns a scary number into an ordered engineering plan.
The release lesson
This is the bar we want from AI-code quality tooling: strict enough to catch real failures, calibrated enough not to overclaim, and transparent enough that a maintainer can tell which findings deserve immediate action.
On gstack, aislop was right to be worried. It found a real scope bug, a real dependency contract issue, several security-sensitive DOM writes, and enough hygiene debt to justify a serious cleanup pass.
But the improved verdict is more useful than the original score. A team can now look at a 3/100 result and know where to start: fix the two confirmed defects, review the conservative security patterns, then decide how much style and AI-slop debt they want to pay down.
Try it on your repo:
npx aislop scan