Engineering June 3, 2026 · 5 min read

We scanned gstack. The score was brutal, but the useful part was the verdict.

A 3/100 score is easy to screenshot. The harder question is whether the findings are real. We used gstack as a live calibration run and changed aislop so confirmed defects, security patterns, style debt, and AI-slop indicators are no longer collapsed into one blunt verdict.

A strict quality gate has one job: make the next engineering decision easier. If it only gives you a scary number, it fails that job.

We recently ran aislop against gstack, a public JavaScript/TypeScript browser automation project. The raw result was harsh: 3/100, with 978 findings across 221 files.

That number is not useless. But it is also not enough. A missing dependency is not the same kind of evidence as an aggressive comment rule. A conservative innerHTML warning is not the same thing as proven exploitable XSS. So we used this scan to answer the question every team asks after a strict tool reports a low score:

What is genuinely broken, what is reviewable risk, and what is code hygiene debt?

The scan

The current scan output looked like this:

3 / 100   Critical      8 errors  ·  965 warnings  ·  5 info  ·  568 fixable
221 files  ·  5 engines

Verdict mix:
866 style/policy
103 AI-slop indicators
7 conservative security
2 confirmed defects
2 high-confidence, 976 medium-confidence

That last block is the important change. Instead of asking people to infer severity from a flat list, aislop now separates the shape of the evidence.
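To make that separation concrete, here is a minimal sketch of how a verdict mix could be tallied from a list of findings. The type names, field names, and rule names are assumptions made for illustration; they are not aislop's actual output schema.

```typescript
// Hypothetical shapes for illustration only, not aislop's real schema.
type Verdict =
  | "confirmed-defect"
  | "conservative-security"
  | "style-policy"
  | "ai-slop-indicator";

type Confidence = "high" | "medium";

interface Finding {
  rule: string; // made-up rule identifiers in the sample below
  verdict: Verdict;
  confidence: Confidence;
}

interface VerdictMix {
  total: number;
  byVerdict: Record<Verdict, number>;
  byConfidence: Record<Confidence, number>;
}

// Count findings per verdict class and per confidence level,
// so the report can show the shape of the evidence, not just a total.
function summarize(findings: Finding[]): VerdictMix {
  const mix: VerdictMix = {
    total: findings.length,
    byVerdict: {
      "confirmed-defect": 0,
      "conservative-security": 0,
      "style-policy": 0,
      "ai-slop-indicator": 0,
    },
    byConfidence: { high: 0, medium: 0 },
  };
  for (const finding of findings) {
    mix.byVerdict[finding.verdict] += 1;
    mix.byConfidence[finding.confidence] += 1;
  }
  return mix;
}

const sampleFindings: Finding[] = [
  { rule: "out-of-scope-variable", verdict: "confirmed-defect", confidence: "high" },
  { rule: "undeclared-import", verdict: "confirmed-defect", confidence: "high" },
  { rule: "inner-html-write", verdict: "conservative-security", confidence: "medium" },
  { rule: "narrative-comment", verdict: "style-policy", confidence: "medium" },
  { rule: "narrative-comment", verdict: "style-policy", confidence: "medium" },
];

const sampleMix = summarize(sampleFindings);
```

Keeping verdict and confidence as separate fields means a report can say "two high-confidence defects" without the style findings diluting or inflating that claim.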

Finding 1: an out-of-scope variable

The first confirmed defect was a normal JavaScript scope bug. A value is declared inside a try block, then used later by supervisor code after that block has ended.

try {
  const serverEnv = {
    BROWSE_HEADED: '1',
    BROWSE_PORT: '34567',
    BROWSE_SIDEBAR_CHAT: '1',
  };

  const newState = await startServer(serverEnv);
} catch (err) {
  process.exit(1);
}

// later, outside the try block
const respawned = await startServer(serverEnv);

This is not style. This is not taste. The variable is not in scope where it is used. That is a high-confidence defect.
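One possible shape for the fix is to declare the environment object before the try block so both the initial start and the respawn path can see it. This is a sketch under assumptions: the `startServer` stub, its return type, and the `supervise` wrapper are stand-ins invented here, and the catch rethrows instead of calling `process.exit(1)` so the example stays self-contained.

```typescript
type ServerEnv = Record<string, string>;

// Stand-in for the real launcher; the signature is an assumption.
async function startServer(env: ServerEnv): Promise<{ port: number }> {
  return { port: Number(env.BROWSE_PORT) };
}

// Hypothetical supervisor wrapper showing the corrected scoping.
async function supervise(): Promise<{ port: number }> {
  // Declared in the outer scope, so it is visible after the try block.
  const serverEnv: ServerEnv = {
    BROWSE_HEADED: "1",
    BROWSE_PORT: "34567",
    BROWSE_SIDEBAR_CHAT: "1",
  };

  try {
    await startServer(serverEnv);
  } catch (err) {
    // The original exits the process here; this sketch surfaces the error.
    throw new Error(`initial start failed: ${String(err)}`);
  }

  // Later, outside the try block: serverEnv is still in scope.
  return startServer(serverEnv);
}
```

Because `const` is block-scoped, the only change needed is where the declaration lives; the values passed to the server stay the same.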

Finding 2: a lazy import with no declared package

The second confirmed defect was a dependency contract issue. The code lazy-loads sharp on the screenshot path:

export async function guardScreenshotBuffer(input: Buffer) {
  const sharpModule = await import("sharp");
  const sharp = sharpModule.default ?? sharpModule;
  const image = sharp(input);
  // ...
}

But sharp is not declared in dependencies, devDependencies, or optionalDependencies. Lazy imports are fine. Hidden runtime dependencies are not. If that path runs on a machine without sharp, it fails at runtime.
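The check behind this finding can be sketched as a lookup of the imported package name in the manifest. The helper names below are hypothetical and the logic is deliberately simplified; for example, it would flag bare Node built-ins such as `fs`, which a real checker would need to allow. Only the three `package.json` field names come from the text above.

```typescript
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}

// Reduce an import specifier to the package name a manifest would list,
// or null when the specifier is not something package.json declares.
function packageNameOf(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/")) return null; // local file
  if (/^[a-z][a-z0-9+.-]*:/i.test(specifier)) return null; // node:, https:, jsr:, npm:
  const parts = specifier.split("/");
  if (specifier.startsWith("@")) {
    // Scoped packages keep the first two segments: @scope/name
    return parts.length >= 2 ? `${parts[0]}/${parts[1]}` : null;
  }
  return parts[0] ?? null;
}

// True when the specifier needs no declaration or is declared somewhere.
function isDeclared(manifest: PackageManifest, specifier: string): boolean {
  const name = packageNameOf(specifier);
  if (name === null) return true;
  const sections = [
    manifest.dependencies,
    manifest.devDependencies,
    manifest.optionalDependencies,
  ];
  return sections.some(
    (section) =>
      section !== undefined && Object.prototype.hasOwnProperty.call(section, name),
  );
}

// Made-up manifests for illustration.
const withoutSharp: PackageManifest = { dependencies: { "some-lib": "^1.0.0" } };
const withOptionalSharp: PackageManifest = { optionalDependencies: { sharp: "*" } };

const missing = !isDeclared(withoutSharp, "sharp"); // true: hidden runtime dependency
const declared = isDeclared(withOptionalSharp, "sharp"); // true: contract is explicit
```

A dynamic `import("sharp")` and a static one resolve the same package, so the manifest lookup is identical for both; laziness changes when the failure happens, not whether it can.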

The security findings were different

aislop also reported six innerHTML errors. One example:

const argsText = entry.args ? entry.args.join(' ') : '';

div.innerHTML = `
<div class="entry-header">
  <span class="entry-command">${escapeHtml(entry.command || entry.type)}</span>
</div>
${argsText ? `<div class="entry-args">${escapeHtml(argsText)}</div>` : ''}
`;

This is worth surfacing. It is also not the same as proving XSS. The snippet uses escaping for dynamic values, but it still builds DOM through a string template. That means a scanner should call it a conservative security pattern, not pretend it has completed a full dataflow proof.

That distinction matters. Teams ignore tools that overstate certainty. They trust tools that tell them what kind of review is needed.
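To illustrate what the escaping does and does not establish, here is a sketch of a typical `escapeHtml` helper. The implementation is an assumption; the project's own helper may differ.

```typescript
// Assumed implementation of an HTML-escaping helper, for illustration.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value: string): string {
  // Replace each markup-significant character with its entity.
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// A hostile value becomes inert text once escaped.
const hostile = '<img src=x onerror="alert(1)">';
const fragment = `<div class="entry-args">${escapeHtml(hostile)}</div>`;
// fragment: <div class="entry-args">&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</div>
```

A pattern-based rule sees only the `innerHTML` assignment. Without dataflow analysis it cannot confirm that every interpolated value passes through a helper like this, which is why the finding is labeled conservative instead of confirmed.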

What the low score really meant

The project still has a lot of code quality debt under aislop's rules. The top buckets were:

Bucket	Count	How to read it
Style and policy	866	Narrative comments, empty blocks, unused variables, function size, duplicate blocks.
AI-slop indicators	103	Patterns commonly left by generated or rushed code, but still reviewable.
Conservative security	7	Security-sensitive APIs that deserve review, not automatic exploit claims.
Confirmed defects	2	High-confidence issues that should be fixed before relying on those paths.

So the honest verdict is not "978 bugs." The honest verdict is: two proven defects, several security-sensitive patterns to review, and a large amount of hygiene debt that would make future maintenance harder.

What we improved in aislop

This was not a one-off exception for one repo. We made generic changes:

verdict mix: JSON, terminal, and MCP output now classify findings as confirmed defects, conservative security, style/policy, or AI-slop indicators.
confidence: Confirmed defects are high confidence. Style, policy, and heuristic findings are shown as medium confidence instead of being dressed up as proof.
runtime globals: Chrome extension globals, Supabase function globals, Bun, and Deno are handled by scoped runtime context instead of broad project-wide exemptions.
import precision: Remote and virtual specifiers such as https:, jsr:, and npm: are no longer treated like missing npm dependencies.
security precision: Placeholder secrets, generated SQL placeholder lists, safe non-shell spawn arrays, and obviously escaped DOM fragments are handled more carefully.
score pressure: Repeated hits from one rule family are capped so one noisy pattern cannot dominate the entire score.
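The score-pressure idea in the list above can be sketched as a per-family penalty cap. The weights, the cap value, and the family names here are invented for illustration; the post does not state aislop's real numbers.

```typescript
interface RuleHit {
  family: string; // hypothetical rule-family label
  penalty: number; // hypothetical per-hit weight
}

// Sum penalties per rule family, but never let one family exceed the cap.
function scoreWithFamilyCap(hits: RuleHit[], capPerFamily: number): number {
  const perFamily = new Map<string, number>();
  for (const hit of hits) {
    const soFar = perFamily.get(hit.family) ?? 0;
    perFamily.set(hit.family, Math.min(capPerFamily, soFar + hit.penalty));
  }
  let totalPenalty = 0;
  perFamily.forEach((penalty) => {
    totalPenalty += penalty;
  });
  return Math.max(0, 100 - totalPenalty);
}

// 400 hits from one noisy style rule plus a single scope defect.
const noisyHits: RuleHit[] = [
  ...Array.from({ length: 400 }, () => ({ family: "narrative-comment", penalty: 0.5 })),
  { family: "scope", penalty: 15 },
];

const cappedScore = scoreWithFamilyCap(noisyHits, 20); // 100 - (20 + 15) = 65
const uncappedScore = scoreWithFamilyCap(noisyHits, Infinity); // 100 - 215, floored at 0
```

With the cap in place, the noisy family contributes a bounded amount and the remaining score still reflects the defect; without it, the comment rule alone drives the score to zero.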

The goal was not to make the number nicer. The goal was to make the number explainable.

The cleanup order we would use

If this were our codebase, we would not start with the 866 style findings. We would sequence the work by evidence strength and blast radius:

1. Move serverEnv to the outer scope used by the supervisor, or pass an explicit environment object into the respawn path.
2. Add sharp as a declared dependency or make the screenshot resizing path degrade explicitly when the optional package is missing.
3. Review the six innerHTML sites and replace the highest-risk templates with DOM node construction where user-controlled fields enter the string.
4. Run the mechanical fixer for the 568 fixable findings, then review the diff instead of editing every narrative comment or unused import by hand.
5. Tackle the remaining debt by module: empty blocks, unused variables, long functions, duplicate blocks, and type assertions. Those are maintainability issues, not release blockers by themselves.

That is the point of the new verdict. It turns a scary number into an ordered engineering plan.

The release lesson

This is the bar we want from AI-code quality tooling: strict enough to catch real failures, calibrated enough not to overclaim, and transparent enough that a maintainer can tell which findings deserve immediate action.

On gstack, aislop was right to be worried. It found a real scope bug, a real dependency contract issue, several security-sensitive DOM writes, and enough hygiene debt to justify a serious cleanup pass.

But the improved verdict is more useful than the original score. A team can now look at a 3/100 result and know where to start: fix the two confirmed defects, review the conservative security patterns, then decide how much style and AI-slop debt they want to pay down.

Try it on your repo: npx aislop scan
