Engineering May 30, 2026 · 5 min read

What I fixed after that score, and what I kept

Part 2 of 2. The lazy fix for a noisy rule is to turn it off. I did not do that.

Part one ended with a maintainer's clean library scoring 1 out of 100, and me realizing the score was my bug, not his code. This is the week I spent fixing it, and the one decision I am most glad I did not get wrong.

Precision, not suppression

When you find a noisy rule, the lazy fix is to turn it off. I did not want to do that, because a rule that is off catches nothing. Every change I made instead tightens the rule so it still fires on the real pattern but stops flagging the convention around it.

I kept his library open as a benchmark and re-scanned it on every single change. No guessing, no "this feels better." Just one question each time: did the noise drop while the real findings stayed?
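
To make that question concrete, here is a minimal sketch of a before/after comparison, assuming each benchmark finding has been hand-labeled as real or noise. The `Finding` type, its fields, and both function names are hypothetical illustrations, not part of the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    line: int
    is_real: bool  # assumption: a hand label from reviewing the benchmark


def compare_scans(before: set[Finding], after: set[Finding]) -> dict[str, int]:
    """Summarize what one rule change did to the benchmark scan."""
    real_before = {f for f in before if f.is_real}
    real_after = {f for f in after if f.is_real}
    noise_before = before - real_before
    noise_after = after - real_after
    return {
        "noise_removed": len(noise_before - noise_after),
        "noise_added": len(noise_after - noise_before),
        "real_lost": len(real_before - real_after),
    }


def change_is_acceptable(summary: dict[str, int]) -> bool:
    # Accept a change only if noise dropped and no real finding disappeared.
    return (
        summary["real_lost"] == 0
        and summary["noise_added"] == 0
        and summary["noise_removed"] > 0
    )
```

A change that silences a real finding fails this check even if it also removes a lot of noise, which is the difference between tightening a rule and suppressing it.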

The rule-by-rule pass

A few of the changes, so you can see the shape of it:

wrappers: now only flags a true passthrough, a function forwarding its own arguments unchanged. A call that transforms its arguments, like bool(message.animation), is real work, not a wrapper.
comments: a long, well-written explanatory comment is no longer slop just because it is long. The rule now needs an actual narration signal, like "This function does X" or "First it, then it, finally it", before it fires.
catches: a catch that logs the error is observable, intentional recovery. People write that on purpose. It is only flagged now when the error is dropped on the floor.
imports: import-shaped text inside docstrings and example blocks is no longer read as a real import, and a handful of standard-library modules I was simply missing are now recognized.
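
The wrapper and catch distinctions above can be sketched with Python's standard `ast` module. This is an illustration under assumptions, not the tool's implementation: `is_passthrough` and `is_swallowed` are hypothetical names, and the heuristics are deliberately simple.

```python
import ast


def _strip_docstring(body: list[ast.stmt]) -> list[ast.stmt]:
    # Ignore a leading docstring so documented wrappers are still recognized.
    if body and isinstance(body[0], ast.Expr):
        value = body[0].value
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            return body[1:]
    return body


def is_passthrough(func: ast.FunctionDef) -> bool:
    """True if the body is only `return callee(<own parameters, unchanged>)`."""
    body = _strip_docstring(func.body)
    if len(body) != 1 or not isinstance(body[0], ast.Return):
        return False
    call = body[0].value
    if not isinstance(call, ast.Call) or call.keywords:
        return False
    params = [a.arg for a in func.args.args]
    # Every argument must be a bare parameter name, in the original order.
    forwarded = [a.id for a in call.args if isinstance(a, ast.Name)]
    return len(forwarded) == len(call.args) and forwarded == params


def is_swallowed(handler: ast.ExceptHandler) -> bool:
    """True if the handler neither re-raises nor calls anything (e.g. a logger)."""
    return not any(
        isinstance(node, (ast.Raise, ast.Call)) for node in ast.walk(handler)
    )
```

Under this sketch, `def send(a, b): return _send(a, b)` counts as a passthrough, while `def has_animation(message): return bool(message.animation)` does not, because its argument is an attribute access rather than a forwarded parameter. Likewise, `except ValueError: pass` is swallowed, and a handler that calls a logger is not.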

On his library the findings fell from 426 to 92. I checked it both ways. The ones I removed were false positives. The ones I kept were real.

The fix I built, liked, and threw away

Even at 92 findings the library still scored low, because the rest were real: large modules, and a lot of descriptive comments. So I built a scoring change that judged a codebase by its findings per file instead of its total. A big clean repo stops getting punished for being big, and his library jumped into the 80s.

It felt great for about an hour. Then I saw what it actually was: a way to make the number look generous without removing a single real finding. The library genuinely had 92 things my rules flag, and the new math did not fix them. It hid them behind a friendlier denominator. That is the exact move I had refused a week earlier, wearing a different hat. So I reverted it.

What I kept strict

I did make one small, honest change. Style rules, like comment density and file size, now count for half in the score, so the number is driven by genuine slop rather than house style. But I kept the tool opinionated. A great library can still have debt by my standards, and I would rather say that plainly than tune the math until the score looks kind.
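
As a rough sketch of what half weight means, the function below counts style findings at 0.5 and everything else at 1.0. The rule identifiers and the function name are hypothetical, and the mapping from this weighted total to the final 0 to 100 score is not described in this post, so it is left out.

```python
# Hypothetical rule ids standing in for the comment-density and file-size rules.
STYLE_RULES = {"comment-density", "file-size"}


def weighted_findings(rule_ids: list[str], style_weight: float = 0.5) -> float:
    """Sum findings, counting style-rule findings at a reduced weight."""
    return sum(style_weight if rule in STYLE_RULES else 1.0 for rule in rule_ids)


# Two style findings and one genuine finding weigh the same as two genuine ones.
print(weighted_findings(["file-size", "file-size", "unused-variable"]))  # 2.0
```

Unlike a per-file denominator, this weighting leaves every finding in the report. It only changes how much house style moves the number.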

The whole point of a quality gate is that it tells you the truth. The day it starts being generous to be liked is the day it stops being worth running.

Then I checked it everywhere

One library is not proof. So I ran the new build across a basket of well-known projects in different languages, and it found more of my own false positives, like imports parsed out of docstrings and stdlib modules I had missed. I fixed each one with a regression test so it cannot come back. The tool got more accurate, not quieter. The real findings, the unused variables, the leftover debug prints, the swallowed errors, the vulnerable dependencies, all still fire.

All of this shipped in v0.10.0. It started with one person being generous enough to tell me my tool was wrong, and it ended with a tool that is sharper and still honest.

If you run it and think I drew a line in the wrong place, I want to hear that too. Try it with npx aislop scan, then tell me where I got it wrong on GitHub Discussions or at x.com/heavykenny. I read every one.
