The config file

A connection pool size of 10 is a guess. A connection pool size of 37 is a scar. Someone ran out of connections on a Tuesday afternoon, tried 50, watched latency spike, backed off to 40, still too high, landed on 37 after a week of graphs, and committed it with “tune pool size.” The code says what happens. The config says what happened.

Nobody talks about config though. Not the twelve-factor app kind, not the “should we use YAML or TOML” kind. The actual values. The numbers someone picked and committed without a PR description, three years ago, that are still running in production.

Retry count of 3, everyone starts there. Retry count of 1 with exponential backoff starting at 5 seconds – that’s someone who watched a retry storm take down a downstream. The number isn’t a policy, it’s a postmortem compressed into two fields.

Timeout of 30 seconds is confidence. Timeout of 3 seconds is a team that’s been burned. Timeout of 3 seconds with a comment that says “DO NOT CHANGE” is a team that’s been burned twice.

Then there are the defaults nobody questioned. The batch size of 100 because that’s what the quickstart used. The log level set to INFO because DEBUG was too noisy during development and nobody flipped it back. Nobody decided on these values, the quickstart did, and it’s still running in production.

Some of those defaults stop being config over time. A batch_size: 100 that started as a parameter, except now there’s a buffer allocated to exactly that size somewhere in the codebase, a query that pages by that number, a downstream that times out above it. Change the config and something breaks that has nothing to do with the config. It’s not configuration anymore, it’s a contract nobody wrote down.

Staging has max_connections: 100. Production has max_connections: 25. Nobody changed production, someone just bumped staging during a load test and forgot. The environments diverged six months ago, and the diff only shows up when something breaks in production that worked fine in staging. Config drift is quieter than code drift – no compiler, no test, nothing to catch it.

Commented-out config is its own archaeology. A Redis cluster endpoint, commented, with a note that says “rollback if latency.” Someone tried to migrate, saw numbers they didn’t like, backed off, and left the evidence. That’s not dead config. That’s a failed experiment someone wanted to retry but never did. Six months later the team has turned over and nobody knows what it means, but nobody deletes it either, because what if.

I’ve spent more time reading config files than I’d like to admit. When you’re reviewing architecture or inheriting a system, the config is where you start, not the code. The code is what someone intended. The config is what survived production.