A traditional path to getting lingering duplicate systems
Mar. 21st, 2026 08:36 pm
In yesterday's entry I described a lingering duplicate system and how it had taken us a long time to get rid of it, but I got too distracted by the story to write down the general thoughts I had on how this sort of thing happens and keeps happening (also, the story turned out to be longer than I expected). We've had other long running duplicate systems, and often they have more or less the same story as yesterday's disk space usage tracking system.
The first system built is a basic system. It's not a bad system, but it's limited and you know it. You can only afford to gather disk usage information once a day and you have nowhere to put it other than in the filesystem, which makes it easy to find and independent of anything else but also stops it updating when the filesystem fills up. Over time you may improve this system (cheaper updates that happen more often, a limited amount of high resolution information), but the fundamental issues with it stick around.
After a while it becomes possible to build a different, better system (you gather disk usage information every few minutes and put it in your new metrics system), or maybe you just realize how to do a better version from scratch. But often the initial version of this new system has its own limitations or works a bit differently or both, or you've only implemented part of what you'd need for a full replacement of the first system. And maybe you're not sure that it will fully work, that it's really the right answer, or that you'll be able to support it over the long term (perhaps the cardinality of the metrics will be overwhelming).
(You may also be wary of falling victim to the "second system effect", since you know you're building a second system.)
Usually this means that you don't want to go through the effort and risk of immediately replacing the old system with the new system (if it's even immediately possible without more work on the new system). So you use the new system for new stuff (providing dashboards of disk space usage) and keep the old system for the old stuff (the officially supported commands that people know). The old system is working so it's easier to have it stay "for now". Even if you replace part of the use of the old system with the new system, you don't replace all of it.
(If your second system started out as only a partial version of the old system, you may also not be pushed to evolve it so that it could fully replace the old system, or that may only happen slowly. In some ways this is a good thing; you're getting practical experience with the basic version of the new system rather than immediately trying to build the full version. This is a reasonable way to avoid the "second system effect", and may lead you to find out that in the new system you want things to operate differently than they did in the old one.)
Since both the old system and the new system are working, you now generally have little motivation to do more work to get rid of the old system. Until you run into clear limitations of the old system, moving back to only having one system is (usually) cleanup work, not a priority. If you wanted to let the new system run for a while to prove itself, it's also easy to simply lose track of this as a piece of future work; you won't necessarily put it on a calendar, and it's something that might be months or a year out even in the best of circumstances.
(The times when the cleanup is a potential priority are when the old system is using resources that you want back, including money for hardware or cloud stuff, or when the old system requires ongoing work.)
A contributing factor is that you may not be sure about what specific behaviors and bits of the old system other things are depending on. Some of these will be actual designed features that you can perhaps recover from documentation, but others may be things that simply grew that way and became accidentally load bearing. Figuring these out may take careful reverse engineering of how the system works and what things are doing with it, which takes work, and when the old system is working it's easier to leave it there.

