Modern spaghetti

We are still fairly bad at writing large pieces of software. The way this kind of software ages hasn't really changed over the 50 years or so that we've been writing it: as it grows and more and more people work on it, the structure slowly goes fuzzy around the edges and eventually collapses entirely into a pile of tangled spaghetti. We constantly invent new techniques to work around this problem, and it seems almost inevitable that each one becomes a new source of it.

Spaghetti code is code that has become hard to reason about because of how it interrelates with itself. Dijkstra came up with an interesting definition for such code in his famous letter about gotos: code whose state is difficult to describe other than by running the program with a particular input for a set number of steps. Code that avoids this, whose possible states can be specified without running it, tends to have another quality too: its parts can be split out and executed independently of one another.

For large pieces of software, these are both important qualities. Being able to accurately describe states makes it easier to narrow down the set of suspect code when a bug is found, and to reproduce it quickly without having to follow every action the user took. Being able to split the code into individual components makes it easy to reconfigure the software to perform new tasks or existing tasks in a new way, or to replace components when they are no longer suitable for their job. One of the things I've found from working on large software projects is that even now, very few of them truly have these qualities.

We can feel this most profoundly by looking at our tests, or perhaps more accurately at the tests that have not been written. Big pieces of software inevitably seem to have certain classes that require creating every other class in order to instantiate them, or which need a whole database instance to work, or a huge pile of mocks. These classes are seldom well tested, if they are tested at all, because the tests take a long time to write and run slowly. More importantly, the huge amount of connectivity they have to the rest of the software means that it's nearly impossible to sensibly test the full range of their behaviour.

These classes are modern spaghetti code. In modern code, we have usually got state under control but recreated an age-old problem by making every single piece of code dependent on every other piece of code in some way: sometimes explicitly and sometimes subtly, by assuming the behaviour of the rest of the application. It's often still impossible to properly break down our software into components so we can reason about it and verify it: an inherent problem caused by how we compose smaller programs into bigger ones.

The more dependencies a component has, the harder it is to separate from the larger program. Components that cannot be separated cannot be tested by themselves, and are thus hard to debug and change. It's therefore always a mistake to add a new dependency to a component that is already working, and almost always an improvement in terms of structure to externalise the functionality provided by dependencies.

Dependency injection has made the situation worse. The 'D' in SOLID stands for dependency inversion, which says that a class should depend on abstractions rather than on concrete details, and which encourages moving dependencies outside a class wherever possible. The unfortunately similarly named notion of dependency injection is the idea that this is achieved by constructing the dependencies outside of an object and passing them in. That removes a dependency on a constructor and maybe a specific instance, but does not really remove the dependency from the object at all. In fact, it guarantees that, as we move up through the layers of the program, the number of dependencies only increases until they are utterly unmanageable without the aid of an automation program: a dependency injection framework. Users tend to interact with the upper layers of a piece of software more than the lower layers, so this also means that the parts of the software that users are most likely to interact with are also those that are least manageable from a testing and debugging perspective.
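To make the accumulation concrete, here is a minimal Python sketch. Every name in it (Database, Mailer, OrderStore, Notifier, CheckoutService, build_checkout) is hypothetical and invented for illustration; it is not taken from any real framework. Each layer is handed its dependencies, yet the top layer still cannot be constructed or tested without the whole graph beneath it.

```python
class Database:
    """Hypothetical stand-in for a real database client."""
    def __init__(self):
        self.rows = []

    def save(self, row):
        self.rows.append(row)


class Mailer:
    """Hypothetical stand-in for an email gateway."""
    def __init__(self):
        self.sent = []

    def send(self, address, body):
        self.sent.append((address, body))


class OrderStore:
    # Lower layer: one injected dependency.
    def __init__(self, database):
        self.database = database

    def record(self, order):
        self.database.save(order)


class Notifier:
    # Lower layer: one injected dependency.
    def __init__(self, mailer):
        self.mailer = mailer

    def confirm(self, order):
        self.mailer.send(order["email"], f"Order {order['id']} received")


class CheckoutService:
    # Upper layer: two direct dependencies, four transitive ones.
    def __init__(self, store, notifier):
        self.store = store
        self.notifier = notifier

    def checkout(self, order):
        self.store.record(order)
        self.notifier.confirm(order)


def build_checkout():
    """The hand-written wiring that a DI framework would automate."""
    database = Database()
    mailer = Mailer()
    service = CheckoutService(OrderStore(database), Notifier(mailer))
    return service, database, mailer
```

Even in this toy version, a test of CheckoutService has to build or fake four other objects before it can make a single call, and the count only grows with each layer added above it.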

This problem can be fixed. The aim is not to inject dependencies but to move them outside and remove them entirely wherever possible so that each component does fewer things and needs less of the wider program in order to demonstrate its function. The ideal component can be constructed with no parameters, instructed to perform its task with a single API call producing an output that is easily compared to what is expected.

Passing in an object and sending it input is not the only way that two components can communicate. Every input to a dependency could instead be an output of the component. A component that generates data to be stored in a database does not need to be given a database as a dependency. It can instead produce that data as an output. If the database is injected as a dependency, such a component can only be tested if that dependency is also present in some form. If the connection is made outside, the component can be tested in isolation, so when a failure occurs it is clear whether the fault lies with the data-generating component or the database itself. Modern languages often have libraries that make this easier to achieve, like LINQ and Rx in C#.
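As a rough sketch of the database example above, again in Python and again with hypothetical names (Invoice, InvoiceGenerator, InMemoryStore, run), the generating component can be built with no parameters and returns its data, while the connection to storage is made by a small piece of code outside both components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    customer: str
    total_pence: int


class InvoiceGenerator:
    """No constructor parameters: one call in, one comparable value out."""

    def generate(self, customer, line_items):
        # line_items is a list of (unit_price_pence, quantity) pairs.
        total = sum(price * quantity for price, quantity in line_items)
        return Invoice(customer, total)


class InMemoryStore:
    """Hypothetical stand-in for a database; it knows nothing about invoices."""

    def __init__(self):
        self.saved = []

    def save(self, record):
        self.saved.append(record)


def run(store):
    """The only place where the two components meet."""
    invoice = InvoiceGenerator().generate("acme", [(250, 2), (100, 1)])
    store.save(invoice)
    return invoice
```

A test for InvoiceGenerator needs no store and no mock: it calls generate and compares the result with an expected Invoice. If that test passes and saving still fails, the fault is in the store or in the few lines of wiring.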

We have a tendency to normalise the pain points when it comes to software, particularly in environments where certain techniques are considered the 'correct' way to do things. Tests should not be hard to write when it's possible to make them easy to write. It should not be necessary to page through multiple files to try to understand why a particular feature works the way it does. There's no inherent reason why software should be built in a way that requires a whole other piece of software just to work out how to stick it together in order to successfully start.

We often make problems worse for ourselves by failing to acknowledge their existence. Large software projects are often continuously built and changed over long periods of time, and this allows small problems room to grow into very large ones. Too much interdependency is just such an issue: seemingly a small problem when looked at in isolation, it can grow into something that destroys the structure of a project and makes it exceedingly difficult to understand, test and maintain.