Modern spaghetti
We are still fairly bad at writing large pieces of software. The style of this kind of software hasn't really changed over the
50 years or so that we've been writing it: as it grows and more and more people work on it, the structure slowly goes
fuzzy around the edges and eventually collapses entirely into a pile of tangled spaghetti. We constantly try to invent
new techniques to work around this problem, and it seems almost inevitable that each one becomes a new source of the
same problem.
Spaghetti code is code that has become hard to reason about because of how it interrelates with itself. Dijkstra came
up with an interesting definition for such code in his infamous letter about gotos: code whose state is difficult to
describe other than by running the program with a particular input for a set number of steps. Software whose possible
states can be specified more directly than this has another quality as a consequence: its parts can be split out and
executed independently of one another.
For large pieces of software, these are both important qualities. Being able to accurately describe states makes it
easier to narrow down the set of suspect code when a bug is found, and quickly reproduce it without having to follow
every action the user took. Being able to split the code into individual components makes it easy to reconfigure the
software to perform new tasks or existing tasks in a new way, or to replace them when they are no longer suitable
for their job. One of the things I've found from working on large software projects is that, even now, very few of
them truly have these qualities.
We can feel this most profoundly by looking at our tests, or perhaps more accurately at the tests that have not been
written. Big pieces of software inevitably seem to have certain classes that require creating every other class in
order to instantiate them, or which need a whole database instance to work, or a huge pile of mocks. These classes
are seldom well tested, if they are tested at all, because the tests take a long time to write and perform poorly. More
importantly, the huge amount of connectivity they have to the rest of the software means that it's nearly impossible
to sensibly test the full range of their behaviour.
These classes are modern spaghetti code. In modern code, we have usually got state under control but recreated an age-old
problem by making every single piece of code dependent on every other piece of code in some way: sometimes explicitly and
sometimes subtly, by assuming the behaviour of the rest of the application. It's often still impossible to properly
break down our software into components so we can reason about it and verify it: an inherent problem
caused by how we compose smaller programs into bigger ones.
The more dependencies a component has, the harder it is to separate from the larger program. Components that
cannot be separated cannot be tested by themselves, and are thus hard to debug and change. It's therefore
always a mistake to add a new dependency to a component that is already working, and almost always an improvement
in terms of structure to externalise the functionality provided by dependencies.
Dependency injection has made the situation worse. The 'D' in SOLID stands for dependency inversion, which
is the idea that dependencies should be moved outside a class wherever possible. The unfortunately similar-sounding
notion of dependency injection is the idea that this is achieved by constructing the dependencies outside
of an object and passing them in. That removes a dependency on a constructor and maybe a specific instance, but
does not really remove the dependency from the object at all. In fact, it guarantees that as we move up through the
layers of the program, the number of dependencies only increases until they are utterly unmanageable without
the aid of an automation program - a dependency injection framework. Users tend to interact with the upper layers
of a piece of software more than the lower layers, so this also means that the parts of the software that users
are most likely to interact with are also those that are least manageable from a testing and debugging perspective.
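A minimal Python sketch of this accumulation may help. Every class name here is hypothetical and none comes from a real framework; the point is only that each layer takes its dependencies as constructor arguments, so building the top-level object means building everything beneath it first.

```python
class Clock:
    """Leaf component: needs nothing."""


class Database:
    """Leaf component: needs nothing."""


class Mailer:
    """Leaf component: needs nothing."""


class AuditLog:
    def __init__(self, clock: Clock, database: Database) -> None:
        self.clock = clock
        self.database = database


class OrderStore:
    def __init__(self, database: Database, audit_log: AuditLog) -> None:
        self.database = database
        self.audit_log = audit_log


class Checkout:
    """Top layer, closest to the user, with the most to set up."""

    def __init__(self, store: OrderStore, mailer: Mailer, audit_log: AuditLog) -> None:
        self.store = store
        self.mailer = mailer
        self.audit_log = audit_log


def build_checkout() -> Checkout:
    """Hand-written wiring: the job a dependency injection framework automates."""
    clock = Clock()
    database = Database()
    audit_log = AuditLog(clock, database)
    store = OrderStore(database, audit_log)
    return Checkout(store, Mailer(), audit_log)
```

Constructing `Checkout`, even in a test, requires five other objects or five stand-ins for them, and that count grows with every layer added above it.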
This problem can be fixed. The aim is not to inject dependencies but
to move them outside and remove them entirely wherever possible so that each component does fewer things and
needs less of the wider program in order to demonstrate its function. The ideal component can be constructed with
no parameters and instructed to perform its task with a single API call, producing an output that is easily compared to
what is expected.
Passing in an object and sending it input is not the only way that two components
can communicate. Everything a component would send to a dependency can instead be returned as its own output. A component
that generates data to be stored in a database does not need to be given a database as a dependency. It can instead
produce that data as an output. If the database is injected as a dependency, such a component can only be tested if
that dependency is also present in some form. If the connection is made outside, the component can be tested in
isolation, so when a failure occurs it is clear whether the fault lies with the data-generating component or the database
itself. Modern languages often have libraries that make this easier to achieve, such as LINQ and Rx in C#.
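The difference can be sketched in Python, with a generator standing in for the kind of stream that LINQ or Rx would provide. This is an illustration under assumptions rather than a recommended design: the temperature conversion, `Reading`, `InjectedConverter`, `convert`, `InMemoryDatabase` and `run` are all invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class Reading:
    sensor: str
    celsius: float


class Database(Protocol):
    def save(self, reading: Reading) -> None: ...


class InjectedConverter:
    """Injected style: cannot be constructed or tested without some Database."""

    def __init__(self, database: Database) -> None:
        self._database = database

    def convert(self, raw: Iterable[tuple[str, float]]) -> None:
        for sensor, fahrenheit in raw:
            self._database.save(Reading(sensor, (fahrenheit - 32) * 5 / 9))


def convert(raw: Iterable[tuple[str, float]]) -> Iterator[Reading]:
    """Output style: yields the data and knows nothing about storage."""
    for sensor, fahrenheit in raw:
        yield Reading(sensor, (fahrenheit - 32) * 5 / 9)


class InMemoryDatabase:
    """Hypothetical stand-in for a real database."""

    def __init__(self) -> None:
        self.rows: list[Reading] = []

    def save(self, reading: Reading) -> None:
        self.rows.append(reading)


def run(raw: Iterable[tuple[str, float]], database: Database) -> None:
    """The connection between the two components is made out here."""
    for reading in convert(raw):
        database.save(reading)
```

Testing `convert` needs only a list of inputs and a list of expected outputs, with no database or mock in sight. `InjectedConverter` performs the same calculation, but every test of it has to supply something that behaves like a database and then inspect what was saved.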
We have a tendency to normalise the pain points when it comes to software, particularly in
environments where certain techniques are considered the 'correct' way to do things. Tests should not be hard to
write when it is possible to make them easy to write. It should not be necessary to page through multiple files
to try to understand why a particular feature works the way it does. There's no inherent reason why software should
be built in a way that requires a whole other piece of software just to work out how to stick it together in order
to successfully start.
We often make problems worse for ourselves by failing to acknowledge their existence. Large software projects are
often continuously built and changed over long periods of time, and this allows small problems room to grow into
very large ones. Too much interdependency is just such an issue: seemingly a small problem when looked at in
isolation, it's an issue that can grow into something that destroys the structure of a project and makes it
exceedingly difficult to understand, test and maintain.