Fewer dependencies with streams

I've felt for a while that dependency injection is a problematic design in modern software. It's not so much anything to do with the idea or the implementation, but that the problem it solves really shouldn't exist in the first place. It's only really useful once a piece of software has become so tedious or difficult to initialise that the whole process requires automation. Dependency injection can become a way to hide a spaghetti design underneath a layer of automation.

The notion of test mocks has a similar issue. They're often used as a worse way to declare classes, but they're also used to test if some piece of internal state is modified in a particular way. It seems that if there's a need to observe a piece of state, it should be possible to do so without needing some kind of fake intermediary class in the way.

These techniques are fragile: not guaranteed to cause problems, but code drift over time and carelessness can easily result in large pieces of software starting to exhibit issues more and more.

The issue seems to stem from the notion of objects that communicate through method calls. For instance, in a logging system, we might use a Logger object and send some data to the log by calling Logger.Debug("blah"). To observe what our logger is doing, we need to implement a new version of it that puts the log messages aside somewhere (and we need to define what that 'somewhere' is). To add a logger to an object, we either need to instantiate it statically, meaning our observer now also needs to exist statically, or every object that either logs a message or creates another object that wants to log needs to have a reference to the logger added to its constructor.

As the codebase grows, the original choice of logger class becomes more and more entrenched. It becomes extremely onerous to change it at all.

If we start adding in other 'universal' services (user-interface pieces, the console, HTML and HTTP handlers, and so on), we can rapidly approach a situation where every object has a huge list of 'standard' parameters in its constructor, or where we decide that these things are universal enough that dependency injection is not a pragmatic approach.

An alternative to methods

There's an alternative way for two components to communicate other than invoking methods on one another. It's possible for a component to simply generate an output stream that can be picked up elsewhere. For example, an object could provide a way to retrieve something like a Stream<LogMessage> instead of invoking the logger directly. This object does not need to know about the precise logger implementation, and this object's logging behaviour can be tested directly by reading from the stream with no need for an intermediary.
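
As a minimal sketch of this idea, assuming Python generators stand in for Stream<LogMessage>: the OrderProcessor class, its process method and the LogMessage fields below are all hypothetical names invented for illustration, not part of any real library.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class LogMessage:
    # Hypothetical message shape: a severity level and some text.
    level: str
    text: str


class OrderProcessor:
    """Hypothetical component that reports what it does as a stream.

    It has no reference to a logger; it only produces LogMessage values.
    """

    def process(self, order_ids: List[int]) -> Iterator[LogMessage]:
        for order_id in order_ids:
            if order_id < 0:
                yield LogMessage("error", f"invalid order {order_id}")
            else:
                yield LogMessage("debug", f"processed order {order_id}")


# Whoever consumes the stream decides what happens to the messages.
# Here they are simply collected into a list, which is all a test needs.
messages = list(OrderProcessor().process([1, -2]))
```

Nothing here had to be injected or mocked: the consumer could equally write the messages to a file, forward them elsewhere or discard them, and OrderProcessor would not change.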

One less dependency to inject and simpler, clearer tests would seem to be a win all by itself. Better yet, streams can easily be filtered or transformed. In our logger example, say a particular set of actions are performed on behalf of a user. A simple filter can be used further up the hierarchy to add this information to the log, rather than needing to somehow distribute this information downwards. Similarly, it's much easier to integrate logs from other sources or change their format when the messages are presented as a stream rather than via an object.
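
The filtering and transforming described above might look something like the following sketch. It again assumes generators as the stream type; with_user, only_level and the optional user field are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class LogMessage:
    level: str
    text: str
    user: Optional[str] = None  # unknown to the low-level code


def with_user(messages: Iterable[LogMessage], user: str) -> Iterator[LogMessage]:
    """Transform: attach the acting user to every message passing through."""
    for message in messages:
        yield replace(message, user=user)


def only_level(messages: Iterable[LogMessage], level: str) -> Iterator[LogMessage]:
    """Filter: pass through only the messages at the given level."""
    for message in messages:
        if message.level == level:
            yield message


# Messages as produced lower down, with no knowledge of the user.
low_level = [
    LogMessage("debug", "opened file"),
    LogMessage("error", "disk full"),
]

# Further up the hierarchy, where the user is known, the stream is enriched.
tagged = list(with_user(low_level, "alice"))
errors = list(only_level(tagged, "error"))
```

The component that produced the messages never receives the user; the information is added where it is already available.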

None of this is particularly new. UNIX has used the stream as the primary means of communication between processes and the kernel since it was first designed. Smalltalk, one of the original object-oriented programming languages, used messages instead of methods (making a Smalltalk object essentially the recipient of a stream). Rx is essentially streams with a different name and a certain amount of extra baggage.

These larger systems that are based around streams show off one of the less obvious but perhaps most useful benefits of designing systems that are connected together through streams rather than objects: by reducing the number of 'hard' dependencies, large systems can be built by connecting small programs together. Small programs can be more easily maintained, and the overall form of the system is easily mapped through its connections.

There are some difficulties with streams. One is particular to the logger example: what if nothing is listening? That would be a critical problem for a logging subsystem as messages could go missing, but it has a potential solution from UNIX: the standard streams (stdin, stdout and stderr) can be redirected but still go to a default location when this is not done explicitly.
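
One way to borrow that behaviour is sketched below, under the assumption that the output is push-based rather than a generator. LogOutput, redirect and emit are hypothetical names; the default destination is standard error unless another one is supplied.

```python
import io
import sys
from typing import Callable, Optional, TextIO


class LogOutput:
    """Hypothetical output stream with a default destination.

    Messages go to the default (stderr unless overridden) until
    someone redirects the stream to a subscriber of their own.
    """

    def __init__(self, default: Optional[TextIO] = None) -> None:
        self._default = default if default is not None else sys.stderr
        self._subscriber: Optional[Callable[[str], None]] = None

    def redirect(self, subscriber: Callable[[str], None]) -> None:
        # From now on, messages go to the subscriber instead of the default.
        self._subscriber = subscriber

    def emit(self, message: str) -> None:
        if self._subscriber is not None:
            self._subscriber(message)
        else:
            print(message, file=self._default)


# An in-memory default is used here so the behaviour is easy to inspect.
fallback = io.StringIO()
output = LogOutput(default=fallback)
output.emit("nobody is listening")   # lands in the default destination

captured = []
output.redirect(captured.append)
output.emit("now redirected")        # lands with the subscriber only
```

No message is lost when nothing is listening, and a listener that does turn up takes over without the producer knowing.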

Another issue is that there may be poor support for anything beyond character streams in a lot of languages. More modern languages are addressing this, fortunately. Streams of objects and usable structures are much preferable to characters that have to be parsed before they can be used.

The notion of building large systems out of small programs connected by streams can be one that produces a strong software structure. However, it's possible for this structure to become lost and for a piece of software to instead become a large program implemented with streams. There are still benefits (better testability remains, for instance), but maintenance becomes much harder when a strong separation of concerns is lost.

Conclusion

Streams are an almost forgotten tool that enables a different form of dependency inversion. For objects that produce an output or process an input, doing so in the form of a stream of data removes the need for those objects to have any kind of direct dependency on the corresponding producer or consumer object. What's more, it's much easier and more natural to transform a stream through a pipeline of objects than it is to use a forest of decorators to achieve the same effect.

It's much easier to test code that uses streams; a test can be expressed literally as an input and an expected output. A place where it appears that a mock is needed to check for a callback is perhaps a place that's calling out for an output stream to be used instead.
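
To make "an input and an expected output" concrete, here is a small sketch using a hypothetical running_total stream transformer; the function and its test are invented for illustration.

```python
from typing import Iterable, Iterator


def running_total(amounts: Iterable[int]) -> Iterator[int]:
    """Hypothetical stream stage: emit the cumulative sum after each input."""
    total = 0
    for amount in amounts:
        total += amount
        yield total


def test_running_total() -> None:
    # The whole test is an input stream and the output stream we expect.
    assert list(running_total([5, 10, -3])) == [5, 15, 12]


test_running_total()
```

There is no fake collaborator to configure and no callback to verify; the output is the observable behaviour.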

A program that produces a stream as an output can in turn be used as input for another program. This seems like a small benefit, but it's a way to take two small programs and combine them into a bigger system without one developing any kind of direct dependency upon the other. Keeping small programs small in this way makes it possible to build large systems with many fewer maintenance headaches.