We should do more specification

December 2, 2021

This is the start of a series of posts about software specifications: why they are useful, why they are rarely used today where they are not mandated and, even more, what we could do to turn them into a powerful tool that enables quick iteration even for safety-critical software¹.

In this first post we’ll talk about what software specifications are, why they are useful and why most projects still avoid them unless they are forced to use them. To do that, we’ll look at the problems that plague creating specs, and especially keeping them complete and up to date for constantly changing software.

About specifications

It feels to me that software specifications (specs) are a rare thing these days, so I want to give a quick summary of what they are and what it’s like to work with them. What do I mean by specs? The most important thing for our context is that they are written BEFORE the implementation. For example, if you were to add a tagging feature to a patient management app, you’d first write the change to the spec describing how this feature will work: which interactions are possible, where the tags are displayed, and all the other details. Only after this is written down do you actually implement it. A second important aspect is that the specs I’m talking about almost exclusively target an internal audience - or at least an audience that already has a lot of context. I’m talking about engineers, product managers, designers and similar roles. This is not about public documentation websites, guides, tutorials and similar types of software documentation.

Do we even want them?

If the prospect of doing this kind of up-front software specification makes you want to run away as quickly as you can, haunted by nightmares of endless waterfalls, one release a year and horrible resulting software - I get it. It does sound like the way we built software in past decades, a practice that we are all really happy to have left behind.

So are there any benefits to these specs? Because if the answer is no, we should just forget about them instead of figuring out how to make them work. Unsurprisingly, the answer is yes. There are benefits to planning ahead. By thinking about where a change creates risks and what implications it has for other areas of our software, we can definitely avoid issues. Not all of them, but some. Also, from a more pragmatic perspective, there are quite a few industries that simply require this kind of specification. On the other hand, there is definitely also a big benefit to releasing often and iterating quickly on software, for multiple reasons. It allows releasing smaller changes, which reduces risk and the potential impact of issues along a different dimension than up-front planning does. It also allows for faster adjustment to new insights and therefore a bigger benefit to the users.

So if we can find ways to combine both, it could have a tremendous impact on how much we can help, for example, patients with software. Quickly iterating towards providing as much benefit as possible while still planning ahead and making sure we don’t create any avoidable risks sounds amazing. But in order to discuss how we could do better and achieve this - which will be the topic of the next post - we’ll first have to dissect the problems in a bit more detail.

Before doing so, a quick disclaimer: No, I don’t think that every little toy project or personal blog needs to adopt rigorous planning ahead. The same goes for early prototyping. I’m mostly focused on more complex software that has a larger impact in the sense that there are actual risks involved if something goes wrong. Risks that don’t just amount to someone losing a bit of money.

The first problem - duplication

Let’s go back to the tags of our patient management app. We start by changing the spec and adding the missing bits. Our patient management app now has tags. A tag is a short bit of text. Each patient can have an arbitrary number of tags associated with them. We display each of the tags next to the patient name in the patient search as a little badge and write the text into it. In the detailed view of the patient there is an input where you can remove or add links to tags as you want. In this input you also have the option to create a new tag and immediately link to it. Once this spec change is written, we implement the change described in the spec. We implement that we show all tags for each patient in the patient search - displaying them as little badges with the text inside them. We build an input into the detail view that allows the user to add and remove links to tags. And make it so that you can also create new tags right from this input that are linked right away. And so on and so on.

If you skipped the last few lines because they are just a boring repetition of the stuff written above, then that is exactly my point. (If you didn’t, I commend you on your patience.) We are duplicating a lot of information between the specs and the software. One effect of this duplication is that, to a lot of engineers, writing this kind of spec feels like doing double work and a chore. And I think they are not wrong with this complaint - although I might be biased, being an engineer myself.
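To make the duplication concrete, here is a minimal sketch in Python of the tagging example. It is an illustration under assumptions, not the actual app: the names SPEC, Tag, Patient, PatientApp and all of their methods are hypothetical, and the "badge" is just text in square brackets. The point is that nearly every line of the implementation restates a sentence the spec already contains.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The spec, written first, as prose (paraphrased from the example above).
SPEC = [
    "A tag is a short bit of text.",
    "Each patient can have an arbitrary number of tags.",
    "The patient search shows each tag as a badge next to the patient name.",
    "The detail view lets you add and remove links to tags.",
    "The detail view lets you create a new tag and link it immediately.",
]


@dataclass(frozen=True)
class Tag:
    text: str  # restates spec sentence 1


@dataclass
class Patient:
    name: str
    tags: List[Tag] = field(default_factory=list)  # restates spec sentence 2


class PatientApp:
    """Hypothetical stand-in for the patient management app."""

    def __init__(self) -> None:
        self.known_tags: Dict[str, Tag] = {}

    def create_tag(self, text: str) -> Tag:
        # Reuse an existing tag with the same text, otherwise create one.
        return self.known_tags.setdefault(text, Tag(text))

    def link_tag(self, patient: Patient, tag: Tag) -> None:
        # Restates spec sentence 4 (add a link).
        if tag not in patient.tags:
            patient.tags.append(tag)

    def unlink_tag(self, patient: Patient, tag: Tag) -> None:
        # Restates spec sentence 4 (remove a link).
        if tag in patient.tags:
            patient.tags.remove(tag)

    def create_and_link_tag(self, patient: Patient, text: str) -> Tag:
        # Restates spec sentence 5.
        tag = self.create_tag(text)
        self.link_tag(patient, tag)
        return tag

    def search_row(self, patient: Patient) -> str:
        # Restates spec sentence 3: one badge per tag, next to the name.
        badges = " ".join(f"[{tag.text}]" for tag in patient.tags)
        return f"{patient.name} {badges}".strip()


app = PatientApp()
alice = Patient("Alice Example")
app.create_and_link_tag(alice, "allergy")
app.create_and_link_tag(alice, "follow-up")
print(app.search_row(alice))  # Alice Example [allergy] [follow-up]
```

Nothing connects SPEC to the classes below it except the comments, and those are one more copy that a human has to keep up to date.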

The second problem - synchronization

There is an old saying among engineers - “Sync is hard”. And for any complex system, be it software or a team, there is a lot of truth to it. So at some point this old problem will rear its ugly head, and the spec and the software will start to drift apart until they are no longer in sync. Very quickly, no one trusts the specs anymore, which leads to everyone complaining even more about the need to write them in the first place because they are useless anyway. It’s a vicious cycle. One that you can only break out of by putting in a lot of manual effort to continuously comb through the specs and keep them up to date. Again - not work that most engineers would consider fun, and actually quite a challenging endeavor as soon as the software becomes reasonably complex. The more systems you have in your software, the more places you often need to touch in your specs for every single change you make. Because this effect compounds, beyond a certain point the overall effort needed to keep everything in sync quickly balloons out of control.

Problems squared

As if that wasn’t enough, basically all the problems above become much worse as soon as you want to increase the frequency of change. The minimal time to make a change in the software is just way longer when you duplicate part of the work. And especially for really short loops, an additional constant amount of work can seriously change the dynamic. Also, keeping two often-changing complex documents in sync manually sounds like a recipe for disaster, especially once you add a growing team into the mix.

And lastly - almost as a side note, to be honest - even if we somehow managed to get all this right and had up-to-date specs, the tools we currently use for handling them would not give us all the benefit we could get from them. Not even close. Which makes it even less attractive to put in the work. There is an excellent newsletter by @hillelwayne that touches on some of these aspects and inspired me to finally start writing down my thoughts on this topic: Documentation could be so much better

The thesis

I think all of these problems have contributed to the general aversion to planning that is pervasive in the software industry. I’m sure there are also a lot of other reasons. But having up-to-date specs puts an upper limit on your iteration speed, requires a lot of additional work to create and costs a lot of discipline to keep in sync. And that for sure didn’t help.

My thesis is that all of these problems are actually the result of one fundamental misconception: that there is a conceptual separation between the specification and the software in the first place. In my experience we often treat those two as different things because our tools are incapable of doing it differently, not because it would make sense to do so conceptually. So in the end it seems to me that this is the one actual problem to solve.

Now, admittedly this post turned into a bit of a rant. But since we have now explored in more detail what doesn’t work currently, we can focus in the next post on how we could do better. So we’ll pull out all the stops and go full-on utopia. How would we want to work with specs if anything were possible? In later parts of this series we’ll then look into some more concrete ideas for how to work towards this utopia. We’ll also take a look at my - failed, so far - attempts at solving some of the technical challenges on the way there.

Also, as always: If you have thoughts, agree, disagree or just want to talk, head over to @reddish_flo on Twitter.

ps: Please don’t read this as a dunk on my teammates. I think they are amazing people and great engineers. And I think we’re doing a good job of handling and mitigating all these problems. But even though we do, we do it at the cost of a lot of mental energy and manual work. And it sometimes makes me a little sad that they have to put their energy into these problems instead of using it to make even more of a difference for the patients and doctors using our software.

¹ Safety-critical software is software whose failure or malfunction may result in death or serious injury to people, severe damage to property, or environmental harm. Common examples are the software in planes, medical devices, nuclear reactors etc.

Reddish Florian