"If You Don't Like It, You Can Fix It Yourself"

When you use an open source product, before you start, you silently offer up a prayer to the Gods. "Please, please, please", you say, "let this piece of software work smoothly. Don't put me in a situation where I'm forced to send a bug report. Please, don't make me do it". Because if you do send a bug report, sooner or later, some bastard is going to tell you that in fact you are the one who should be fixing it. "If you don't like it, you can just fix it yourself". And with that, any discussion about whether in fact the bug is a serious one that should be fixed is apparently ended. There's no comeback to that, right?

Well, here I very seriously beg to differ. My contention: if you write a piece of software and distribute it over the internet, you have a moral obligation to fix the bugs, not the people who download and try to use your software. If you're not prepared to live up to that obligation, you should simply not bother in the first place.

A quick definition: "bug"

There are varying opinions in the software development community as to the difference between a "bug" and a "feature". The reason is that a "bug" is something you have to fix whereas a "feature" is something you don't. It follows that the tighter the definition of "bug", the easier life is for programmers, whereas the wider the definition, the easier life is for users.

As far as I'm concerned, software is there to serve users, not programmers. So my definition of "bug" is correspondingly wide. I consider a bug to be any situation in which a program deviates from the expectations of the user. Users don't expect programs to segfault, so segfaults are bugs. Everyone agrees that segfaults are bugs. Equally, however, users expect emails with HTML content to look as they were intended; not to appear as a mess of raw HTML. Just because you haven't gotten around to implementing HTML rendering in your mail client yet doesn't mean that it isn't a bug. Just because it'll take you six months of solid work to fix the "bug" doesn't entitle you to call it a "requested feature". It's a serious problem that hurts your users. If it's really going to take you six months to fix, you'd probably better get cracking.

Half-good software isn't half as good as good software

Programmers are used to the idea that there's a law of diminishing returns when developing software. The first 90% of the features take maybe 10% of the time. Getting software to work "most of the time" only takes a tiny fraction of the time it takes to account for all the hairy little edge cases. When you start a project, you feel extremely productive. In fact, you seem almost god-like. In the later stages of the project, your energy and your confidence are battered and bruised by an endless sequence of annoying, progressively tinier bugs, each of which takes longer to fix than the last. You don't feel very productive working on these. Before too long, you get the urge to do a massive rewrite, to resurrect that wonderfully fresh feeling of churning out reams of brilliant code without breaking sweat.

However, this is not the only non-linear relationship that governs the usefulness of software. The other non-linearity is the relationship between the percentage of cases where the software works and the productivity of the end users. How productive is a user using software that crashes 50% of the time? Are they 50% as productive as they would be with software that never ever crashes? Of course not. Non-trivial uses of software involve long sequences of commands in order to reach an objective. Usually, every one of those commands needs to be successful in order to reach the objective. If any one of those commands fails, the entire task fails, and all the time used up until that point is wasted.
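To put rough numbers on that, here is a minimal Python sketch under a deliberately simple model (an assumption, not a measurement): each of the n commands in a task succeeds independently with probability p, and any failure throws away the whole attempt, so the user starts the task again from scratch. The function name `relative_productivity` is made up for illustration.

```python
def relative_productivity(p: float, n: int) -> float:
    """Useful commands per command issued, relative to software that never fails.

    Simplified model: each of the n commands in a task succeeds independently
    with probability p; any failure wastes the attempt and the user restarts.
    """
    if p == 1.0:
        return 1.0
    success = p ** n                        # chance that one attempt completes
    per_attempt = (1 - success) / (1 - p)   # expected commands issued per attempt
    per_task = per_attempt / success        # expected commands per completed task
    return n / per_task

for p, n in [(0.5, 1), (0.5, 10), (0.99, 10), (0.99, 100)]:
    print(f"p={p}, n={n}: {relative_productivity(p, n):.2%} as productive")
```

Under this model, software that crashes half the time leaves a user with one-command tasks at 50% productivity, but a ten-command task drops below 0.5%. Even a 1% failure rate per command leaves a hundred-command task at under 60%.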

If your software fails very rarely, maybe one in a thousand operations the user tries, I'm sorry, but users are still going to notice. More than just "notice". They'll remember. It might only wreck the user's work once; but the user will instantly conceive such an aversion to using your software, they'll probably never use it again. They remember the failures more than they remember the successes.
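The same independence assumption shows why a one-in-a-thousand failure rate is not rare in practice. The sketch below uses a hypothetical helper to compute the chance that at least one operation in a session fails; the session sizes are illustrative, not data about real usage.

```python
def chance_of_any_failure(failure_rate: float, operations: int) -> float:
    """Probability that at least one of `operations` independent operations fails."""
    # Complement of "every single operation succeeds".
    return 1 - (1 - failure_rate) ** operations

# Illustrative session sizes, not measurements of real usage.
for ops in (10, 100, 1000, 10_000):
    print(f"{ops:>6} operations: {chance_of_any_failure(0.001, ops):.1%} chance of a failure")
```

With a thousand operations, the chance of hitting at least one failure is about 63%: the user is more likely than not to get bitten.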

The 99% Rule

Since cleaning up those last few bugs is such a horrible chore, the software development community has come to a silent agreement with itself. It's decided to pretend that ironing out bugs isn't just hard but physically impossible. This is a nice little get-out for programmers who just can't face the prospect of going into work Monday morning and ploughing through source code trying to fix that tiny little bug which surely isn't that important anyway. "All software has bugs" is the slogan. I call this the "99% rule": all programmers implicitly aim for a situation where the code 99% works. Past that, they know there aren't any real rewards to be had. If all software has bugs, 99% should be plenty to pull past the competition.

Here's the problem: IT is getting more complicated. 20 years ago, you ran a piece of software, and it was entirely its own master. It took over the whole machine. It was designed to know which chips were in which sockets on your motherboard. But nowadays, there are thousands of operating system components, daemons, libraries, utilities, toolbars, servers, drivers, and so on involved in even the most trivial of activities. Software is interdependent as never before.

For example, writing a text file such as this one doesn't simply rely on Emacs working correctly. To start with, we have X, a window manager, a font manager, a terminal emulator, and the usual GUI infrastructure. GNOME is running ORBit, D-Bus, and all kinds of miscellaneous daemons. In the background I've got an MP3 player, piping into an Enlightenment Sound Daemon, tunes served from an NFS server, and I'm hoping that nothing too ugly happens there which might lock up or thrash my machine. I also happen to be editing this file on a different machine to the one I'm sitting in front of: so there are all the various components involved in making ssh work, plus the operating system on that remote machine. Meanwhile, since I'm running Knoppix, there's some fantastically complex piece of software dynamically spinning up the disk drive, decompressing bits of data, and feeding them into memory whenever Knoppix needs a new piece of software. I don't even want to think about how fragile that process is, or how many individual components it involves.

And if one, if any one of those pieces goes wrong, where does that leave me? Screwed, that's where. I'll probably have lost this entire file, which as of right now represents about 20 minutes of my time.

If everyone who writes all this software is working to the 99% rule, what are my chances of getting all the way to the end of this essay without it getting trashed by some catastrophic failure? Once I get to about 100 systems or so, all of which have to work perfectly in order to get the job done, it starts to look pretty unlikely indeed. And when you start to add up all the pieces of software running on my machine right now which could conceivably crash my system, it easily gets past 100. That number is only going to climb higher as the industry develops.
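The arithmetic behind that intuition can be sketched in a few lines of Python, again assuming (optimistically) that components fail independently of one another. The helper names are hypothetical.

```python
import math

def chain_reliability(per_component: float, components: int) -> float:
    """Chance that every one of `components` independent links works."""
    return per_component ** components

def components_until_coin_flip(per_component: float) -> int:
    """Smallest chain length at which overall reliability falls to 50% or below."""
    return math.ceil(math.log(0.5) / math.log(per_component))

def required_per_component(target: float, components: int) -> float:
    """Per-link reliability needed for the whole chain to reach `target`."""
    return target ** (1 / components)

print(f"{chain_reliability(0.99, 100):.3f}")        # 0.366
print(components_until_coin_flip(0.99))             # 69
print(f"{required_per_component(0.99, 100):.5f}")   # 0.99990
```

So at 99% per component, a chain of 69 pieces is already more likely to fail than to work, and a 100-piece chain works only about 37% of the time. For the whole chain to reach 99%, each of those hundred pieces would need to be roughly 99.99% reliable.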

The conclusion from this is, there is simply no room in the IT industry, free or commercial, for software that continues to aim for 99%. The chain is only as strong as its weakest link, and the strength of today's typical link simply will not stand up to the stresses of the systems currently being developed. We have to get better at this.

Bugs should be fixed before they hit the user

The above two points just sound like a "bugs are bad" rant. I haven't said why this lets users off the hook. It just means that users have that much more incentive to help fix the bugs, right?

Wrong. I'm not just saying that bugs are bad. I'm saying that bugs are so bad, if they hit the user, it's already too late. The consequences of releasing software with bugs out into the world are in fact so disastrous that it must never be allowed to happen in the first place.

Don't believe me?

The calculus of wasted time

There are six billion people in the world, almost all of whom have no access to computers whatsoever. This is going to change. The pool of people who use computers is still increasing exponentially (not least because so is the population of the planet). Therefore, the number of people using any particular piece of software is also increasing exponentially. There are lots of users. So every bug that wastes users' time is wasting lots of time.

Let's think about the most trivial bug I can dream up: your software has two buttons in the wrong order. The "save" button should be to the left of the "save all" button; instead it's to the right. This is just enough to slow down most users by perhaps one second. They probably don't even notice that second. And as soon as they've used the software once, they learn where the buttons are, so it never troubles them again. One second, for each user, over the entire lifetime of the user. If there was ever a bug that wasn't worth fixing, this is it, right?

But do the maths. A successful open source project (you want to be successful, right?) might get used by a hundred thousand users. At one second per user, that's almost twenty-eight hours of lost human productivity. So clearly, if you spend two hours trying to fix that bug, that's still nearly twenty-six hours of profit for the human race. Now think: is it really going to take two hours to swap those buttons?
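The sums are easy to check. A minimal Python sketch, with hypothetical function names, taking the hundred thousand users and the two hours of developer time from the paragraph above:

```python
SECONDS_PER_HOUR = 3600

def hours_lost(users: int, seconds_per_user: float) -> float:
    """Total human time, in hours, that a one-off delay costs across all users."""
    return users * seconds_per_user / SECONDS_PER_HOUR

def net_hours_saved(users: int, seconds_per_user: float, fix_hours: float) -> float:
    """Hours the world gains if a developer spends fix_hours removing the delay."""
    return hours_lost(users, seconds_per_user) - fix_hours

print(f"{hours_lost(100_000, 1):.1f} hours lost")              # 27.8
print(f"{net_hours_saved(100_000, 1, 2):.1f} hours net gain")  # 25.8
```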

There's a particular view, popular among programmers, that programmers' time is somehow worth more than lesser people's. Believe me, programmers are not so productive as all that. But if it makes you feel any better, imagine that all your users are programmers. With open source projects, this is reasonably close to the truth. Are you horrified yet?

Now imagine that the bug doesn't waste one second, but one hour. Or imagine that instead of one second, just once, it's one second every five minutes the user uses the program. Have you ever come across a bug like that? How much time do you think that bug is wasting, taking into account every user on the planet? How much time would it be worth spending to fix it?
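For a recurring bug, the total depends on how much people use the program, which isn't something we know. The sketch below assumes, purely for illustration, a hundred thousand users who each spend one hour a day with the program for a year, and a bug that costs one second every five minutes of use. The function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600

def recurring_hours_lost(users: int, usage_hours_per_day: float, days: int,
                         seconds_lost: float, every_minutes: float) -> float:
    """Hours lost across all users to a bug that bites at a fixed interval of use."""
    occurrences_per_hour = 60 / every_minutes
    total_seconds = (users * usage_hours_per_day * days
                     * occurrences_per_hour * seconds_lost)
    return total_seconds / SECONDS_PER_HOUR

# Hypothetical usage figures, chosen only to show the scale of the numbers.
lost = recurring_hours_lost(users=100_000, usage_hours_per_day=1, days=365,
                            seconds_lost=1, every_minutes=5)
print(f"{lost:,.0f} hours per year")  # 121,667
```

Under those assumptions the bug costs roughly 121,667 hours a year, or around sixty working years at 2,000 hours per year.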

How to spend time: bugs or features?

"Aha", the geeks say, "sure, I could spend those two hours fixing that tiny bug, but I could also spend them implementing a new feature, which would save people even more time. Surely it's better for me to do the latter?"

Well, no. Another conceit of programmers is the delusion that the world needs their new features. Indeed, sometimes it's backed up by the users themselves. I, for example, could really do with an easy-to-use backup system right now (or rather, it would have been great to have had it before the chain of events that led to me needing to use Knoppix to write this essay). Sure would be just neato nifty if someone would implement one of those. I've said so more than once.

The problem is, what I need here is an easy-to-use backup system that actually works. That is to say, not one that was developed according to the 99% rule. And the same goes for every other new feature you or I could possibly dream up. Whatever that feature is that you could develop in those two hours, how long do you think it's going to take you to get it to the stage where I can trust it? Two hours spent on a feature I use a couple of times, figure out is crap, and then never use again, is not two hours well spent. I'd rather have my one second back, thanks.

In fact, your feature is going to make users' lives worse, in ways that are so subtle they're really hard to nail down. For a start, your feature is going to worm its way somewhere into one of these chains of hundreds of interacting pieces of software I talked about. It's going to be the next weak link holding, say, Vladimir's work together. Vladimir has no idea why his life suddenly got so miserable, because the malfunctioning feature was requested by Vishnu. Poor Vladimir doesn't even know which software package caused the problem. He's certainly not going to be submitting a well thought out bug report. It's just one more random snafu, lost in the background noise.

The other major downside is that new features make software more complicated. The more complicated software gets, the harder it is to use. The harder software is to use, the greater the number of people who will be using it wrong, and getting themselves into trouble as a result. You'll never be able to track the cost of implementing this feature of yours, but it's there.

Oh, and by the way, you did include the time it would take to write documentation for your feature in those two hours, right?

Diminishing returns

Here's an analogy that might help. Consider the human genome. We all know it's pretty buggy. The retinas are wired backwards, legacy features such as the appendix occasionally throw serious exceptions, the birth canal is so bent out of shape it often causes a full system crash. You can fill out your own list.

Now consider any random change you might make to that genome. What are the odds of any given mutation producing an improvement? Even with an "intelligent" designer at the helm? The genome as it stands now is the product of millions of years of trial and error. Practically every alternative has already been tried. Every variable has already been tweaked to the absolute limit of performance. So while yes, we could potentially climb just slightly higher up that evolutionary ladder, it's a hell of a long way down.

I'm not suggesting the current state of IT is anywhere near so refined, but we have been at it a fair while now. There's really quite a lot of software out there, and people are getting their work done. So while the invention of the wiki did make it easier to publish web pages, and millions of people are using wikis every day, those people wouldn't just be sitting around twiddling their thumbs if the wiki had never been invented. They'd be publishing their ideas only slightly more slowly using plain old HTML; or failing that, indulging in some completely different, but almost as productive activity. You can't judge the value of a technology by the amount of value generated while using it; you have to judge it by the incremental improvement in productivity it produces over the alternatives. That increment always turns out to be much less than you'd expect.

Did you know that mediaeval monks had their own internet? Their packet format was the "book". Their local area networks were neighbouring monasteries, but these were internetworked with some pretty damn sophisticated routing algorithms. Their networks were multipath and had a high level of redundancy. They had checksums, reliability metrics, and redundant storage. The system was fantastically successful: it's thanks to their efforts that the foundations of western knowledge were preserved through the "dark" ages. How else would the learning of India propagate to a backwater like Ireland?

So try not to get carried away with the importance of your exciting new technology. It's not that new, and it's not that exciting. Especially if it's going to be full of bugs.

The value of experts

But I digress. I was responding to the idea that users are somehow responsible for fixing bugs. I think the most important thing to think about here is that users are completely the wrong people to be working on software.

Consider the partitioning program parted. Its job is to shuffle all the bytes on your disk around in different ways, so that you've got free disk space in the right place. One foot wrong, and parted will trash your entire hard disk. It's a very, very scary program.

So I'm just an ordinary Joe. I'm running parted, and I notice a bug. "Hmm", I say to myself, "that's annoying. I could fix that". And so I dive into the code. I've never seen the parted code before, but I'm a trooper, I do my best. Miraculously, I manage to find the bit of code that was annoying me, and change it so that it does what I want. I send the patch off to the developers, pat myself on the back, and float off, never to be heard from again.

Do you really want to be running a program as scary as parted with contributions like that inside it?

Parted is just an extreme example, but the same thought applies to just about everything. I want software I can trust. Trustworthy software is not easy to write. The best way to make it is to have developers with a deep insight into the problem, who have the time to think up the very best plan for solving it. I want the software to be written by people who know the full implications of anything they change. I want them to be disciplined; I want them to make their own rules, and stick to them. Allowing random passers-by to contribute their own code eliminates any chance of sticking to a good, tight, disciplined design. It's fatal to good software.

The other reason it's a bad idea is that it's just inefficient. Whenever you dive into a significantly sized program, you need a good long ramp-up time. I usually find it takes me months to figure out how to implement a new feature without horribly breaking everything.

So as a free software user, how am I supposed to work? Am I supposed to drift in and out of different software packages, always tackling the bug that annoys me most at the time? Do you really want me to spend two months learning a new software package, just to waste all that knowledge when I move on to the next bug? This is a stupid way to work.

The only model that makes sense is for me to choose just one project, and contribute to that. That way I can use the knowledge I build up time and time again, implementing fixes faster and faster as I go. Eventually, the most efficient use of my time will be to fix bugs that don't even trouble me at all. I'm now the expert; the person best able to fix them.
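A toy cost model makes the difference concrete. The numbers are hypothetical: a two-month ramp-up is taken as 320 hours (eight 40-hour weeks), and each fix is assumed to take four hours once you know the code. The model ignores the speed-up that comes with experience, so if anything it understates the case for sticking with one project.

```python
def hours_per_fix(ramp_up_hours: float, fix_hours: float, fixes_per_project: int) -> float:
    """Average cost of one fix when the ramp-up is paid once per project."""
    return (ramp_up_hours + fix_hours * fixes_per_project) / fixes_per_project

RAMP_UP = 320.0  # hypothetical: two months of learning a new codebase
FIX = 4.0        # hypothetical: time per fix once the code is familiar

print(hours_per_fix(RAMP_UP, FIX, 1))    # drifting between packages: 324.0
print(hours_per_fix(RAMP_UP, FIX, 50))   # staying with one project: 10.4
```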

But when I do that, I can't be working on other software at the same time. I'm now in a position where I rely on the developers of those packages to fix their bugs on my behalf. But I've noticed that when someone complains about a bug, it's always the same old story. No-one asks "have you fixed any bugs in any other software packages?". They jump straight in with "if you care so much, why don't you fix it?". They're breaking the contract.

Worse than nothing

Here's another controversial contention for you: it's possible for the world to be better off without a piece of software entirely than with it. Yes, even free software.

This is trivially true for software viruses. But it's also true for projects that are started with the best of intentions. Consider the cumulative effect of the following factors:

Putting all that together, I think it's clear that there's an awful lot of software available right now that is actively making the world a worse place to live in. Taking Linux audio software alone, I'd say that at least two thirds of it falls into that category, for all of the reasons above.

So if you're working on a free software project right now, I offer you the following suggestion perfectly seriously: give up. Really. Consider putting your energy into supporting a different project which has a better chance of making the world a better place. As a rule of thumb: if you're spending more than 10% of your time implementing new features, rather than fixing bugs, you're probably working on the wrong project. Find a more mature alternative, and work on incorporating the good features of your version into that.

Moral responsibility

So yeah, bugs are bad, and it'd be great to fix bugs. But I'm going further here. I'm claiming that there's a moral imperative to fix bugs. I'm saying it's immoral to distribute software which has bugs in. How do I back that up?

Say you have a big old stockpile of pills sitting around in your apartment. If someone takes one of these pills, 90% of the time they have a great evening. The other 10% of the time they die. My question is this: do you think it's morally acceptable to stand on a street corner handing them out to people?

Do you think the answer changes if you sell them cheaply enough?

During my attempts to restore my broken hard drive, when I mentioned that I thought there should be more safeguards on what people can do with parted, I was told that users should take responsibility for their actions. Well, I'm saying that programmers should take responsibility for their actions as well. If you hand someone a weapon, and you know they're going to hurt themselves with it, you do indeed share the responsibility when they do so.

This is not the view put across by the open source movement. As far as they're concerned, once they've put the disclaimer on the website, all responsibility ends. "It was your decision", they say. "I didn't make you start this whole thing. You should learn to take responsibility for your own actions. No, no, don't come crying to me, it's your problem now. Go away, I'm busy". So they wander off, muttering to themselves about the "losers" who "can't hack it". And they sleep well at night. Just like any drug dealer.

As a society, in the real world, we've pretty much decided that we don't work this way. We've learned that people are not always rational, and they don't always read the warning signs. People can't always accurately judge what is good for them, particularly when they're not experts. And so we try to make sure that the consequences of common mistakes are not too serious; even when that limits our freedom. And when people act stupidly, and get hurt, we recognise that they were just being human too. The people we single out for punishment are the responsible professionals who knew what was likely to happen and did nothing.

With software, I believe it is simply not morally acceptable to release software on the assumption that anyone who uses it without reading the manual thoroughly, making the appropriate backups, and leaving a reasonable amount of time for the operation to complete, deserves whatever punishment they get. Users won't do any of those things. You know this, right? I mean, you've watched real people use software? They just don't behave that way. They dive into it, completely reckless and with absolutely no idea what they're supposed to be doing. These are not isolated cases; this is how human beings work. It's not reasonable to simply ignore human nature. At some point, you're going to have to face the possibility that most of those six billion people in the world are never going to learn how to use all the failsafes, no matter how many times their hard drives are wiped.

When you realise that the world is not going to use your software the right way, and consequently they're going to hurt themselves with it, you are suddenly, magically, morally responsible for the fact that they do. Yes, you. Sorry.

Why complaints shouldn't have to be reasonable

A common complaint heard from software developers is that bug reports are of such low quality. Users can't say exactly what they were trying to do when things went wrong, they make wild accusations of bugs in completely unrelated software, they rant and flame, and then they expect the programmers to just jump to it and implement bizarre new features. What bastards those users are!

Well, get used to it. You can't fix bugs if you don't know the bugs are there, and the only way you'll figure out what bugs are there is by listening to the complaints of real users. Not the sycophantic geeks who subscribe to your mailing lists and know every feature of your software backwards, because they're not the ones who run into the serious, time-wasting bugs. You need the feedback of the clueless newbies who have absolutely no idea what they're trying to achieve, let alone what commands they're supposed to use to achieve it. Those are the majority of your users, and they're the people who need your help the most.

The thing to bear in mind is, for each complaint you receive, a thousand other users have run into the exact same problem and said nothing. They've just gone without your help. If you help this one user now, and learn something about how your software is failing in the real world, you will be helping thousands of people and saving masses of time for the human race. This is worth investing some effort in.

That is, it's worth you investing some effort in. You're the one responsible for the software. You're the one who's supposed to care; the one who apparently cared so much that they wrote the software in the first place. For the user, it's a very different story. For them, the software is only a tiny part of their life. They really don't care about your little widget. They have better things to do with their time than spend a week emailing some random geek on the internet about obscure technobabble they don't understand.

Far from being the high priest asking for sacrifices from the supplicants, you should be the door-to-door salesman, summoning every ounce of charm at your disposal to try to persuade people to please let you help them. You should be the one investing time in the discussion, trying to make it as smooth as possible for the user, so that you can find that bug at the end of the process.

This is precisely the reason why open source software notoriously just doesn't work. Only experts have a channel to the programmers, and experts rarely make mistakes. Therefore, the software evolves to not tolerate mistakes or ignorance at all. And as the software grows more complex and more interdependent, it becomes more and more unlikely that anyone could possibly be expert enough in the entire stack to make the whole thing work.

Conclusion: what is IT for?

We programmers should not be on a mission to turn the world into copies of ourselves. There are so many of us already, and the world really doesn't need that many more.

Software is there to turn complicated problems, that you have to think hard about, into simple problems that you don't have to think about at all. It's not about doing it more quickly, or more thoroughly, but doing it with less thought. Human thought is the most precious resource we have, and the most wasted.

Every trend is making it more important that software works, always, under all circumstances. We have more software to work with, it interacts more intensely, we have more complicated tasks to accomplish with it, and there are ever more of us to have our time wasted when it all goes wrong. It's crucial that we lift our game.

And, my last point of all: we can do it. Think about it. Doesn't the adage "all software has bugs" seem rather defeatist to you? Does it really gel with what you know about how computers work? And do you really think that we, collectively, are doing everything within our power to stamp out those last few bugs? Don't you think we're selling ourselves short by merely aiming for "better than the competition"? Take a look at the competition. Beating them doesn't seem much of a cause for celebration to me.

Matthew Exon
Last modified: Wed Sep 13 11:48:13 CEST 2006