“Why build this ourselves when we could just use [insert third party solution]?” This is a pretty common question that almost every programmer has heard at some point. I also think it’s unfortunate, as the distinction between “building” and “buying” is never quite as clear as the question implies.
In this post, I want to talk about a number of ideas that are loosely related to this “build vs. buy” question. In the first section I provide a framework for thinking about whether to develop a custom solution or utilize an existing third party solution, and I argue that companies regularly make bad decisions in this context. The second section discusses why I think this problem is so common, and how it can be avoided. Finally, I argue that the “build vs. buy” question is really caused by software products being “too big”, and explain how serverless architecture (e.g. Google Cloud Functions and AWS Lambda) might change this.
Most needs that a company has are extremely generic. A flower shop doesn’t need to write its own accounting software, although it does need to do accounting. A newspaper doesn’t have to invent the printing press in order to distribute the paper. These needs are generic; they apply to plenty of businesses, so there is no need for each company to reinvent the wheel.
Some companies have unique needs requiring custom solutions. Google’s search engine algorithm is just for Google; if Google replaced its algorithm with some other search engine, people would notice. The same is true for Netflix’s recommendation system. But this doesn’t just apply to giant companies. For example, the startup I co-founded in Texas wanted to recommend retail electricity plans to consumers, and as a result we needed a way of automatically “reading” all of the plans (from PDFs) and converting them into structured data. I built that myself, because there really was no other choice.
But life isn’t always this easy, and sometimes companies have to think carefully about whether a need is unique or whether an existing solution will work. The task is actually more difficult than that, because there is some unique need in almost everything, and the question is how much that unique need is worth. This is where companies seem to make terrible decisions. So, in an effort to make the world less terrible, I want to go through some of the good options, as well as some of the bad ones.
One good option is to look at the costs of building a custom solution, decide that the benefits of doing so are not worth it, and opt to forgo the benefits and settle on a third party solution. The key here is to acknowledge that there are benefits lost, but that those benefits do not outweigh the costs. This is, in effect, a decision that the company does not need to do that custom thing.
Another good option is to do the same analysis, find that the benefits do outweigh the costs, and move forward building the custom solution. The business has a custom need, and the profit maximizing decision is to develop a custom solution. That’s great too.
There is a third option that is also fine if done correctly. The company has a unique need, something that it simply must do and for which no good solution exists today. However, there is a product that does something similar, and the company can opt for that product and then have engineers hack on top of it to produce the custom thing that is needed. By doing it this way, the company saves on cost. This is a treacherous choice, as predicting how much hacking is needed can be quite difficult, but there is nothing wrong with it in principle.
To illustrate these choices, imagine a news website that wants to show visitors stories specific to their interests. The job the editors do is clearly custom to the company, as the news site is meant to have its own distinct voice. A successful personalization system should complement the editors, so if the editors are custom then the system needs to be custom as well. As such, the choice is really between the second and third option here. Expressed in terms of a Wardley Map, this is what those two options look like:
[Figure: Wardley Map comparing the second and third options. The red line represents the engineering time needed to customize the third party product.]
There is nothing inherently wrong with either option, although my personal experience tells me that the custom solution is better in this particular case.
However, there is also a fourth option, which is extremely bad. The fourth option looks a lot like the third option, but the difference is that no one really thought about it in advance, and management isn’t actually aware of the big red line representing engineering time. Instead, someone asked the question that appeared at the top of this post, “Why build this ourselves when we could just use [insert third party solution]?” The implication is that the exact product already exists, with no trade-offs required. Some third party software was selected, but management still expected it to do all of the custom things despite never really knowing how hard or easy it might be to make that happen.
The fourth option is the worst of every possible world. A decision is made to choose some third party software because it is cost-effective to do so, but then an enormous amount of development time is spent on workarounds to make the software do what people want it to. When this works out, it is purely by chance. Under normal circumstances, the result is less than stellar and the costs are just as high as building a custom solution.
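To make the comparison concrete, here is a minimal sketch of the options as a net value calculation. The names (`Estimate`, `net_values`, `best_option`) and all of the numbers are hypothetical, and a real decision involves far more than four inputs. The point is only that the third option turns into the fourth when the hacking cost is never estimated properly.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical inputs to a build vs. buy decision, in arbitrary units."""
    custom_benefit: float  # value of the custom behaviour to the business
    build_cost: float      # cost of building a custom solution
    buy_cost: float        # price of the third party product
    hack_cost: float       # engineering time spent customizing that product


def net_values(e: Estimate) -> dict[str, float]:
    """Net value of the first three options described above."""
    return {
        # Option 1: buy the product and give up the custom benefit.
        "buy and forgo": -e.buy_cost,
        # Option 2: build the custom solution.
        "build": e.custom_benefit - e.build_cost,
        # Option 3: buy the product and hack on top of it.
        "buy and hack": e.custom_benefit - e.buy_cost - e.hack_cost,
    }


def best_option(e: Estimate) -> str:
    """Return the option with the highest net value."""
    values = net_values(e)
    return max(values, key=values.get)


# What management assumed: a little hacking on top of the product.
assumed = Estimate(custom_benefit=500, build_cost=300, buy_cost=100, hack_cost=50)
# What actually happened: the workarounds cost more than building.
actual = Estimate(custom_benefit=500, build_cost=300, buy_cost=100, hack_cost=350)

print(best_option(assumed))  # buy and hack
print(best_option(actual))   # build
```

With the assumed hacking cost, buying and hacking looks like the best choice. With the actual cost, building would have been better, but by then the decision has already been made.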
The problem I described above, where management decides to utilize some off-the-shelf system to save on costs but then ends up spending more resources trying to hack on top of the product, is extremely common. But why is that the case? Why do so many companies make this mistake?
I don’t have a great answer, but my guess is that the costs are hidden. When you decide to build a product “from scratch”, you have to allocate some budget to a team, making the costs extremely visible. When you pay for a product, you see the cost of the product itself, but the programmers who build workarounds to make the product function properly are often not allocated specifically to that task. It’s a side thing: each workaround is small enough to never warrant mentioning, but the cumulative effect is thousands of hours of engineering time that go largely unnoticed. I don’t have any quantifiable evidence of this, but anecdotally that seems to be the case.
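A back-of-the-envelope calculation shows how quickly small workarounds can add up. Every figure below is a made-up assumption for illustration, not a measurement from any real team.

```python
# All figures are hypothetical assumptions, chosen only for illustration.
ENGINEERS = 6                  # engineers who occasionally touch the product
WORKAROUND_HOURS_PER_WEEK = 3  # hours each one spends on workarounds
WORKING_WEEKS_PER_YEAR = 46
YEARS = 3


def hidden_hours(engineers: int, hours_per_week: int,
                 weeks_per_year: int, years: int) -> int:
    """Total workaround hours that never show up as a budget line."""
    return engineers * hours_per_week * weeks_per_year * years


total = hidden_hours(ENGINEERS, WORKAROUND_HOURS_PER_WEEK,
                     WORKING_WEEKS_PER_YEAR, YEARS)
print(total)  # 2484
```

Three hours a week per person is hard to notice, yet under these assumptions it comes to roughly 2,500 hours over three years.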
In the extreme case, that actually makes it more likely that management will make the same mistake again in the future. As developers spend more time hacking on top of third party solutions that are poor fits for the company’s perceived needs, they fall further behind on the main projects they are meant to be working on. Management, seeing the delays, will grow more skeptical of future development efforts, and therefore will have an even stronger preference for third party solutions in the future.
If anyone has any links to academic studies discussing this, please send them my way.
The above discussion was primarily about how we should think about “build vs. buy” decisions. However, it’s kind of unfortunate that we are stuck with that decision at all. Ideally we wouldn’t be asking whether we should “build or buy”, but rather we would be asking which parts of a system we should buy and which parts we should build. So why is that not normally possible?
The problem, I think, is that existing software products and services are way too big. They try to do too many things. This isn’t the fault of the software companies; in today’s world, this is what people want. However, it is also inefficient.
To return to the example of a news website wanting an automation or personalization system for content selection, we’ve already established that there has to be something custom about the system in order to match the custom job done by the editors. However, if we zoom in on that system, we would see that only a tiny fraction of it needs to be custom. The vast majority of the system, or of any system really, is just standard work that applies to hundreds of other companies. The system needs to collect data from the frontend, pass that data on to a server, store it in a database, and make it available in real time. The system also needs a handful of different models for recommending content, and not all of those models need to be unique. In addition, the system needs a way of putting those recommendations onto the page for the user.
Having worked on such a system, I can tell you that almost none of that is unique. The unique part is simply how we combine the components. However, when we looked at the possible third party providers of such systems, we realized that we couldn’t pick and choose. We couldn’t say, “We will use your data collection system, our data processing and storage components, some other third party’s models, and your rendering system”. That wasn’t an option. It was all or nothing.
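The pick-and-choose arrangement described above might look something like the following sketch. Every function here is a hypothetical stand-in: no real vendor API is being shown, and the “vendors” are just local functions returning canned data. The only custom piece is `build_pipeline`, which is the part that decides how the components are combined.

```python
from collections import Counter
from typing import Callable

Event = dict[str, str]

# In-house storage, reduced to a list for the sake of the example.
database: list[Event] = []


def vendor_a_collect() -> list[Event]:
    """Stand-in for a bought data collection component."""
    return [
        {"user": "u1", "topic": "politics"},
        {"user": "u1", "topic": "politics"},
        {"user": "u1", "topic": "sports"},
    ]


def in_house_store(events: list[Event]) -> None:
    """Stand-in for our own processing and storage component."""
    database.extend(events)


def vendor_b_recommend(events: list[Event]) -> list[str]:
    """Stand-in for another vendor's model: pick the most read topic."""
    topic = Counter(e["topic"] for e in events).most_common(1)[0][0]
    return [f"top story in {topic}"]


def vendor_a_render(items: list[str]) -> str:
    """Stand-in for a bought rendering component."""
    return "\n".join(f"* {item}" for item in items)


def build_pipeline(
    collect: Callable[[], list[Event]],
    store: Callable[[list[Event]], None],
    recommend: Callable[[list[Event]], list[str]],
    render: Callable[[list[str]], str],
) -> Callable[[], str]:
    """The custom part: how the interchangeable components are combined."""
    def run() -> str:
        events = collect()
        store(events)
        return render(recommend(events))
    return run


pipeline = build_pipeline(vendor_a_collect, in_house_store,
                          vendor_b_recommend, vendor_a_render)
output = pipeline()
print(output)  # * top story in politics
```

Because each stage is just a function with a known input and output, any one of them could be swapped for a different provider or an in-house version without touching the others. That is exactly what the all-or-nothing products did not allow.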
And that creates another problem: Even if you’re happy choosing “all” in this particular case, you now can’t expand the system to serve other needs. For example, if you use a third-party personalization system, you’d need a different system to collect data for analytics purposes. There are, of course, third party solutions for that as well (think Google Analytics). However, then you’re left with two trackers and two systems when you really only need one. Each new need comes with a new system.
In an ideal world we could pick and choose the components that we want, so that we would buy the generic components and utilize them to support the custom components we build ourselves. I think serverless might make that possible in the future. With a marketplace of pay-per-use functions, products become smaller. Software companies would no longer need to offer full solutions, but rather could focus on building functions for solving common problems. In a world in which “software as a service” is replaced by “functions as a service”, the “build vs. buy” question is going to look very different.