Democratize Data Science, They Said

I came across this post in the Harvard Business Review arguing that data science skills, and responsibility for using data, should be distributed across a company rather than siloed within a data science team. As the author explains:

“These days every industry is drenched in data, and the organizations that succeed are those that most quickly make sense of their data in order to adapt to what’s coming. The best way to enable fast discovery and deeper insights is to disperse data science expertise across an organization.

“Companies that want to compete in the age of data need to do three things: share data tools, spread data skills, and spread data responsibility.”

I can kind of see where the author is coming from. If I were a data scientist at a company, I’d be extremely happy to hear that everyone who previously asked me for help had suddenly developed data science skills and could now leave me alone. No more requests from product owners and no more salespeople bothering me for numbers. Just bliss. I even wrote a similar post myself!

The problem is that most companies aren’t really equipped to make this work, and the consequences of failure are high.

The Myth of the Hyper-Functional Company


I have mentioned this before, but I will say it again: Everyone who writes blog posts seems to work for hyper-functional companies where everyone is extremely qualified, people get along perfectly, and there is never any fighting between departments. This isn’t reality. In reality, a company is just a collection of relationships between people, and most of those relationships are highly dysfunctional.

When people want to democratize data science within a company by spreading data responsibility around, that looks like the start of a war to me. Most data scientists have caught someone in a different part of the company doing something extremely stupid with data. Now imagine that the person doing the stupid thing with data had formal responsibility for how the data is used, and could just tell your data scientists, “Sorry, I think this is very smart, and it’s up to me to decide.”

Of course, one solution is to ensure that everyone in the company really knows what they’re doing with data and won’t do dumb things. The only problem is that everyone does dumb things with data, especially data scientists. The more data science you know, the more dumb things you can do. The hope is that if you have a data science team, the data scientists together will act as an ensemble method of sorts, and the output from the team will be not-dumb. However, that doesn’t work if Dwight from Sales gets to tell your data science team, “No, you’re dumb,” because he took one online Python course and management wanted to democratize data science within the company. Dwight decides now.

Your Data Engineers Will Hate You


Your data scientists aren’t going to be happy when Dwight from Sales tells them they’re dumb, but just wait until Kevin from Accounting tells your data engineers what data he needs in order to fill his new role of “not actually a data scientist but you can’t say that openly.” Kevin doesn’t just tell your engineers what problem he’s trying to solve; no, he tells your engineers that he needs the data in real time and he needs it piped into Google Sheets.

Once the data engineers are done screaming in pain, they try to suggest a few alternatives. Maybe Postgres would be better? Nope, it has to be Google Sheets. And here’s the catch: management read a post in the Harvard Business Review about democratizing data science within the company, so your data engineers are formally required to give Kevin what he’s asking for no matter how stupid it may be.

This is a terrible way to run a company.

Data Dictatorship Isn’t So Bad


It’s great to have people in the company who can use data on their own, and maybe this idea of democratizing data science and spreading responsibility around can work at some companies. Most companies, however, are going to find this challenging. It’s too easy to do stupid things with data, and you need data scientists who can tell people no.

So sure, go help people in the company to build skills so that they can make use of data. Teach people SQL. Send some people to Python classes. Your data scientists will probably appreciate having more people in the company who aren’t totally useless with data. But don’t go balkanizing responsibility for the data across the entire company unless you’re confident you can manage that process successfully.