Misunderstanding GDPR Compliance

I can’t believe I have to do this, but here it goes: I am not a lawyer. I have never practiced law. Everything you are about to read might be totally wrong. This is definitely not legal advice. I am a data engineer, so don’t take anything I say here too seriously. If you think something in this post might affect you, you should consider speaking to a lawyer.

With that out of the way, I can start. The General Data Protection Regulation, or the GDPR, is a few months from becoming enforceable. I have spent a fair amount of time working on GDPR-related issues, and my conclusion is that it is an unmitigated disaster from a rule-of-law perspective. Either the text of the regulation doesn’t mean what it says, or virtually every company is about to find itself in violation of the law. Either way, it’s bad.

But this post isn’t intended to be a general complaint about the GDPR. Instead I want to make a narrower point, namely that most companies involved in data processing are woefully unprepared for the GDPR. I also want to talk about Google Analytics.

A Quick Overview of the GDPR


The core of the GDPR is pretty straightforward. A company can only “process personal data” if it has a lawful basis for doing so. The regulation lists six, but for our purposes two matter: the data subject has given consent for the specific purpose in question, or the company has a “legitimate interest” in the processing. The concept of legitimate interests is exceptionally stupid and never should have been included, but there it is.

Regardless of whether personal data is processed on grounds of consent or legitimate interests, the company must provide data subjects with the rights covered in Chapter 3. Specifically, a data subject can ask the company to delete the data it holds about him or her, can request access to that data, and can have it transferred to another company. There are a number of other rights as well, but the gist is that people have control over their personal data.

So what is personal data? Here is the full definition from the text:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

Not helpful? Huh.

There is plenty of debate about what this means, but let me establish some uncontroversial facts. One, what matters is whether the individual can be identified, not whether the individual has been identified. Two, each piece of information on its own might not be “personal data”, but when combined the data becomes personal.

Slightly more contentious, although I don’t think by much, is that it doesn’t actually matter if you know who the individual is. As Twilio puts it, “What matters is that the information can be used to pick that user out of the crowd even if you don’t know who that user is.” This actually makes a lot of sense, as otherwise the rights in the GDPR would be kind of meaningless. A company could know everything about you other than your true identity, and you would have no rights over that data; it only becomes “your” data if you are willing to disclose your identity.

This leads to what I call the “If You Can, You Must” test. That is, if you can provide GDPR rights to data subjects, you must. If you store data about a logged-in user, that logged-in user should have the right to request deletion of that data. Why? Because you can. It is personal data because you know it is about that user; you can comply with the request, so you must. If the data you collect cannot be connected to a user, you can’t identify a user who has the right to request deletion. You can’t, so it probably isn’t personal data.
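To make the test concrete, here is a minimal sketch of what “you can, so you must” looks like in code. Everything in it is made up for illustration: the record shape, the `identifier` field, and the `handleErasureRequest` function are hypothetical, and a real system would be deleting from databases and backups, not filtering an array.

```javascript
// Hypothetical sketch of the "If You Can, You Must" test.
// Each record carries the identifier it was collected under
// (an account id, a cookie id, ...) or null if there is none.
function handleErasureRequest(records, identifier) {
  // No identifier means we have nothing to match the request against.
  if (identifier == null) {
    return { remaining: records, deleted: 0, identifiable: false };
  }
  // "If you can": keep only the records we cannot tie to this identifier.
  const remaining = records.filter((r) => r.identifier !== identifier);
  const deleted = records.length - remaining.length;
  // "You must": anything we could link to the requester is now gone.
  return { remaining, deleted, identifiable: deleted > 0 };
}

// Made-up data for illustration only.
const records = [
  { identifier: "user-17", page: "/pricing" },
  { identifier: "user-17", page: "/blog" },
  { identifier: "user-42", page: "/blog" },
  { identifier: null, page: "/about" },
];

const result = handleErasureRequest(records, "user-17");
console.log(result.deleted, result.remaining.length);
```

The record with a `null` identifier is the “you can’t” case: nobody can show up and claim it, so there is no request you could honor.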

What Does “GDPR Compliant” Even Mean?


A lot of companies are in the business of providing data processing services to other companies. Usually it is some form of “data collection and analysis as a service”. What is peculiar is that these companies seem to always make the claim that they are “GDPR compliant”.

GDPR compliance isn’t that easy. It isn’t a matter of whether a company “is compliant” or not. The best a company can do is say it offers the tools that allow other companies to use the service in a manner that is compliant with the GDPR. But companies don’t say that. They say they are “GDPR compliant”. When non-lawyers hear that, they think it means they can continue using the service in question without a problem.

By way of example, imagine some crazy-evil Sauron as a Service company, Bad Guy Inc., that collects data about everything people do online. If you visit a website that is using this service, Bad Guy Inc. will record everything you click on, everything you see, how long you stay on each page, etc., and will associate that data with a user id that is stored in a cookie. However, Bad Guy Inc. claims to be “GDPR compliant”. How can that be? What Bad Guy Inc. means is that its service could be used in a GDPR-compliant manner. A company could ask users, “Is it okay if we spy on you?”, and tell Bad Guy Inc. not to do evil spying stuff unless the answer is yes. A company could give users the right to delete the data the company holds about them, and Bad Guy Inc. would delete the data if so requested by the company. You get the idea. Bad Guy Inc. isn’t inherently in violation of the GDPR. It all depends on how the service is used.
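Here is roughly what “don’t spy unless the answer is yes” could look like on the website’s side. This is only a sketch: `createConsentGatedTracker` and `sendToTracker` are names I made up, and `sendToTracker` stands in for whatever call the vendor actually provides.

```javascript
// Hypothetical wrapper that only forwards events after the user says yes.
// `sendToTracker` is a placeholder for the vendor's real tracking call.
function createConsentGatedTracker(sendToTracker) {
  let consented = false;
  return {
    // Record the user's answer to "Is it okay if we spy on you?"
    setConsent(value) {
      consented = value === true;
    },
    // Forward the event only when consent has been given.
    track(event) {
      if (!consented) return false; // dropped, nothing is sent
      sendToTracker(event);
      return true;
    },
  };
}

// Usage with a fake tracker that just collects events in an array.
const sent = [];
const tracker = createConsentGatedTracker((event) => sent.push(event));
tracker.track({ type: "pageview", path: "/" }); // dropped: no consent yet
tracker.setConsent(true);
tracker.track({ type: "pageview", path: "/blog" }); // forwarded
console.log(sent.length);
```

Notice that all of the compliance logic lives in the wrapper, which is the customer’s code, not Bad Guy Inc.’s.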

The important point is that when a data processing company claims to be “GDPR compliant”, it really just means, “Yeah, we could be used in a manner that complies with the GDPR, but it is totally up to you.”

Google Analytics


We really need to talk about Google Analytics. Just try searching “Google Analytics and the GDPR”, and you will find that most answers say that Google Analytics is totally fine under the GDPR provided you are not storing personally identifiable information in Google Analytics, which is against the terms of use anyway. But is that correct?

Google Analytics stores data about each website visitor, and associates that with a client id that it stores in a cookie. Go to any website that uses Google Analytics, open developer tools, go to the console, type ga.getAll()[0].get('clientId') and press Enter. It will print your client id for that website. You can even do it on this website! Try it now.
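If the console trick doesn’t work on a given site, you can get at the same value from the cookie itself. The sketch below assumes the cookie is named `_ga` and has the shape `GA1.2.<random number>.<timestamp>`, with the client id being the last two dot-separated fields. That is how I understand the format, but treat it as an assumption. The function name and the sample cookie string are made up.

```javascript
// Sketch: pull the client id out of a cookie string.
// Assumes a `_ga` cookie shaped like "GA1.2.<random>.<timestamp>",
// where the client id is the last two dot-separated fields.
function clientIdFromCookieString(cookieString) {
  const gaCookie = cookieString
    .split(";")
    .map((part) => part.trim())
    .find((part) => part.startsWith("_ga="));
  if (!gaCookie) return null; // no such cookie on this site

  const fields = gaCookie.slice("_ga=".length).split(".");
  if (fields.length < 4) return null; // not the shape we assumed
  return fields.slice(-2).join(".");
}

// Made-up cookie string; in a browser you would pass document.cookie.
console.log(
  clientIdFromCookieString("theme=dark; _ga=GA1.2.1234567890.1520000000")
);
```

Either way, the point stands: there is a stable identifier sitting in your browser, and the visit data is keyed to it.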

Okay, so we are definitely storing data associated with a client id (an online identifier). So by the “If You Can, You Must” test, you should be able to request that I delete the data that I hold associated with your client id. But that isn’t possible in Google Analytics, at least not yet (as far as I know). Uh-oh.

But okay, maybe the “If You Can, You Must” test isn’t a real thing and it only matters if I can match you to your real life identity. In many cases I can! Look, this website isn’t getting a lot of traffic. Only a handful of people will ever read this, and odds are those people also liked my tweet on Twitter. If @KormanBob liked my tweet and I see that only one client id is associated with Dallas, Texas, I have my man. That client id is my dad.

Maybe that example is ridiculous. Let’s try another. A person visits my site. Google Analytics tells me that this person lives in a weird part of Kansas where only 20,000 people live. Google also tells me that this person is using an uncommon mobile device. What if there is an online community for people with this uncommon phone, and there is only one member who identifies as being from that place in Kansas? I am definitely getting warmer. You might object that this isn’t information I collected, but it doesn’t matter. What matters is that I could identify this user, and probably many more users. If I have your rough location, device type, and what pages you visited and when, in many cases I can use that data to identify you.
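You can check how bad this is on your own data with a few lines of code. The sketch below counts how many visitors share each combination of attributes and returns the ones who are alone in their group. The visit records, field names, and the `singledOut` function are all invented for illustration. This is not what a Google Analytics export looks like.

```javascript
// Build a string key from the chosen attributes of one visit.
function signatureOf(visit, keys) {
  return keys.map((k) => String(visit[k])).join("|");
}

// Return the visits whose combination of `keys` is shared by nobody else.
function singledOut(visits, keys) {
  const counts = new Map();
  for (const visit of visits) {
    const sig = signatureOf(visit, keys);
    counts.set(sig, (counts.get(sig) || 0) + 1);
  }
  return visits.filter((v) => counts.get(signatureOf(v, keys)) === 1);
}

// Made-up visits for illustration only.
const visits = [
  { clientId: "a", region: "Texas", device: "iPhone" },
  { clientId: "b", region: "Texas", device: "iPhone" },
  { clientId: "c", region: "Kansas", device: "iPhone" },
  { clientId: "d", region: "Kansas", device: "RarePhone X" },
  { clientId: "e", region: "Kansas", device: "iPhone" },
];

// Region alone singles out nobody in this sample.
console.log(singledOut(visits, ["region"]).map((v) => v.clientId));
// Region plus device leaves one visitor alone in their group.
console.log(singledOut(visits, ["region", "device"]).map((v) => v.clientId));
```

Each attribute looks harmless on its own. It is the combination that picks one person out of the crowd, and every extra column you add makes more groups shrink to one.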

I don’t really know how Google plans on arguing that this isn’t personal data.

Everything Is Bad


So that is bad news, huh? Those companies that have been saying they are GDPR compliant really just mean, “You could use our service and still comply with your GDPR obligations.” That means finding a legal justification for collecting the data and providing the full range of GDPR rights to data subjects. As for Google Analytics, the situation is even worse. It is not clear to me that Google really is prepared to comply with the GDPR. Even if Google does get on it, companies are going to have a lot of work to do over the next two months.

In short, everything is bad. The end.