This piece was originally written for a tech company for the publication Dataconomy.
By now, any data professional worth their salt should know about the General Data Protection Regulation (GDPR). This EU law, set to be enforced from 25 May 2018, has sent shockwaves (and a fair bit of fear) across many industries and professions. But perhaps the one area set to bear the brunt of the Regulation is data science.
It doesn’t take a data scientist to understand that GDPR is going to drastically change how every business stores, processes, transfers, and analyses its data. The Regulation is one of the strictest and most all-encompassing data laws to date. It governs everything from the storage and portability of data through to consent around its use and who can access it. It also puts control and overall ownership of personal data back into the hands of the individual, thereby doing away with the grey area that previously existed in data ownership.
For data scientists, there’s good news and bad news.
GDPR’s impact on profiling
GDPR impacts data science across several different areas. Firstly, there are limits on the ways businesses profile customers and process personal data. Depending on how you define it, that’s a huge part of a data scientist’s job. Under GDPR, profiling is any kind of automated personal data processing that analyses or predicts certain aspects of an individual’s behaviour, socioeconomic situation, movements, preferences, health and so forth.
If profiling occurs, an organisation must tell the person involved that it is happening, explain its consequences, and give them an opportunity to opt out. That applies where there is a legitimate business purpose to the profiling (one that doesn’t infringe an individual’s rights), such as when a credit card processor uses personal data to determine someone’s credit limit.
When profiling is taking place, and automated decisions are being made off the back of it, a business must prevent discriminatory factors such as race, politics or religious beliefs from having an effect. Bias can be a huge issue in many machine learning algorithms (as seen in COMPAS, a system used to assist criminal sentencing that has been criticised as biased against minorities). There are many underlying reasons for this. One is that an algorithm can be built with small biases that the team (or data scientist) behind it don’t realise they have, and those biases are then amplified through the algorithm’s positive feedback loop. Data scientists therefore have a huge task in front of them, as any perceived bias within algorithms is likely to breach GDPR. If you didn’t already know, a breach of GDPR can result in a fine of up to €20 million or 4% of annual global turnover (whichever is greater).
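One rough way to spot this kind of bias is to compare how often an automated decision goes in favour of each group. The sketch below is only an illustration: the group labels, the sample decisions and the `parity_ratio` helper are all hypothetical, and a single ratio is neither a test that GDPR prescribes nor proof that a model is fair.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of positive decisions for each group.

    `decisions` is an iterable of (group, approved) pairs, where
    `approved` is True when the automated decision was favourable.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def parity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so the rates are trivially equal
    return min(rates.values()) / highest


# Hypothetical decisions from a model, tagged with a protected group.
sample = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

rates = approval_rates(sample)
print(rates)                           # {'group_a': 0.75, 'group_b': 0.25}
print(round(parity_ratio(rates), 2))   # 0.33
```

A ratio well below 1.0, as here, is a prompt to investigate the training data and features. It is not a verdict on its own.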
GDPR and consent
Where there isn’t a legitimate business interest, a consumer’s consent for their data to be processed and analysed must be obtained. Records of this consent must be kept alongside the data it relates to. Consent needs to be obtained for each and every use of personal data. So if a business wishes to use data for segmentation, consent needs to be given for that use. If the same data is later used in clustering, that use will need to be explained and consented to as well.
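To make the "consent per use" idea concrete, here is a minimal sketch of a consent record kept against each person, with one entry per purpose. The `ConsentLedger` class, the `usable_for` helper and the purpose names are assumptions made for illustration; they do not come from the Regulation or from any particular library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentLedger:
    """Hypothetical per-person record of consented purposes and when each was given."""
    subject_id: str
    grants: dict = field(default_factory=dict)  # purpose -> time consent was recorded

    def grant(self, purpose, when=None):
        # Store the consent timestamp so there is a record to point to later.
        self.grants[purpose] = when or datetime.now(timezone.utc)

    def allows(self, purpose):
        return purpose in self.grants


def usable_for(purpose, ledgers):
    """Return the IDs of people whose data may be used for `purpose`."""
    return [ledger.subject_id for ledger in ledgers if ledger.allows(purpose)]


alice = ConsentLedger("alice")
alice.grant("segmentation")

bob = ConsentLedger("bob")
bob.grant("segmentation")
bob.grant("clustering")

print(usable_for("segmentation", [alice, bob]))  # ['alice', 'bob']
print(usable_for("clustering", [alice, bob]))    # ['bob']
```

Because consent is keyed by purpose, Alice's data drops out of the clustering job automatically until she has agreed to that specific use.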
Explaining data science under GDPR
That explanation requirement raises an interesting point in itself. Under GDPR, businesses will no longer be able to hide behind technical and flowery language that confuses consumers. Language will have to be jargon-free and simple enough for the general public to understand. If the data belongs to a child, the language will need to be pitched at their age level, and at their parents as well. This poses a challenge for some data scientists. Many are used to very technical terminology, so they may struggle to find simpler terms to explain their work. On the plus side, the requirement should also reduce the use of black box AI and the aforementioned biases it can lead to.
Data will decrease for data scientists
Returning to consent: consumers will have to give consent for each and every data use, plus separate consent specifically for marketing use, so the pool of data available for data science is likely to shrink. Firstly, consumers may not be so open to more exploratory data science (or indeed understand it). Additionally, since consent will need to be refreshed at regular intervals, some consumers will let it lapse through sheer inertia, especially if they see no benefit in renewing it.
However, there is a faint silver lining for data science research. If the data doesn’t identify an individual, it can be used for research purposes without consent. Essentially, this means that any data scientist who doesn’t want their data to shrink dramatically, or who doesn’t want to keep gaining consent for each and every use of it, needs to build robust anonymisation into their data science process.
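As a rough illustration of what building anonymisation into the process can look like, the sketch below drops direct identifiers, coarsens age and postcode, and then checks how small the smallest group of look-alike records is (the "k" in k-anonymity). The column names, the banding rules and the sample records are all hypothetical. Whether a given dataset really counts as anonymous is a legal judgement that a check like this cannot settle.

```python
from collections import Counter

# Hypothetical column names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def generalise(record):
    """Drop direct identifiers and coarsen the quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (out.pop("age") // 10) * 10
    out["age_band"] = f"{decade}-{decade + 9}"        # e.g. 34 -> "30-39"
    out["postcode_area"] = out.pop("postcode")[:2]    # keep only the first two characters
    return out


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


raw = [
    {"name": "A", "email": "a@example.com", "age": 34, "postcode": "SW1A 1AA", "spend": 120},
    {"name": "B", "email": "b@example.com", "age": 37, "postcode": "SW1A 2BB", "spend": 80},
    {"name": "C", "email": "c@example.com", "age": 52, "postcode": "M1 1AE", "spend": 45},
]

released = [generalise(r) for r in raw]
k = smallest_group(released, ["age_band", "postcode_area"])
print(k)  # 1
```

Here k is 1 because the third record is still unique on age band and postcode area, so that person could be singled out. In practice the data would need further generalisation or suppression before it could be treated as anonymous.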
GDPR is coming for every data scientist
GDPR is going to impact data science in a big way, and the degree to which it affects individual data scientists largely depends on the type of work they are doing and for which company or department. Those working in marketing are possibly going to have the toughest time, thanks to constraints around consent. However, GDPR is going to touch nearly every aspect of a business’s operations. There are many ins and outs to the Regulation, so it’s worth checking through an all-encompassing guide to make sure you’ve got your bases covered.
There is a vast amount of work to be done. With the May deadline fast approaching, businesses that have not yet prepared are facing a tough timeline. Data scientists have a huge role to play in preparing businesses for GDPR. All stored data will need to be assessed and consent collected if needed, data storage will need auditing, compliance procedures will likely need an overhaul, and data processing operations will need to be picked over. Models that use personal data will need to be identified as well, and their inner workings will probably have to be explained to consumers in layman’s terms. GDPR is coming for every data scientist. It will become part and parcel of the job. Every data scientist therefore needs to be ready for GDPR and to understand their obligations under the law.