The Right to Be Forgotten - Ep. 99
In an online world, your worst moments may live forever. Even if the original source is deleted, there's no guarantee that mug shot won't show up in a background check. Gabe discusses the challenges of the right to be forgotten as they relate to privacy laws.
Episode Table of Contents
- [01:31] A Moral Dilemma About the Right to Be Forgotten
- [08:18] The Technical Side Versus the Law Side
- [12:50] How Do We Apply the Right to Be Forgotten
- [20:05] An Absolute Technical Nightmare
- [26:32] A Classified Secret
- About Our Guest
A Moral Dilemma About the Right to Be Forgotten
Carolyn: Eric, have you ever done something you really regret and hope never goes public? I never have, ever.
Eric: I don't think there's anybody around, over the age of two, probably 12 months, who hasn't.
Carolyn: Today, we're going to talk about the right to be forgotten with Gabe Gumbs, Chief Innovation Officer of Spirion. Good morning, Gabe.
Gabe: Good morning. This is why I'm advocating for bringing back physical film; that way, you can still have the negatives.
Carolyn: There you go.
Eric: A lot harder to steal too.
Carolyn: I'm an NPR junkie. I first heard about the right to be forgotten about a year ago on a Radiolab episode. I’ve thought about that episode so many times. It kind of started a moral dilemma within me. Should we scrub people's information online? Back in the day before we had the internet, if a story got published in the newspaper, people tossed the newspaper. It was super hard to find that story again.
Eric: You have to go to the library. Pull the old copy.
Carolyn: Even then, you may never find it and you have to know where to look.
Gabe: And how to look.
Carolyn: These days, it's very different with the internet.
Eric: There's the adage that with anything you put on the internet, you basically have to assume you want everybody in the world to be able to see it, because they can.
Gabe: I believe the technical term is you cannot remove the pee from the pool.
Philosophical Questions About Rewriting History
Carolyn: Bringing it around to what we do, cybersecurity, it makes me think about responsibility. With privacy laws, whose responsibility is it to get that stuff off the internet? Can it even be done? Then there are the other philosophical questions about rewriting history. Gabe, I really need you to give some clarity here.
Gabe: First of all, I'm also an NPR junkie, and I know the Radiolab episode you're talking about. It's a great episode. For those who haven't heard it, it's totally worth checking out, and so is Radiolab in general. Let's start with: what is the "right to be forgotten"? I don't think it's actually labeled the right to be forgotten, although it may have a similar name. I haven't read the specific legal language of GDPR in a while.
Gabe: I've processed most of it at this point. I know that it is in Article 17 of the regulation. It more or less gives an individual the right to have their personal data erased under a number of circumstances. One is if the data is no longer necessary for the purpose for which an organization originally collected or processed it. GDPR talks a lot about purpose. Purpose is very important when tied to data collection and processing.
Gabe: There are a number of other circumstances under which the right to be forgotten can be applied, like when an organization processed that person's information unlawfully. You mentioned things like rewriting history. The original impetus for the right to be forgotten came from a court case where an individual had been wrongfully committed or wrongfully accused of a crime. Maybe it was expunged.
A Major Problem
Gabe: I don't remember the exact details. Nonetheless, the circumstances had changed. All of the information on the internet about that person was, at that point, no longer accurate. There are a couple of arguments there. You could argue that by removing all of it, you do change history.
Gabe: You could argue that you leave it all there and put little asterisks. Kind of like we do for all of the baseball players that were juicing. Which one of those is actually rewriting history versus not? History did still happen, so much so that we're still even talking about that court case.
Carolyn: Let's say he wasn't wrongfully accused. The guy actually did it, but it was 20 years ago. You don't get all the context around it. All you see, if you search online, is that he has an arrest, it's a felony, whatever. That's basically all the context you get.
Eric: I was in New York this weekend with a buddy, who's a nurse. At 21, he was caught peeing in the woods, public indecency. I swear to God, he was telling me this story. College, right?
Eric: Up all night partying. The cops brought him in for that. He went to court eventually, and it was supposed to be expunged from his record. He's now working as an EMT nurse, and like seven or 10 years later, somehow it popped up on a routine background check.
Eric: They go to this guy and say, "You're not going to work here anymore. You were accused of indecent exposure." It's a major problem. He had to go back to upstate New York and get it re-expunged from his record. He tells me this story about how they were so sorry, blah, blah.
The Circumstance Where an Individual Has the Right to Be Forgotten
Carolyn: But even if you get it expunged, it's still there.
Carolyn: So you get it expunged, legally, and the records are sealed, but it's on Google. It's in the newspapers, online. You can search it.
Eric: This was 40 years ago, but it's still there. So Gabe, help us.
Gabe: I have so many problems with that story in particular.
Eric: Those were the old days too.
Gabe: It just burns my soul, starting with overprosecution in general, which is not the topic of this show. Seriously, we're going to call a college kid in for peeing in the bushes, and now this is how it affects his life? But there is another circumstance under which an individual has the right to have that information deleted.
Gabe: It's when that information no longer serves a legitimate interest that justifies processing the data. You could legally test the question: does that newspaper have a legitimate interest that justifies publishing information about a 21-year-old peeing in the bushes?
Gabe: I'm not a lawyer, but you could argue that information being free and public knowledge, maybe part of a public record at the time, is some type of legitimate interest. But it serves no one any good to know that. It literally serves no one any good to know he got drunk and peed that night.
Gabe: I certainly could see an argument to be made. There is no legitimate interest that justifies processing that data any further. But the problem you have here now, is who's going to go test that case and pick up that cause?
The Technical Side Versus the Law Side
Gabe: I guess the ACLU could or something similar to that effect.
Eric: Who would care? It's so inconsequential. Does the newspaper have the right to expunge it or to remove it from the paper after the fact also?
Carolyn: But is it even possible to remove it?
Eric: How do you do it?
Gabe: Now you get into the real heat of the matter too, from the technical side of things. This is where I'm a lot more comfortable than the law side of things.
Eric: We're taking you from urination in the woods to your sweet spot, the technical side.
Gabe: Outside of my comfort zone and a little closer in. So how do you get rid of it? You likely will never get rid of every single trace of it once it's on the internet. If part of processing the information was putting it out there, you've got a different problem. If the information still exists only within the confines of the organization that processes it, you can get rid of it.
Gabe: You still have a bunch of other technical challenges there as well. What does it actually mean to delete data, or for it to be erased? Strictly speaking, you can delete a data item. You can just get rid of the thing: big, hard delete. Overwrite all the ones and zeros so that it never shows up again.
Gabe: There are several other forms of deletion, or ways to forget data, that can come into play. You can encrypt the data and allow the keys to be accessed only by those who have a purpose for it. That's because there are some other provisions in the regulation.
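The "big, hard delete" Gabe describes can be sketched as overwriting a file's bytes before unlinking it. This is a minimal, best-effort sketch, and the function name `overwrite_and_delete` is hypothetical. On SSDs with wear leveling, on journaling or copy-on-write filesystems, and anywhere snapshots or backups exist, the old bytes can survive an in-place overwrite. That is part of why true deletion is harder than it sounds.

```python
import os


def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Best-effort hard delete: overwrite a file with zeros, then unlink it.

    Caveat: wear-leveled SSDs, journaling/copy-on-write filesystems, snapshots,
    and backups can all retain the original bytes despite this overwrite.
    """
    size = os.path.getsize(path)
    zeros = b"\x00" * 65536
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(remaining, len(zeros))
                f.write(zeros[:n])  # replace the original bytes in place
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push the zeros to the device before unlinking
    os.remove(path)
```

This only covers the one copy you know about. Every replica, report, and backup needs the same treatment.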
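One common way to make encryption double as a form of forgetting is crypto-shredding. You encrypt each person's data under their own key and "forget" them by destroying that key, which makes every encrypted copy unreadable at once. This is our illustration of the idea, not necessarily the scheme Gabe has in mind.

The sketch below is a toy that uses only Python's standard library, and the class and method names are hypothetical. Its keystream (HMAC-SHA256 in counter mode) stands in for a real cipher and has no integrity protection. In practice you would use a vetted authenticated-encryption library and a key-management service. Whether key destruction counts as erasure for a given regulator is a legal question, not a technical one.

```python
import hashlib
import hmac
import secrets


class CryptoShredStore:
    """Toy store showing crypto-shredding: one key per data subject.

    ILLUSTRATION ONLY: the HMAC-SHA256 counter-mode keystream stands in for a
    real cipher and provides no integrity protection.
    """

    def __init__(self):
        self._keys = {}     # subject_id -> 32-byte random key
        self._records = []  # (subject_id, nonce, ciphertext); could be replicated anywhere

    @staticmethod
    def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
        out = bytearray()
        counter = 0
        while len(out) < length:
            block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
            out.extend(block.digest())
            counter += 1
        return bytes(out[:length])

    def put(self, subject_id: str, plaintext: str) -> None:
        key = self._keys.get(subject_id)
        if key is None:
            key = secrets.token_bytes(32)
            self._keys[subject_id] = key
        nonce = secrets.token_bytes(16)  # fresh per record so keystreams never repeat
        data = plaintext.encode("utf-8")
        stream = self._keystream(key, nonce, len(data))
        self._records.append((subject_id, nonce, bytes(a ^ b for a, b in zip(data, stream))))

    def read_all(self, subject_id: str) -> list:
        key = self._keys.get(subject_id)
        if key is None:
            return []  # key destroyed (or never created): ciphertext is unreadable
        plaintexts = []
        for sid, nonce, ciphertext in self._records:
            if sid == subject_id:
                stream = self._keystream(key, nonce, len(ciphertext))
                plaintexts.append(bytes(a ^ b for a, b in zip(ciphertext, stream)).decode("utf-8"))
        return plaintexts

    def forget(self, subject_id: str) -> None:
        """Crypto-shred: destroy only the key; every encrypted copy becomes noise."""
        self._keys.pop(subject_id, None)
```

For example, after `store.put("carolyn", "order #1: running shoes")` and `store.forget("carolyn")`, `store.read_all("carolyn")` returns an empty list even though the ciphertext is still stored. The subject ID still sits next to each ciphertext, so a real design would use an opaque identifier rather than a name.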
The Ability to Monetize Data
Gabe: You may have to keep some of that information around for some other legal purposes. That might even be tax purposes, if you're processing purchases of individuals, et cetera.
Gabe: Then there's also the ability to monetize that data. That's what everyone's interested in, including that newspaper. They want to monetize the information they had about a random drunk college student. Clearly they have nothing better to monetize.
Eric: It was upstate New York. I don't think there's a whole lot going on back in the '70s.
Gabe: I can think of a lot of things going on in upstate New York in the '70s. Wasn't there a bunch of people up there partying all weekend long, if I recall?
Eric: That was the '60s, but close.
Gabe: Close enough. Something tells me they didn't leave until the '70s. They got there in the '60s. The example I like to use is the Amazons of the world or any company that you do business with online. A large part of how they monetize that data is selling you goods and services. It's also in using that data to understand how to sell more goods and services to other people.
Gabe: That means analyzing very specific data about the individual: people like Carolyn, who live within this zip code, have these buying habits, plus all the other data points they can access about that individual, the people in their circles, and so on. You can also pseudonymize that data.
Gabe: You can anonymize that data so that you can forget the elements about Carolyn. It allows you to forget about her, while still being able to process that data. That way you can learn how to target ads towards people like her.
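To make the pseudonymize-then-analyze idea concrete, here is a minimal sketch. It assumes hypothetical order records with `email`, `zip`, and `category` fields. The email is replaced with a keyed hash (HMAC-SHA256), and the key is stored separately. Buying habits are then counted per zip code without names.

Anyone holding the key can re-link tokens to people. Under GDPR, pseudonymized data of this kind is still personal data (Recital 26); only the aggregated counts come close to "forgetting" Carolyn.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(records, secret_key: bytes):
    """Swap the direct identifier (email) for a keyed token.

    secret_key must be stored separately; with it, tokens can be re-linked,
    so this is pseudonymization, not anonymization.
    """
    out = []
    for r in records:
        normalized = r["email"].strip().lower().encode("utf-8")
        token = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:16]
        out.append({"subject_token": token, "zip": r["zip"], "category": r["category"]})
    return out


def habits_by_zip(pseudo_records) -> Counter:
    """Aggregate view for ad targeting: purchases per (zip, category), no names."""
    return Counter((r["zip"], r["category"]) for r in pseudo_records)
```

The same person always maps to the same token, so per-customer analysis still works. The aggregate counts are what you would hand to an ad-targeting team.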
Forget the Right to Be Forgotten
Eric: Is it considered good enough, according to GDPR and any regulations that are coming, that you're aware of?
Gabe: GDPR is not the only regulation that has had, well, forget the right to be forgotten; it's not the only one that has had the notion of de-identification. HIPAA has been in existence with us, stateside, since '96, and it also has a notion of de-identification. We've been sharing health data for centuries with other researchers and companies.
Gabe: For example, right now we're trying to fight a pandemic, and a lot of health information is being shared. In the HIPAA guidance, there is a guideline for what is good enough to be considered de-identified. GDPR doesn't have an explicit notion of what is good enough, whereas HIPAA explicitly states, to the letter, what you need to do for data to be considered de-identified: these elements must no longer exist.
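HIPAA's Safe Harbor method (45 CFR 164.514(b)(2)) is the kind of to-the-letter list Gabe is referring to. It names 18 categories of identifiers that must be removed and allows a few generalizations.

The sketch below handles only a handful of those categories. The field names and the `restricted_zip3` set are hypothetical inputs; the real list of three-digit ZIP prefixes covering 20,000 or fewer people comes from Census data. In this sketch, dates keep only the year, and ZIP codes keep only their first three digits (or become `000` for sparsely populated areas). Ages over 89 collapse into a single 90-or-older category, and birth year is dropped in that case.

```python
from datetime import date

# A subset of the 18 Safe Harbor identifier categories, for illustration only.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "medical_record_number"}


def deidentify(record: dict, restricted_zip3: set) -> dict:
    """Apply a few Safe Harbor-style rules to one record (not a complete implementation)."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # direct identifiers are removed outright
        if isinstance(value, date):
            out[field] = value.year  # all date elements except the year are removed
        elif field == "zip":
            zip3 = str(value)[:3]
            out[field] = "000" if zip3 in restricted_zip3 else zip3
        else:
            out[field] = value
    age = out.get("age")
    if isinstance(age, int) and age > 89:
        out["age"] = "90+"          # ages over 89 become one category
        out.pop("birth_date", None)  # birth year would reveal the exact age
    return out
```

The design choice is deliberately mechanical: each rule is a fixed check, which is exactly what makes Safe Harbor easier to audit than GDPR's open-ended notion of identifiability.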
Carolyn: That's part of the right to be forgotten, the HIPAA stuff?
Gabe: No, it's not. I was giving a corollary to Eric's question. It's similar, but the right to be forgotten has no comparable stipulation listing the elements that must be removed. One of the reasons is that the definition of personal data under GDPR is so broad. It is, and I'm paraphrasing here, any directly or indirectly identifiable data.
Gabe: De-identifying indirectly identifiable data is, strictly speaking, not really possible by definition. There are some mathematical approaches that would allow you to do so, but to my knowledge they are still mostly theoretical. I've seen some practical applications of them, but not a lot.
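One well-known formal measure of indirect identifiability is k-anonymity. Differential privacy is another. We are not claiming either is the specific approach Gabe means.

The idea behind k-anonymity is to group records by their quasi-identifiers, attributes like ZIP code, birth year, and gender that are harmless alone but identifying together. The smallest group size is k. The records, field names, and `generalize` helper below are hypothetical. The example shows a dataset with names already removed where everyone is still unique (k = 1) until ZIP codes and birth years are coarsened.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers) -> int:
    """Smallest group size when records are grouped by quasi-identifier values.

    k == 1 means at least one person is unique on those attributes, so they are
    indirectly identifiable even though no name is present.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def generalize(record: dict) -> dict:
    """Hypothetical coarsening: 3-digit ZIP prefix and birth decade."""
    return {
        "zip3": record["zip"][:3] + "**",
        "decade": record["birth_year"] // 10 * 10,
        "gender": record["gender"],
    }


people = [
    {"zip": "02139", "birth_year": 1984, "gender": "F"},
    {"zip": "02138", "birth_year": 1987, "gender": "F"},
    {"zip": "02141", "birth_year": 1981, "gender": "M"},
    {"zip": "02142", "birth_year": 1986, "gender": "M"},
]
```

Here `k_anonymity(people, ["zip", "birth_year", "gender"])` is 1. After `generalize`, grouping on `["zip3", "decade", "gender"]` gives k = 2. The trade-off is that every generalization step also removes detail an analyst might want.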
How Do We Apply the Right to Be Forgotten
Gabe: What you're getting into is very much the problem of how we apply the right to be forgotten, given all these technical challenges and the business impediments it creates to using that data to grow further. To be honest, you're not going to stop the Amazons of the world from wanting to monetize this data. There will be a push and a fight on their part to figure that out.
Carolyn: Are there any laws in the US, like GDPR, is anybody trying to do this here?
Gabe: CCPA, the California Consumer Privacy Act, is the closest we have to that. It is, in a lot of ways, very similar. It's drafted with similar language and with a similar thought in mind: to protect consumers from having their data monetized and shared beyond their control. There are a few challenges there. The first is that it doesn't go nearly as broad as GDPR.
Gabe: Second, it only applies to California and companies that do business with California residents. That actually does increase its scope, because that's a lot of people; a lot of companies do business in California. If California were a country on its own, its GDP would rank among the top 20 percent of countries on the planet.
Gabe: So a lot of people do business there. One of the ways CCPA applies to you is if your business generates more than $25 million in annual revenue and does business in California with California residents.
Eric: So there are limits at least.
Gabe: There are lots of limits. It's only California; there's no federal-level regulation.
Eric: Right, which would make it a nightmare.
Ensure Data Integrity
Carolyn: That responsibility falls on the company that collects the data.
Gabe: It does, and there are subprocessors too. If you collect and process information on behalf of someone else, then that responsibility also falls to you.
Eric: We see this with simple data residency requirements in our business. In certain cases, Canada, some countries in Europe, and, it seems, South America are getting pretty big on this. They want data residency: they want the data to reside in the country.
Eric: That usually results in us having to set up a data center or a point of presence there, plus processes and procedures and, many times, staffing. Not only do they want data residency, they want Canadian citizens, or whatever the nationality, handling it. It becomes a massive operational cost, quite frankly.
Carolyn: So then that falls on the cyber security vendor?
Eric: Not just cyber. Anybody doing business in that location needs to essentially set up a shop and ensure what I'll call data integrity. Gabe, you'll probably balk at that. But the integrity of the data staying in that country, being touched only by national residents of that country.
Eric: So you're essentially replicating. Instead of being able to consolidate and control costs around operations, ease of use, and everything else, you're standing up multiple points of presence that cost millions of dollars apiece over the course of a year, at a minimum, just to get started.
Gabe: What you see as a nightmare to implement has also been an absolute gold rush, unfortunately, for this industry. There are so many technologies trying to come to our rescue, and some of them are causing more confusion than not.
The Apple and Google World
Eric: As I watch what I'll call the Apple and Google world: Apple and Google represent different types of companies, and in my mind they think about data differently. Apple, if you read their privacy policies, appears to obfuscate a lot more information.
Eric: They don't collect as much identifying information in the cloud or back at corporate, whereas other companies may want to collect every single morsel they can to make better business decisions. You see it with Siri, and then you see it with, what's Google's assistant?
Carolyn: Does it matter?
Eric: It does. It's much more accurate in predicting what I'm going to do, where I'm going to next, that type of thing. Siri's kind of challenged at times, because it doesn't have as much data. It doesn't collect as much about me, I should say, or so I believe. Gabe, you're not in agreement.
Gabe: No, I'm not. If you carry one of these things around, there's nothing it doesn't know about you. There's even a gyroscopic device in here. It can tell when you're doing different types of activities.
Gabe: This thing knows if I'm sleeping, if I'm climbing a mountain, or if I'm rollerblading versus riding a bicycle. That is how much information this device is capable of collecting and processing about me.
Eric: Those companies want every single morsel of it they can get.
Gabe: Absolutely. Again, I can monetize every single bit of that. I know that people like Gabe like to rollerblade, I know that because of the gyroscopic activity on his phone. That allows me to sell him more goods and services related to that kind of thing.
Signing Away The Right To Be Forgotten
Gabe: It allows me to do all kinds of stuff. I might co-locate more sporting equipment for people like that in distribution centers close to them. So when you order something that I know you're likely to buy, boom, it's there before noon today.
Carolyn: We like that. I voluntarily wear an Apple Watch.
Gabe: There you go.
Carolyn: I like the convenience of it.
Eric: You like it until you don't.
Carolyn: Until I pee in the woods.
Gabe: There you go.
Eric: It's a placeholder for all the things that could happen to you that you have nothing to do with.
Gabe: There you go. What you just highlighted is also one of the very important provisions of GDPR: consent. You gave them consent to process that information. By the very nature of signing up for that service and wearing that watch, you've consented to their data collection.
Eric: Most people consent because they just don't want to read the whole privacy document. They don't understand what it is. I know many educated people who don't even take the time to understand it, and even if they did, there would be no way to figure out what's in scope and out of scope.
Carolyn: As soon as I accept, have I just signed away my right to be forgotten?
Gabe: That's where GDPR and CCPA come back into play. They preserve those rights for you. Basically, those regulations are aiming to define your ownership of your data and give it back to the data subject. That's exactly why they're coming into existence, because you cannot sign those things away.
An Absolute Technical Nightmare
Gabe: There is a legal argument currently in play that you have an unalienable right to that information. It is yours and you cannot give it to anyone else. You cannot sign the rights to that away.
Eric: It seems like a technical nightmare.
Gabe: It is an absolute technical nightmare. In order to implement, you have a number of additional challenges. How do you locate and track all of that information? Now that you've collected all this information, it makes its way around inside your businesses in any number of ways. How do you keep track of every single time Carolyn's information gets replicated?
Gabe: It's in a report that gets generated and sent around to 20, 30, 40 people internally. It replicates throughout all the many systems. To your point, it also gets co-located into a number of other technical systems. It's not like this information is just sitting on one machine where you can go, "Ah, there's all of Carolyn's information. I'll just hit the big red delete." No, it's everywhere. Now I have to go find it, and that in and of itself is a challenge.
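Tracking "every single time Carolyn's information gets replicated" amounts to keeping a lineage graph: where a person's data was first collected, plus every copy made from those locations. The minimal sketch below uses a hypothetical `DataMap` class and made-up location names. It walks the copy edges breadth-first to list every place an erasure request would have to reach.

It is deliberately conservative: any copy made out of a location is treated as possibly containing the subject. It also only works if every copy gets recorded, which real systems rarely do. That gap is why Gabe turns to discovery next.

```python
from collections import defaultdict, deque


class DataMap:
    """Hypothetical inventory: where a subject's data landed and where it was copied."""

    def __init__(self):
        self.locations = defaultdict(set)  # subject -> locations where data was collected
        self.copies = defaultdict(set)     # location -> locations it was copied to

    def record_collection(self, subject: str, location: str) -> None:
        self.locations[subject].add(location)

    def record_copy(self, src: str, dst: str) -> None:
        self.copies[src].add(dst)

    def all_locations(self, subject: str) -> set:
        """Breadth-first walk over copy edges: every place that needs erasing."""
        seen = set(self.locations.get(subject, ()))
        queue = deque(seen)
        while queue:
            loc = queue.popleft()
            for nxt in self.copies.get(loc, ()):
                if nxt not in seen:  # the visited check also guards against copy cycles
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
```

For example, if Carolyn's data is collected into `orders_db` and that database feeds a weekly report that is emailed to a finance team, then `all_locations("carolyn")` returns the database, the report, and the email thread.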
Carolyn: Gabe, as we're wrapping up here, honestly, I don't feel like I got a lot of clarity. Help me out, like what do we do?
Eric: What's the answer? We just checked off the privacy button. Tell us what the answer is so we don't have to process it.
Gabe: Stay calm and don't pee in the woods. That's what you get. The answer is where I just left off. It's the only way you'll be able to apply any of the other controls, both to stay within regulatory guidelines and to be able to delete it.
Knowing Where The Information Is
Gabe: The same goes for processing it legally: you first have to understand and know where all that information is inside your systems. That is step one. If you don't know that, you literally cannot do any of the other things necessary.
Eric: But that's so hard. I worked with an agency a few years ago that was doing smart data tagging, or trying to. I don't want to get into specifics, but they had a data tag of a certain number of kilobits: a header on every piece of data, with every description you could imagine, and they were probably still missing another couple hundred thousand pieces they'd have liked to have had. The implementation was impossible.
Eric: Every piece of data had this little header with it that said, "This is what it is. This is how it's been used," on and on and on. But you couldn't implement it. How do you go to a commercial industry and say, "Hey, we're doing smart data tagging. I need you to take this header and apply it to all of the data you put in your systems"? It was a brilliant idea by a bunch of brilliant scientists that you would never be able to implement.
Gabe: We've got a handful of brilliant scientists who work for us too; a couple of them happen to reside right here at Spirion. A large part of what we do is that discovery of sensitive data. The thing is, you don't need to find everything within the environment. You just need to find all of the sensitive data within the environment.
Gabe: You need to find all of the sensitive and subject data in the environment. That is at the core of what we do.
The First Challenge
Gabe: That means understanding, locating, classifying, and tagging information, so that we know where it is, can track it throughout its life cycle, and can manage it as well. We take the approach that in order for you to be compliant, you first have to understand that information. In order for you to understand and control that information, you have to be able to find it. We work our way backwards from that problem.
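Spirion's own discovery engine isn't described in this conversation, so the following is only a generic sketch of the find, classify, and tag idea. Documents are scanned with pattern detectors, and only those that contain sensitive data are tagged, so only those need to be tracked. The detectors, the `PATTERNS` table, the function names, and the sample documents are all hypothetical and deliberately simple. Card-number candidates are confirmed with the Luhn checksum to cut false positives.

```python
import re

# Hypothetical, deliberately simple detectors; real discovery tools use far
# richer rules, surrounding context, and validation.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that can't be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set:
    """Return the set of sensitive-data tags whose detector fires on text."""
    tags = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if label == "card_number":
                digits = re.sub(r"\D", "", match.group())
                if not (13 <= len(digits) <= 16 and luhn_ok(digits)):
                    continue  # looks like a number, fails validation
            tags.add(label)
            break  # one confirmed hit is enough to tag the document
    return tags


def scan(documents: dict) -> dict:
    """Tag only documents that contain sensitive data; everything else is ignored."""
    results = {}
    for name, text in documents.items():
        tags = classify(text)
        if tags:
            results[name] = tags
    return results
```

This reflects Gabe's point that you only need to find the sensitive data. A document with no hits, such as lunch notes, never enters the inventory at all, and the tagged set is what you then track and manage.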
Eric: I'll give you my number in a bit. How many companies in your experience understand their data?
Gabe: The answer is none.
Eric: You and I are fully in unison there. That's the first challenge.
Gabe: The answer is definitely none. A large part of understanding their data comes from understanding their business at the core. Even if you look at our customers, I would argue that many of them understand the challenge and the problem, and so they're working towards it. But understanding the totality of all of your data is not always a thing.
Gabe: Let me put it this way. I don't know that that should be the goal. This kind of goes back to some of the old problems we've had in security in general: wanting to apply controls to all the things.
Eric: To everything, equally.
Gabe: You can't do that, you shouldn't try to do that. So we break that problem down into, "We're not going to try to understand every single thing. Let's understand the things that matter and let's start there and protect those things." Trying to understand everything is a fool's errand.
Eric: How many people understand the different types of data they have, and the different risk levels to that data? What's important and what's not, in your experience?
Increased Maturity Throughout Organizations
Gabe: It's a non-zero number.
Eric: Which is progress. We're good, we're doing better.
Gabe: I'll say this: five years ago, that answer was a lot closer to zero. More organizations understand it today than ever. Now that we have things like GDPR and CCPA pushing them in that direction, you'll continue to see that maturity increase throughout organizations.
Eric: In the government space I deal with, in the civilian agencies you typically hear it described as high value assets. You'll see the DoD focusing on components of weapons systems or classified programs. They understand that some data has a higher level of importance than other data.
Carolyn: Do you think the government does it better, Eric?
Eric: No. This is not my area of expertise. Gabe, I'll defer to you. My gut says the financials have it down pretty well. Healthcare, because of HIPAA. Clearly, there is data that is within scope of HIPAA and out of scope.
Eric: Anything within, is going to be classified and protected very differently. The government is just starting to get its hands around it. They don't even know what they have and where they have it, to Gabe's point.
Gabe: You're heading in the right direction. You've highlighted the industries that have been heavily regulated have already been thinking about this problem for a while.
Eric: Because they had to.
Gabe: The government being somewhat self-regulated, unless you're counting...
Eric: Or unregulated.
Gabe: One of the bigger challenges they have is that their notions of classification break down along the lines of information, not just data. There's a bit of nuance in there. It's why they have classifications like controlled unclassified information.
A Classified Secret
Gabe: It needs to be controlled even though it doesn't have any classification. Their classifications are different because their threat assessments are different than those related to Amazon or Apple.
Eric: Even there, if we were recording this podcast on a secret-level network, it would be classified as secret by default, because it's sitting on that network. Whether this podcast has any value whatsoever really doesn't matter in that case.
Eric: Even if I want to put it on an un-class network, I have to go through a massively difficult, onerous process to get it declassified. Somebody's got to listen to the whole thing. They've got to understand the context.
Eric: You've got to be able to justify it. It takes months. They don't necessarily understand this podcast as a low risk value asset. It's a classified secret because it's on that network. Even in that case, it's challenging.
Gabe: You couldn't apply that level of rigor to a technology-enabled business and have it turn a profit.
Eric: To move this podcast from secret to un-class, a human would have to listen to it. There's no automation that's going to make that happen.
Gabe: Hopefully humans will listen to this thing.
Eric: I'm saying from a classification purpose. Take the podcast away and think about just transactional records. You'd have somebody who had to step through hundreds of thousands, millions, billions of records.
Eric: In the case of an Amazon or a Walmart, it's got to be an automated, easy process. There’s a need for some level of regulation, because it does drive better data privacy considerations.
Gabe: I'm not a big fan of more regulations. Unfortunately, left to its own devices, I don't think industry has any impetus to protect our information.
There’s No Getting It Back
Gabe: More importantly, there's far more impetus to abuse it if it's not classified, because it's worth quite a bit: the ability to understand everything about a human being and their behavior. I only need to point you to Cambridge Analytica to understand the depth of the problem. Left to their own devices, it would get abused.
Eric: Back to the beginning of the podcast, if you pee on the internet or in the internet pool, no one hears it.
Eric: Or everybody has the ability to see it.
Gabe: It's out there. You cannot remove it. There's no getting it back, not even a little bit. Archive.org's Wayback Machine is a testament to that.
Carolyn: The idea of more regulations just kind of makes my stomach hurt.
Eric: But it's moderation. Regulations are there for a reason. You can abuse them. But look at HIPAA, and nobody would say HIPAA's perfect. In this case, it's probably positioned a little better to protect data, protect privacy.
Gabe: It is. To your point about HIPAA not being perfect: the P in HIPAA stands for Portability. We'd all argue that we've not achieved the portability of our own health information whatsoever, so it probably hasn't stood up to that test. But it has had the byproduct of helping secure our personal information, our health information and so on.
Carolyn: I don't know if I feel any more clarity than I did at the beginning.
Eric: I can give you clarity. It's Siri, Cortana for Microsoft, and Google Assistant.
Carolyn: That's a little bit. Gabe, thank you very much for joining us.
Gabe: Thanks for having me on.
About Our Guest
Gabe Gumbs has a deep-rooted passion for technology, information security, and problem-solving. He's the Chief Innovation Officer of Spirion, a leader in rapid identification and protection of sensitive data, and he is channeling that passion to make the digital world a safer place. Wielding a unique mix of technical vision, marketing, and business acumen, Gabe is shaping the future of data security.
He’s protecting the sensitive personal data of customers, colleagues, and communities around the world. Despite having held a range of leadership positions in security technology including VP of Product Strategy at STEALTHbits and Director of Research & Products at WhiteHat Security, Gabe considers his most valuable experience the time he spent on the ground as a security practitioner.
Thanks to his intimate understanding of the real issues security professionals face on the front lines, he's able to identify the core of the problem and create innovative solutions that push data security technology forward.
With a firm command of the increasingly esoteric technical details of his field, as well as the rare ability to communicate them in a clear and meaningful manner, he's become an industry thought leader.
Gabe has shared his perspective and expertise across many outlets and channels, including InfoSecurity Europe, ILTACON, and CSX. Now he’s spearheading Spirion’s vision for data privacy in the next decade and beyond. He’s leading the way to a more secure and private tomorrow for us all.
Gabe also hosts the podcast Privacy Please: https://podcasts.apple.com/us/podcast/privacy-please/id1501600433 (on Twitter: https://twitter.com/PrivacyPlsPod)
Find Gabe on Twitter