Is government cybersecurity ready to trust Machine Learning powered analytics? - Ep. 18
Machine learning has become more mainstream in cybersecurity as a way to make inroads against cyber threats. In this week's episode, Dr. Kular, a Research Scientist in Forcepoint's Innovation Labs, joins the podcast to share her thoughts on whether security analysts can trust analytics that are powered by machine learning.
Introduction & meet our guest Dr. Kular
Arika: Hi everyone, welcome back to Episode 18 of the To the Point Cyber Security Podcast. I'm your host Arika Pierce, joined every week by Eric Trexler. How are you, Eric?
Eric: Doing well Arika, it's cold outside.
Arika: It is, it's very cold, it's beyond cold right now. Well, we'll have a good episode here to warm everybody up. This week we have a guest who is definitely interesting in terms of her background. We have Dr. Kular, who is a research scientist in the Forcepoint innovation lab and who holds a PhD in computer vision from the Florida Institute of Technology. Welcome to the podcast, Dr. Kular.
Dr. Kular: Thank you so much Arika.
Arika: So, Dr. Kular I know we're going to talk this week about machine learning and how that intersects with trust and trust has been an area actually we've spent a great deal of time talking about in previous podcast episodes. First, just out of curiosity can you tell our listeners as well as myself, Eric you may already know this but:
What is computer vision and how does that intersect with cyber security, or how did you get into that space?
Eric: I've been looking at computers since the early '80s, so I've got computer vision.
Dr. Kular: That's one way to look at it. So computer vision is a very broad area or field of research. I was more concentrated on images and video, so recognizing what's happening in a video.
Let's take an example of traffic at four-way intersections. My PhD was based on identifying the patterns which are occurring in that particular area. Once the patterns were recognized, along with the sequence in which those patterns take place, that basically defines how the traffic is flowing at a traffic intersection. This is helpful if you want to identify accidents or somebody running the red lights and all this stuff, so that's what computer vision looked like for me at that time.
Machine Learning and AI
Arika: So really as we look at machine learning a lot of this is about patterning and when we take it to human behavior within computers or within an enterprise you're really taking your PhD research from the vision side and looking for patterns and when do people...or tasks or processes step outside of norms, pattern norms.
Dr. Kular's PhD in visual pattern recognition
Dr. Kular: Yes, and multiple other things. So my PhD was identifying the patterns, but it was in the form of motion. So what I did, instead of considering the velocity or the speed, I started considering words so that I could find patterns in the text, and we can use that for classification purposes.
So for example, in classification, if we have categories like shopping, business, entertainment, and so on and so forth, each and every category has a different set of words which are more prominent in it. I use my method to identify a pattern for each and every category. So in the future, a new website can easily be mapped to the category it matches the most.
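The idea of a per-category word pattern can be illustrated with a minimal sketch. It is not Dr. Kular's actual method, which the episode does not detail; it assumes a simple word-frequency profile per category, and the training snippets, `build_profiles`, and `classify` are all hypothetical names and data invented for illustration.

```python
from collections import Counter

# Hypothetical training text per category (invented for illustration only).
TRAINING = {
    "shopping": ["buy cart checkout discount price shipping",
                 "sale price cart order coupon"],
    "business": ["market revenue investor quarterly earnings",
                 "company revenue strategy market growth"],
    "entertainment": ["movie music celebrity trailer concert",
                      "series episode movie stream music"],
}


def build_profiles(training):
    """Build a relative word-frequency profile for each category."""
    profiles = {}
    for category, docs in training.items():
        counts = Counter()
        for doc in docs:
            counts.update(doc.lower().split())
        total = sum(counts.values())
        profiles[category] = {word: n / total for word, n in counts.items()}
    return profiles


def classify(text, profiles):
    """Return the best-matching category and the score of every category."""
    words = text.lower().split()
    scores = {
        category: sum(profile.get(word, 0.0) for word in words)
        for category, profile in profiles.items()
    }
    return max(scores, key=scores.get), scores


PROFILES = build_profiles(TRAINING)
```

Calling `classify("add to cart and get a discount price on shipping", PROFILES)` maps the new text to the category whose prominent words it shares the most. A production classifier would need far more data and a stronger model.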
Arika: Let's talk a little bit more about the machine learning aspect of it because that's been something that we're hearing a lot about especially in the areas of cybersecurity and I know that government in particular as well as other organizations, they are looking at how machine learning as well as AI, artificial intelligence can be leveraged. What I think is interesting Dr. Kular, you've written about
How is machine learning trusted, in terms of how security analysts trust it, and should government trust it? What are your thoughts there?
I mean, you have a great blog that we'll link in our show notes that you wrote about whether security analysts can trust machine learning powered analytics. What is your answer? And I'm curious to hear your answer as well, Eric.
Trusting risk scores
Dr. Kular: When I was working through this UEBA product, every time I heard about risk scores, everybody was like, "Can I trust those risk scores?" Was the problem with the risk score or the technology behind it? Mostly nowadays it is machine learning algorithms, like anomaly detection, proximity, stuff like this, or classification algorithms.
They are used to come up with some sort of a risk score or risk level for a user, based on what sort of activities they are performing. If we cannot trust machine learning algorithms, then whatever we produce is not going to be trusted, because we don't know much about machine learning. First of all we have to build trust in machine learning algorithms. They are doing a good job, they are being successful in so many areas, you know?
But there are so many other cases where you see machine learning has screwed up so many things, like when you did some sort of a Google search on gorillas and it was showing people with a darker skin tone. Then a self-driving car ran through a red light and a pedestrian was killed. All these kinds of articles or facts create negative publicity.
Lack of knowledge is a barrier to adoption
Dr. Kular: And not having enough knowledge about machine learning algorithms and how they work creates another barrier to trusting them, so we have to overcome that barrier. So my thought was, how can we overcome those barriers, because machine learning is really helpful in the long run. So I suggested that if we provide humans some sort of control over the output, that might change the perspective towards risk scores from machine learning. There was an article I read about algorithm aversion; they said if the algorithm was performing at 85-90% and there are still some errors, but the human has control over the final output, the user will use the algorithm to perform the task.
Dr. Kular: Along with giving that control to the human, we can provide context with each and every output. So let's say a user is scored at a 90% chance that this user is a [inaudible 00:07:26]. With that 90 score we provide context like: user is downloading lots of data, user is downloading data at unusual hours, user is logging in from a remote location. Listing everything unusual with the output will give more context, and it will help to say, okay, this score makes sense. Along with that we can add investigation site to it, we can add more privacy consideration to it. That will help to build trust in the scores and the levels produced by machine learning algorithms about the activity of a user. That's what I think.
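The two ideas in this answer, a score that carries its own context and a human who controls the final output, can be sketched roughly as follows. This is not how any Forcepoint or UEBA product computes risk. The indicator names, the weights (chosen so the three indicators sum to the 90 in the example), the threshold of 70, and the functions `score_user` and `final_decision` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical indicators: name -> (weight, human-readable context).
# The weights are invented so that all three together give a score of 90.
INDICATORS = {
    "large_download": (40, "user is downloading lots of data"),
    "unusual_hours": (30, "user is downloading data at unusual hours"),
    "remote_login": (20, "user is logging in from a remote location"),
}


@dataclass
class RiskResult:
    score: int
    reasons: list = field(default_factory=list)


def score_user(observed):
    """Return a risk score together with the reasons that produced it."""
    score = 0
    reasons = []
    for name in observed:
        if name in INDICATORS:
            weight, reason = INDICATORS[name]
            score += weight
            reasons.append(reason)
    return RiskResult(min(score, 100), reasons)


def final_decision(result, analyst_override=None, threshold=70):
    """The analyst's decision, when given, always wins over the algorithm."""
    if analyst_override is not None:
        return analyst_override
    return "investigate" if result.score >= threshold else "no action"
```

For example, `score_user(["large_download", "unusual_hours", "remote_login"])` yields a score of 90 with three listed reasons, and `final_decision(result, analyst_override="no action")` lets the analyst overrule the suggested action after reading that context.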
Machine learning as an additive capability
Eric: Arika, I've always looked at machine learning as an additive capability to security teams. So analysts can only do so much as humans and only so fast. There are certain activities that are just tailored for machine learning, artificial intelligence, the algorithms, call it what you will. And then there are others that aren't.
You do still require human intervention to say I trust this algorithm, right? Because adversaries can trick machine learning now, they can trick artificial intelligence.
My old CTO used to talk about weather and machine learning, and he would say we've got some pretty good models that can predict hurricanes, where they will go, the track of a hurricane, where they will land, with a pretty high degree of certainty. Why? Because the model's been tuned and it works pretty well. We are still horrible at predicting when and where an earthquake will hit though. The models just aren't there. I think the same applies in cybersecurity. For certain models, for certain activities, machine learning can be extremely beneficial.
An analyst can't look at 100,000 events in an hour, but a machine can, and it can get down to 500 of interest or whatever the model allows.
There are also certain activities where you just have to have a human, where you can't predict it. I think knowing the limitation of the modeling of the algorithms is probably one of the key components. Dr. Kular, I don't know if you agree or disagree with that.
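Eric's point about narrowing 100,000 events down to 500 of interest can be sketched as a simple ranking step. This is only an illustration under stated assumptions: each event is reduced to a single hypothetical `bytes` value, the anomaly measure is a plain z-score, and the data is synthetic. Real detection models are considerably richer.

```python
import heapq
import random
import statistics


def top_events_of_interest(events, k=500):
    """Rank events by how far their byte count deviates from the mean
    (absolute z-score) and keep only the k most unusual ones."""
    values = [event["bytes"] for event in events]
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1.0  # avoid dividing by zero

    def anomaly(event):
        return abs(event["bytes"] - mean) / stdev

    return heapq.nlargest(k, events, key=anomaly)


# Synthetic data: 100,000 ordinary events plus one planted outlier.
random.seed(0)
EVENTS = [{"id": i, "bytes": random.gauss(1000, 100)} for i in range(100_000)]
EVENTS[42]["bytes"] = 50_000

SHORTLIST = top_events_of_interest(EVENTS, k=500)
```

The shortlist is what a human analyst would then review. The model only decides what is worth a look, which is where knowing the limits of the model matters.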
Data quality plays an important role in machine learning
Dr. Kular: Yeah, that is important. Along with that, the data that we use to train our models plays a huge role too. So if our data is garbage, then no matter what algorithms you apply to it, it's never going to produce a good result. So we have to start with a good foundation, and that good foundation is data. So once you understand the data, then select which algorithm works best for that kind of data.
Arika: That makes a lot of sense. I mean, I think especially in areas where there are so many sensitivities and threats are still high, having that level of human intervention is still very, very important. I think it's complementary, it's the merging of the two, and I think that's where a lot of organizations are going, especially in terms of how they're adding more innovation to cybersecurity but also wanting to make sure they still have the right safeguards and controls in place.
Dr. Kular: Definitely. Especially when you are trying to program or learn from humans to control them, you know, "why are you doing something really stupid," and humans, they are so unpredictable. If you don't put humans in the loop, then why use machine learning algorithms? It's not going to give you the best outputs.
Arika: It's also job security right? For the analyst.
Dr. Kular: Definitely.
Dr. Kular on the future
Eric: We have plenty of jobs and not enough analysts, we don't have a problem there. Dr. Kular, it's 2019 now. 2025, 2029, we'll get into the future here, crystal ball.
Where do you see this world going? Machine learning, artificial intelligence, what's the art of the possible?
Dr. Kular: So just a few days back I read an article, I think it was published in MIT Technology Review or something like that. They did some sort of research where they downloaded 16,000 artificial intelligence and machine learning related research papers. They mention that in the '50s and '60s neural networks were used more, then something else in the '70s, then in the '80s knowledge-based systems were used, then in the '90s it was Bayesian networks, then in the 2000s it was support vector machines, and then in the 2010s the neural networks came back again.
Back to neural networks
Eric: So we're back to neural networks like 20 plus years later?
Dr. Kular: Exactly, and for the past few years we have been more interested in reinforcement learning. So it just says we might get back to something old, or something will click and we might come up with something new.
Eric: For our listeners, and I know Arika knows exactly what you're talking about, as do I. Define what we mean by neural networks please.
Dr. Kular: So a neural network, it's very simple, right? A neural network, in very basic terms, is like you're feeding something into a box and that box automatically decides what features are good and bad, and it uses those features and gives you some sort of output. That output can be a classification or recognition stats. So that's the basic thing about it, everything is magic.
Eric: Why did we leave that? And why are we back?
Dr. Kular: That's a good question. I do need to read more about it, but it was fascinating. We had the neural network, but it wasn't good enough at that time, so we came back to the neural network, which gave us very good results with text, with speech and everything. Now we are getting into reinforcement learning, basically like how we humans work. If the output is not right, we right away say this is not good. If the output is good then we say go ahead. Now we are doing reinforcement learning, and also active learning.
The continued role of humans
Eric: Okay. In the article you wrote, you talk about how distrust can be caused when the "why" and "what" reasoning is missing around questions such as: "Why have algorithms labeled me as a risky user?" "What does the output mean?" "What is the reasoning behind the findings?"
Does that mean there's a role for humans in the foreseeable future? The machines aren't going to take over?
We need some level of logic, human logic, and put it into the calculations.
Dr. Kular: The human is really there right from training to tuning to testing. Everywhere you need a human. So humans, they should not go away. If you are just saying we don't need a human for these algorithms, the output is like 100%, I don't believe it. So we definitely need humans to make our system better and better every day. We are providing machines with whatever limited knowledge we have, right? But the activities that we humans perform, the way we speak, it's so dynamic, it keeps on changing. Machines can perform whatever you teach them, or whatever data you have fed into the machine. That's it, it does not have its own logic. So you need a human for that logic.
Dr. Kular: Unless artificial intelligence becomes reality.
When will machine learning or AI intervention become prevalent?
Arika: Well and that was actually my question. Do you think taking an industry or area like cybersecurity, setting that aside, but in other places do you see in the future us getting to a place where it is 100% machine learning or AI intervention? I always use the reference to the show Black Mirror, because they have such great examples of what the future may look like in terms of AI and machine learning. So I oftentimes wonder how far we are away from where the machines are making so many of our everyday decisions and with very limited human intervention.
Dr. Kular: This is what I can say for certain, it's not going to go anywhere. We are so dependent on machine learning, artificial intelligence, all the technologies, when we are typing a text it suggests what the next word can be and sometimes we just rely on it. So it's not going to go anywhere, but yes it will get better and better, so maybe by 2050 we will have something better.
Arika: Time will tell.
Eric: Well Arika, I almost equate it to robotics, right? If you look at a manufacturing line or something, we still have a huge role for humans in designing them, programming and coding them, operating them, fixing and repairing them. They do remove some of that manual tasking, manual labor from the process. So as we look at our typical customer, government or commercial, it really shouldn't matter, there are a lot of really mundane tasks that a cyber analyst has to go through in pursuit of the bad, right? In pursuit of what's happening in the business. Anything we can automate is a good thing because there's just way, way too much work.
Dr. Kular: Exactly. The amount of data we are producing nowadays in the form of documents, in the form of spreadsheets, is just enormous. If we can analyze and find patterns in those documents, in those spreadsheets, it can give us insight into so many things.
The Target pregnancy prediction example
For example Target, I don't know whether you guys know this example or not. Target, a few years back, started learning customers' habits. When they were learning the customers' habits they realized pregnant women in their second trimester tend to buy certain lotions with certain scents. After learning it, and after seeing over and over again that it's true, they started sending coupons to pregnant ladies who were in their second trimester or who were pregnant. So they sent it to one family, basically to a teenager, and the father of that teenager went to Target and was yelling at the manager, like, "What exactly are you trying to say? You are encouraging my daughter to get pregnant?"
Arika: Oh my goodness.
Dr. Kular: So, after a few days the father of the teenager called the manager and said sorry. He said she was pregnant, and they were hiding that fact from him. There is...
Arika: Wow, all from the lotion.
Dr. Kular: Yeah, exactly.
Wrapping up this week's episode
Eric: What you're saying though is the models can really work, and even if, in this case, it hadn't worked, it works most of the time, and once in a while we need human intervention to say why.
Dr. Kular: Exactly. Especially when it comes to assigning a risk to somebody. At that time you need to say why, because if I take away your access to your computer, I lock you out, and you're like, "Why are you blocking me?" You need the answer to that why to justify why you are being locked out. I think that why is very important.
Eric: I think Dr. Kular is saying check your Target coupons more carefully from here on out though.
Arika: Yes, absolutely. You can learn a lot, apparently. Well, thank you so much Dr. Kular, this was insightful in a number of areas. I think we've learned quite a bit. Really appreciate all of your expertise in this area, and I think it will be interesting to see where it goes, both in the government as well as other organizations, in terms of how they leverage AI and machine learning with human intervention going forward. So thank you so much for being on the episode today.
Dr. Kular: Thank you, thank you so much for having me.
Eric: Until next week, Arika.
Arika: Yes, until next week. Thanks so much everyone, we appreciate you listening, and please do continue to give us ratings in the iTunes store and let us know if there are topics that you'd like us to cover. Until next week, thank you for listening to To the Point.