Do security analysts trust Machine Learning-powered analytics?
Do security analysts trust analytics that are powered by Machine Learning (ML)? In my view, the vast majority do not.
Given this skepticism, the obvious next question is “why not?”
To answer that, we need to take a step back. To come up with a good answer, I believe it’s worth taking a moment to understand some fundamentals of analytics and Machine Learning.
First, let’s define some terminology so we’re all on the same page:
Analytics converts data into information that supports effective and efficient decision-making. It identifies interesting patterns and insights in the data; for example, it can help explain customer behavior in order to predict buying habits. In cybersecurity, analytics plays a major role in identifying risky users and insider threats by focusing on user behavior.
ML algorithms enable computer systems to learn and make predictions, much as humans do. This ability to learn and predict comes from data, expert knowledge and interaction with the real world.
ML algorithms can analyze big data, convert it into information, predict future events, and uncover patterns hidden within it. In doing so, they can help save lives by predicting who is at risk of heart disease or stroke, locate likely crimes and crime patterns, avoid data breaches, predict cyber-attacks, identify insider threats and stop hackers. ML algorithms can be tasked with learning human behavior to predict and prevent malicious or accidental insider incidents, or to detect malware that security software has not yet caught and protect enterprises from it.
The question of trust
If ML has this potential, then why don’t we trust its findings? Based on my research, I believe that the following issues are to blame:
- The explanation of why we should trust the algorithms is missing from the process
- There is a lack of good training data
- The layer of expert knowledge is missing
- There is a lack of regulations and established social norms, and
- Bad news sells! There seems to be a perverse joy in reporting how self-driving cars can crash, automatic photo recognition can make racist mistakes and neural networks can be trained to crack passwords, while far less attention is paid to the benefits ML offers us.
Bridging the gap
Even in an imagined future where ML has reached a near-perfect state, it’s safe to assume that analysts, being human, will still have reservations about its output. I propose six areas we should look at to address this:
1. Human-centric AI: According to this article, if users are given even slight “control” over the algorithms, they will use ML-powered products and tools. The ability to:
- control and modify the outcome
- avoid false positives, and
- shield users from the effects of false alarms
all give us confidence that we are in charge and have the power to avoid false alarms. And, importantly, they make clear that algorithms are here to help humans make decisions, not to replace us.
2. Context-based: Distrust arises when the “why” and the “what” are missing, leaving questions such as:
- Why have the algorithms labelled me as a risky user?
- What does that output mean?
- What is the reasoning behind the findings?
ML algorithms are all about analyzing data and detecting interesting patterns. Assuming the algorithms work correctly is not enough; to make their findings trustworthy to the analyst, we need to tell a better story and present relevant output. In my opinion, this benefits not only analysts but developers as well, because it helps them recognize false positives.
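To make the “why” concrete, here is a minimal sketch of one common way to explain a risk score: for a linear model, each feature’s contribution is its weight multiplied by how far the user’s value sits from a baseline. It assumes a simple linear scoring model; the feature names, weights and baseline values are hypothetical and exist only for illustration.

```python
# Minimal sketch: explain a linear risk score through per-feature contributions.
# ASSUMPTION: a linear scoring model. Feature names, weights and baselines
# below are hypothetical, for illustration only.

WEIGHTS = {
    "after_hours_logins": 0.8,
    "files_downloaded": 0.002,
    "failed_logins": 0.5,
    "new_devices": 1.2,
}

# Hypothetical "typical user" values that contributions are measured against.
BASELINE = {
    "after_hours_logins": 1.0,
    "files_downloaded": 200.0,
    "failed_logins": 2.0,
    "new_devices": 0.0,
}


def explain(user: dict[str, float]) -> list[tuple[str, float]]:
    """Return (feature, contribution) pairs, largest absolute contribution first.

    For a linear model, weight * (value - baseline) is exactly how much each
    feature moved this user's score away from the baseline user's score.
    """
    contributions = [
        (name, weight * (user[name] - BASELINE[name]))
        for name, weight in WEIGHTS.items()
    ]
    return sorted(contributions, key=lambda c: abs(c[1]), reverse=True)


if __name__ == "__main__":
    alice = {"after_hours_logins": 6, "files_downloaded": 1200,
             "failed_logins": 2, "new_devices": 1}
    for feature, delta in explain(alice):
        print(f"{feature:>20}: {delta:+.2f}")
```

Sorting by absolute contribution puts the reasons that moved the score most at the top, which is the kind of answer to “why have the algorithms labelled me as a risky user?” an analyst can check against what they know. Non-linear models need dedicated attribution methods (for example, SHAP values) rather than this simple arithmetic.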
3. Case-based: When we make a decision, we usually rely on past experience, for example when deciding which restaurant to go to. As humans, we can’t remember every scenario, but machines can. With alerts, if the ML-powered analytics tool shows the analyst similar past cases, reaching a decision becomes easier, more effective and more efficient. I believe this is one of the keys to gaining analysts’ trust in ML-powered analytics.
Use cases are also beneficial for training analysts, technical writers and product sales teams.
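A minimal sketch of case-based retrieval, assuming each alert has already been turned into a numeric feature vector on comparable scales (for example, normalized to 0..1): rank past cases by cosine similarity to the new alert and show the analyst the closest ones together with the verdicts reached at the time. The `PastCase` record, the feature vectors and the case IDs are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PastCase:
    case_id: str
    features: list[float]  # hypothetical alert features, already normalized to 0..1
    verdict: str           # what the analyst concluded, e.g. "benign" or "malicious"


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def similar_cases(alert: list[float], history: list[PastCase],
                  k: int = 3) -> list[tuple[PastCase, float]]:
    """Return the k past cases most similar to the new alert, best match first."""
    scored = [(case, cosine(alert, case.features)) for case in history]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical case history, for illustration only.
HISTORY = [
    PastCase("C-101", [0.8, 0.9, 0.0, 1.0], "malicious"),
    PastCase("C-102", [0.1, 0.2, 0.9, 0.0], "benign"),
    PastCase("C-103", [1.0, 0.7, 0.2, 1.0], "malicious"),
    PastCase("C-104", [0.2, 0.1, 0.3, 0.0], "benign"),
]

if __name__ == "__main__":
    new_alert = [0.9, 0.8, 0.1, 1.0]
    for case, score in similar_cases(new_alert, HISTORY):
        print(f"{case.case_id}  similarity={score:.3f}  verdict={case.verdict}")
```

Here the two closest matches are both past malicious cases, which gives the analyst a concrete precedent rather than a bare score. In a real tool the history would come from the case-management system, and a nearest-neighbour index would replace this linear scan once the history grows large.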
4. Investigation-based: 62 percent of security incidents are caused by human error, so is it fair to blame only the algorithms? We must investigate false-negative cases to determine whether it was
- the algorithm that failed to discover the threat, or
- the user who failed to notice a red flag.
Either way, we gain: if the algorithm was wrong, we have the opportunity to improve it; otherwise, we have the opportunity to improve our visualization and alerting mechanism.
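The triage step can be sketched as a simple rule, assuming the model’s score for each missed incident was logged and that alerts fire at a fixed threshold: if the score never crossed the threshold, the algorithm missed the threat; if it did, an alert existed but was not acted on. The threshold value, the `MissedIncident` record and the incident IDs are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

# Hypothetical alerting threshold: scores at or above it raise an alert.
ALERT_THRESHOLD = 0.7


class Cause(Enum):
    ALGORITHM = "model never alerted: improve the algorithm"
    ANALYST = "alert raised but missed: improve visualization and alerting"


@dataclass
class MissedIncident:
    incident_id: str
    model_score: float  # hypothetical: score the model logged before the incident


def triage(incident: MissedIncident, threshold: float = ALERT_THRESHOLD) -> Cause:
    """Attribute a false negative to the algorithm or to the human reviewer."""
    if incident.model_score < threshold:
        return Cause.ALGORITHM  # no alert was ever shown to the analyst
    return Cause.ANALYST        # an alert existed but no one acted on it


def triage_all(incidents: list[MissedIncident],
               threshold: float = ALERT_THRESHOLD) -> Counter:
    """Count missed incidents per cause."""
    return Counter(triage(i, threshold) for i in incidents)


if __name__ == "__main__":
    missed = [
        MissedIncident("INC-1", 0.35),
        MissedIncident("INC-2", 0.82),
        MissedIncident("INC-3", 0.10),
    ]
    for cause, count in triage_all(missed).items():
        print(f"{count} x {cause.value}")
```

Counting causes over many missed incidents shows where effort should go: a high ALGORITHM count points at the model, while a high ANALYST count points at how alerts are presented.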
5. Model-based: Understand and explain how algorithms and models work. We should work towards understanding how the algorithms operate and which features they extract, so that we can take full advantage of them.
6. Ethics-based: Why is it easy to depend on human judgement? Because humans learn from their mistakes, are bound by rules and regulations, and consider social norms; none of this is necessarily true for algorithms. Algorithms are powerful; they have the power to change, and even rule, the world. Thus, before algorithms outsmart humans, we need to put regulatory frameworks in place to secure our future.
In my opinion, this is one of the most important facets of bridging the gap, and it deserves a dedicated discussion of its own. Here, I merely point out that to bridge the gap between ML-powered analytics and its users, we should put significant effort into ethics for ML algorithms.
Can we build trust between analytics and its users by adding context, case studies, investigation, education about models, and adherence to social norms? The answer is a resounding “Yes!” I believe that combining “the explanation,” the power to control the final outcome, investigation of false negatives, and ML bound by social norms is a way to bridge the gap between ML-powered analytics and its users.
One last thing: let’s assume we have incorporated everything mentioned above and have great analytics with all the bells and whistles, but trust is still missing. Then what? Should we let the convenience offered by the analytics speak for itself? In my opinion, yes: open the door to convenience. Give users the option, focus on making ML analytics effective, efficient and easy to use, and I bet many, if not most, will eventually opt for ML-powered analytics.