Next week the cybersecurity world will descend on Las Vegas for Black Hat USA 2019. I thought it was a good opportunity to ask Raffael Marty, Vice President of Research & Intelligence and Head of Forcepoint X-Labs, what he has learnt since his #BHUSA 2018 briefing “AI & ML in Cyber Security - Why Algorithms are Dangerous”.
q) What was the takeaway from your 2018 briefing?
Raffael: I explored artificial intelligence (AI) and machine learning (ML) to show how mistakes can be made when applying those technologies to the cybersecurity problem space. Vendors and practitioners made mistakes then, and they still do. My goal is to help people understand what can go wrong and how to avoid repeating those mistakes.
I can distill my advice into three takeaways:
- Algorithms are getting ‘smarter’, but expert knowledge is more important when applying ‘analytics’ (AI) to cybersecurity problems.
- When using AI, we need to intimately understand our data, our algorithms, and our data science process.
- History is not a predictor of the future, but knowledge can be. We can’t rely solely on mining historical data to draw conclusions; we need to incorporate expert knowledge.
My 2018 slides are available for download.
q) What advice would you give one year on?
Raffael: I am not the only voice urging caution when applying algorithms (AI) to cybersecurity problems, and that chorus has grown louder over the last year. AI/ML presents an exciting opportunity for cybersecurity, but it is critical that we exercise caution in how we apply it.
I still see practitioners start from the technology (the algorithms) when tackling a use case, rather than letting the use case determine the right solution to the problem. Here at Forcepoint we use machine learning to support, enhance, and replace other methods, but only when it’s appropriate to do so.
Another issue I see is the explainability problem in supervised machine learning: it is hard to understand what an algorithm has actually learned. We cannot ask the algorithms questions such as “How did my self-driving car arrive at that decision?” or “Why did the system determine that this file or behaviour was malicious?” Research into causality is building momentum, and I will be speaking more about it in the future.
q) You mentioned a quick quiz that can be used to self-assess whether a company is ready to apply AI/ML?
Raffael: Yes, I can set a challenge. There are several questions you can use to determine whether you or your company are ready and ‘qualified’ to use data science or ‘advanced’ algorithms, such as machine learning or clustering, to find anomalies in your data. Here are five:
- Do you know what the difference is between supervised and unsupervised machine learning?
- Can you describe what a distance function is?
- In data science we often look at two types of data: categorical and numerical. What are port numbers? What are user names? And what are IP sequence numbers?
- How do you go about selecting a clustering algorithm?
- Can you name three data cleanliness problems that you need to account for before running any algorithms?
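To make the distance-function and data-type questions a little more concrete, here is a minimal Python sketch. The function names are hypothetical and purely illustrative; the example simply contrasts treating port numbers as numerical values with treating them as categorical labels when computing a distance.

```python
# Illustrative only: hypothetical helpers contrasting two distance functions
# for port numbers.

def numeric_distance(a: int, b: int) -> float:
    """Treats ports as numbers: the distance is |a - b|."""
    return abs(a - b)


def categorical_distance(a: int, b: int) -> float:
    """Treats ports as labels: 0 if identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


if __name__ == "__main__":
    for a, b in [(80, 81), (80, 443), (22, 23)]:
        print(
            f"ports {a} vs {b}: "
            f"numeric={numeric_distance(a, b)}, "
            f"categorical={categorical_distance(a, b)}"
        )
```

Under the numeric interpretation, ports 80 and 81 look almost identical while 80 (HTTP) and 443 (HTTPS) look far apart, even though a port number is an identifier and its magnitude carries no meaning. A clustering algorithm fed raw port numbers as numerical features would inherit that distortion.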
q) What’s next?
Raffael: Before applying AI/ML/analytics to your cybersecurity problems, you need to know how to prepare the data sets, how to choose the right algorithms, and how to interpret the results those algorithms produce. Consider whether you have let the use case determine the appropriate solution, which may or may not be an AI/ML approach. I have been a strong advocate of that approach in the past and will continue to be.
Forcepoint X-Labs will be at Black Hat USA 2019 next week. We hope to see you there.
Raffael will continue the discussion on AI in cybersecurity at the Cyber Symposium in Colorado, USA (September 19-20).