Future Insights – Inherent Bias in Machine Learning

October 20, 2020

Artificial Intelligence

A note from our series editor, Global CTO Nicolas Fischbach:

Welcome to the second post in our Forcepoint Future Insights series, which will offer six separate points of view on the trends and events we believe the cybersecurity industry will need to deal with in 2021. Check out the first post in the series: The Emergence of the Zoom of Cybersecurity.

Update: The Future Insights 2021 eBook is now available for download for those of you who want to dig into all six insights in one place.

Here's the next post from Raffael Marty, Vice President Research and Intelligence:

Cracks in Trust and How to Mend Them

Looking at the cybersecurity landscape today, I have to say I’m glad I’m not a CISO. In an ever-evolving world of digital transformation, omni-connected devices and semi-permanent remote workforces, keeping critical data and people safe is a huge challenge. So huge, in fact, that it can’t be done without the implementation of machine learning and automation.

Understanding an organization’s risk and exposure starts with understanding its critical data and how that data moves. We can only do so by collecting large quantities of metadata and telemetry about the data and the interactions with it, then applying analytics to make sense of it and translate it into a risk-based view.

However, developing automated systems is not without its challenges. In 2021, I believe machine learning and analytics will fall under tighter scrutiny, as both their ethical boundaries and our trust in their fairness and lack of bias continue to be questioned.

Rage at the Machine

We saw headline-grabbing incidents this summer. In the United Kingdom, for example, the government initially decided to let an algorithm determine schoolchildren’s exam results. However, the bias baked into this particular algorithm resulted in significant drops in grades that were unfairly skewed toward lower-income areas and, worse, did not take the teachers’ expertise into account. The result was an embarrassing U-turn, in which people ended up trumping machines in grading exams.

This is not the first time that algorithms and machine learning systems trained on biased data sets have been criticized. You will have heard of Microsoft’s Tay chatbot, and you may have heard of facial recognition software incorrectly identifying members of the public as criminals. Getting it wrong can have life-changing effects (e.g. for the students or people applying for credit) or could be as “minor” as an inappropriate shopping coupon being sent to a customer.

A number of cybersecurity systems use machine learning to decide whether an action is appropriate (that is, low risk) for a given user or system. These machine learning systems must be trained on large enough quantities of data, and they have to be carefully assessed for bias and accuracy. Get it wrong and apply controls that are too tight, and you will experience situations such as a business-critical document being incorrectly stopped mid-transit, a sales leader unable to share proposals with a prospect, or other blocks to effective and efficient work. Conversely, if the controls are too loose, data can leak out of an organization, causing damaging and costly data breaches.
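One way to make "assessed for bias and accuracy" concrete is to compare how often benign actions are wrongly blocked for different groups of users. The sketch below is illustrative only: the record layout, the group names and the sample data are assumptions made for this example, not a description of any particular product.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return, per group, the share of benign actions that were blocked.

    `records` is an iterable of (group, was_blocked, was_risky) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    benign = defaultdict(int)
    blocked_benign = defaultdict(int)
    for group, was_blocked, was_risky in records:
        if was_risky:
            continue  # only benign actions can be false positives
        benign[group] += 1
        if was_blocked:
            blocked_benign[group] += 1
    return {group: blocked_benign[group] / benign[group] for group in benign}


# Made-up sample data: (group, was_blocked, was_risky)
records = [
    ("sales", True, False),
    ("sales", True, False),
    ("sales", False, False),
    ("sales", False, False),
    ("engineering", True, False),
    ("engineering", False, False),
    ("engineering", False, False),
    ("engineering", False, False),
    ("engineering", True, True),  # correctly blocked, not a false positive
]

rates = false_positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A large gap between groups does not prove the model is biased, but it is a signal that the controls deserve a closer look by a human expert.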

Finding the Balance in 2021

To build cyber systems that help identify risky users and prevent damaging actions, we analyze data that comes for the most part from monitoring a user’s activities. It’s worth saying upfront that user activity monitoring must be done appropriately, with respect for people’s privacy and with the appropriate ethical guidelines in place.

In order to create a virtual picture of users, we can track logon and logoff actions. We monitor which files people open, modify, and share. Data is pulled from security systems such as web proxies, network firewalls, endpoint protection and data leak prevention solutions. From this data, risk scores are computed, and the security systems in turn flag inappropriate behavior and enforce security policies accordingly.
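As a rough illustration of how activity events might be turned into a risk score, here is a minimal sketch. The event kinds, the weights and the threshold are all hypothetical values chosen for the example; a real system would derive them from data and expert input rather than hard-code them.

```python
from dataclasses import dataclass

# Hypothetical weights per event kind (assumed for illustration only).
EVENT_WEIGHTS = {
    "logon": 0.0,
    "file_open": 1.0,
    "file_modify": 2.0,
    "file_share_external": 5.0,
}
RISK_THRESHOLD = 10.0  # assumed cut-off for flagging a user


@dataclass(frozen=True)
class Event:
    user: str
    kind: str


def risk_scores(events):
    """Sum the weights of each user's events into a per-user score."""
    scores = {}
    for event in events:
        weight = EVENT_WEIGHTS.get(event.kind, 0.0)  # unknown kinds add nothing
        scores[event.user] = scores.get(event.user, 0.0) + weight
    return scores


def flagged_users(scores, threshold=RISK_THRESHOLD):
    """Return users whose score meets or exceeds the threshold, sorted by name."""
    return sorted(user for user, score in scores.items() if score >= threshold)


events = [
    Event("alice", "logon"),
    Event("alice", "file_open"),
    Event("alice", "file_modify"),
    Event("bob", "logon"),
    Event("bob", "file_share_external"),
    Event("bob", "file_share_external"),
    Event("bob", "file_open"),
]

scores = risk_scores(events)
flagged = flagged_users(scores)
print(scores, flagged)
```

An additive score like this is deliberately simple. Its main virtue is that an analyst can see exactly which events pushed a user over the threshold.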

When undertaking this analysis or, in fact, any analysis which uses machine learning or algorithms to make automated decisions which impact people’s lives, we must use a combination of algorithms and human intelligence. Without bringing in human intuition, insights, context and an understanding of psychology, you risk creating algorithms which are themselves biased or make decisions based on flawed or biased data, as discussed above.
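One simple way to combine algorithms with human judgment is to let the system decide automatically only when it is confident, and to send everything else to an analyst. The following sketch assumes a risk score between 0 and 1; the function name and both cut-off values are hypothetical and would need tuning in practice.

```python
def route(score, allow_below=0.3, block_above=0.9):
    """Route a decision based on a risk score in the range 0..1.

    Scores in the uncertain middle band go to a human reviewer
    instead of being decided automatically.
    """
    if score < allow_below:
        return "allow"
    if score > block_above:
        return "block"
    return "human_review"


decisions = [route(score) for score in (0.1, 0.5, 0.95)]
print(decisions)
```

Widening the middle band sends more cases to people and fewer to the machine, which trades analyst workload against the risk of an automated mistake.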

In addition to involving human expertise in the algorithms (in other words, modelling expert knowledge), the right training data and the right data feeding the live analytics are just as important. What constitutes “the right” data? The right data is often determined by the problem itself, how the algorithm is constructed, and whether there are reinforcement loops or whether explicit expert involvement is possible. The right data means the right amount, the right training set, the right sampling locations, the right trust in the data, the right timeliness, etc. The biggest problem with the “right data” is that it’s almost impossible to define what bias could be present until a false result is observed. At that point, it’s potentially too late: harm has been caused.
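Not every bias can be spotted in advance, but some sampling problems can be checked before a model goes live. As one example, the sketch below compares how strongly each group is represented in the training data versus the live population. The group labels and the numbers are invented for illustration.

```python
from collections import Counter


def share_by_group(labels):
    """Return each group's share of the given list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def representation_gaps(training_labels, live_labels):
    """Training share minus live share, per group.

    Positive values mean the group is over-represented in training.
    """
    train = share_by_group(training_labels)
    live = share_by_group(live_labels)
    groups = set(train) | set(live)
    return {g: train.get(g, 0.0) - live.get(g, 0.0) for g in groups}


# Invented example: training data is mostly office workers,
# while the live population is half remote.
training_labels = ["office"] * 8 + ["remote"] * 2
live_labels = ["office"] * 5 + ["remote"] * 5

gaps = representation_gaps(training_labels, live_labels)
print(gaps)
```

A check like this only covers the groups someone thought to label, which is why it complements expert review rather than replacing it.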

The use of machine learning and algorithms in everyday life is still in its infancy, but we see the number of applications growing at a stunning pace. In 2021, I expect further applications to fail due to inherent bias and a lack of expert oversight and control of the algorithms. Not the least of the problems is that the majority of supervised machine learning algorithms act as a black box, making verification either impossible or incredibly hard.

This doesn’t mean that all machine learning algorithms are doomed to failure. The good news is that bias is now being discussed and considered in open groups, alongside the efficacy of algorithms. I hope we will continue to develop explainable algorithms that model expert input. The future of machine learning is bright; the application of algorithms in smart ways is only bounded by our imagination.

Additional Resources

For more detail on Forcepoint’s commitment to privacy, please see the Forcepoint Privacy Hub.

Future Insights Takeaways:

In 2021, machine learning and analytics will fall under tighter scrutiny, as their ethical boundaries and our trust in their fairness and lack of bias are questioned.
Machine learning systems must be trained on large enough quantities of data and they have to be carefully assessed for bias and accuracy.
2021 is all about finding the balance between controls that are too tight and controls that are too loose, which can only be done through a combination of algorithms and human intelligence.
Without bringing in human intuition, insights, context and an understanding of psychology, you risk creating biased algorithms, which can have life-changing impact.
We must continue to develop explainable algorithms that model expert input.
The future is bright: the application of algorithms in smart ways is bounded only by our imagination.
