April 27, 2015

Charting the Unexplored Threat Galaxy

Ran Mosessco, Principal Security Researcher

We live in a world where the cyber threat landscape is highly dynamic. Actionable threat intelligence is buried deep within terabytes of seemingly interesting but irrelevant data. Plausible deniability, false positives, lack of traceability and attribution, skillful attackers, the adaptation of warfare techniques and the like only add to the confusion. How does one automatically bubble up prioritized, actionable threat intelligence from the depths of this data morass? At Websense Security Labs, we believe the answer lies in applying big data techniques and unsupervised machine learning to identify similarities and differences among entities such as URLs, IP addresses, files, emails and users. This lets us automatically cluster behavior on any number of attributes and combine those clusters with a risk propagation analytic, which supplies context around the clustered nodes.
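The post does not say which similarity measure underlies its clustering. As a minimal sketch, assuming each entity (URL, IP, file, email) is described by a set of "attribute=value" strings, Jaccard similarity is one common way to score how alike two entities are. The entity names, attributes and the 0.5 threshold below are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(entities: dict, threshold: float) -> list:
    """Return (entity_a, entity_b, score) for every pair whose attribute overlap meets the threshold."""
    pairs = []
    # Sorting by entity name makes the output order deterministic.
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(sorted(entities.items()), 2):
        score = jaccard(attrs_a, attrs_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical entities described by "attribute=value" strings.
entities = {
    "url:a.example/login": {"ip=203.0.113.7", "asn=64500", "kit=fakebank"},
    "url:b.example/login": {"ip=203.0.113.7", "asn=64500", "kit=fakebank"},
    "url:c.example/news": {"ip=198.51.100.2", "asn=64501"},
}
print(similar_pairs(entities, threshold=0.5))
# → [('url:a.example/login', 'url:b.example/login', 1.0)]
```

Set-based similarity is used here because it treats heterogeneous attributes uniformly. A production system would likely use an approximate method such as MinHash to avoid comparing all pairs.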

Threat Galaxy

In most recent high-profile breaches (such as Target and Sony), it was discovered in hindsight that all the indicators had been present in the environment for extended periods, buried deep within terabytes of log data from various security appliances. Had these indicators been prioritized in a timely manner, the breaches could have been stopped, or at least discovered months before they actually came to light, limiting the potential damage and severe economic loss.

At a high level, Threat Galaxy discovery works by clustering across a broad, versatile definition of indicators. This generalizes the approach across threat channels (such as email, web, file and reputation) while dynamically modeling behaviors over time to observe changes and trends.
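To illustrate how one clustering pass can span several threat channels, here is a minimal sketch that substitutes a simple technique: grouping indicators into connected components whenever they share any attribute value, using a union-find structure. This is not the unsupervised learning method the post refers to, which it does not describe. The indicator names and attributes are hypothetical. Re-running such a pass over successive time windows is one way to compare snapshots over time.

```python
from __future__ import annotations

from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        # Register unseen items as singleton sets.
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_indicators(indicators: dict[str, set[str]]) -> list[set[str]]:
    """Group indicators from any channel that transitively share an attribute value."""
    uf = UnionFind()
    first_seen: dict[str, str] = {}  # attribute value -> first indicator carrying it
    for indicator, attrs in indicators.items():
        uf.find(indicator)  # make sure indicators with unique attributes still appear
        for attr in attrs:
            if attr in first_seen:
                uf.union(indicator, first_seen[attr])
            else:
                first_seen[attr] = indicator
    groups: dict[str, set[str]] = defaultdict(set)
    for indicator in indicators:
        groups[uf.find(indicator)].add(indicator)
    return sorted(groups.values(), key=sorted)


# Hypothetical indicators from the email, web and file channels.
indicators = {
    "email:invoice-4411": {"sender_domain=pay-example.test", "url=http://pay-example.test/x"},
    "web:http://pay-example.test/x": {"url=http://pay-example.test/x", "ip=203.0.113.7"},
    "file:dropper.exe": {"ip=203.0.113.7"},
    "web:http://news.example/": {"ip=198.51.100.2"},
}
for group in cluster_indicators(indicators):
    print(sorted(group))
# → ['email:invoice-4411', 'file:dropper.exe', 'web:http://pay-example.test/x']
# → ['web:http://news.example/']
```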

Let's consider a sample cluster of interest:

[Figure: Threat Galaxy sample cluster, before and after risk propagation. Legend: red = malicious, pink = suspicious, white = benign.]

As the comparison of the two snapshots above shows, the same cluster of nodes contains significantly more nodes classified as malicious or suspicious after risk propagation than before it. Specifically, this cluster houses threats such as Zeus C&C, Phishing/Fraudkit, Malicious Injection, Alina PoS C&C, Phishing/FakeBank and FakeEmailPortal, and it covers the Lure, Dropper, Exploit Kit and Call Home stages of the attack kill chain.
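The post does not specify how risk is propagated. The following is a minimal sketch of one plausible scheme, shown only to illustrate the idea. Known-bad seed nodes keep their scores, and each other node takes the highest neighbour score multiplied by a decay factor, iterated until nothing changes. The decay of 0.6 and the malicious/suspicious thresholds are hypothetical parameters, not values from the original analysis, and the node names are made up.

```python
from __future__ import annotations

from collections import defaultdict


def propagate_risk(
    edges: list[tuple[str, str]],
    seed_scores: dict[str, float],
    decay: float = 0.6,
) -> dict[str, float]:
    """Spread risk from seed nodes to neighbours, attenuating by `decay` per hop.

    Each node ends with max(own seed score, decay * highest neighbour score).
    Because 0 < decay < 1, scores shrink around cycles and the loop terminates.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be in (0, 1)")
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    scores = {node: seed_scores.get(node, 0.0) for node in neighbours}
    scores.update(seed_scores)  # keep seeds that have no edges
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbours.items():
            candidate = max(scores[n] for n in nbrs) * decay
            if candidate > scores[node]:
                scores[node] = candidate
                changed = True
    return scores


def label(score: float, malicious_at: float = 0.8, suspicious_at: float = 0.3) -> str:
    """Map a propagated score to the legend's classes (thresholds are hypothetical)."""
    if score >= malicious_at:
        return "malicious"
    if score >= suspicious_at:
        return "suspicious"
    return "benign"


# Hypothetical chain: a known C&C domain shares an IP with a phishing host.
edges = [
    ("zeus-cc.example", "203.0.113.7"),
    ("203.0.113.7", "fakebank.example"),
    ("fakebank.example", "198.51.100.9"),
    ("198.51.100.9", "blog.example"),
]
scores = propagate_risk(edges, {"zeus-cc.example": 1.0})
for node, score in scores.items():
    print(f"{node:18} {score:.3f} {label(score)}")
```

With these parameters, only nodes within two hops of the seed rise above the suspicious threshold, so the decay factor effectively controls how far a verdict is allowed to spread.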

[Figures: Threats; Stages of the attack kill chain]

As demonstrated above, this approach automatically identified additional nodes (indicators of compromise) belonging to the threats in the cluster. Quantitatively, we uncovered 31% more nodes that could be classified as dangerous, without manual analysis, thereby enhancing protection against these threats.
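A figure such as "31% more nodes" is a relative increase. A minimal sketch of that arithmetic follows, assuming the baseline is the number of nodes flagged malicious or suspicious before risk propagation. The post does not state the baseline, and the node counts in the example are illustrative, not the study's data.

```python
def flagged_uplift(before: dict, after: dict) -> float:
    """Percentage increase in nodes labelled malicious or suspicious.

    `before` and `after` map node -> label ("malicious", "suspicious" or "benign").
    """
    dangerous = {"malicious", "suspicious"}
    n_before = sum(1 for lbl in before.values() if lbl in dangerous)
    n_after = sum(1 for lbl in after.values() if lbl in dangerous)
    if n_before == 0:
        raise ValueError("no flagged nodes before propagation; uplift is undefined")
    return 100.0 * (n_after - n_before) / n_before


# Illustrative only: 100 flagged nodes before propagation, 131 after.
before = {f"n{i}": "malicious" if i < 100 else "benign" for i in range(200)}
after = {f"n{i}": "suspicious" if i < 131 else "benign" for i in range(200)}
print(flagged_uplift(before, after))  # → 31.0
```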


With attacks becoming more advanced and sophisticated each day, dealing with them in a timely, automated and efficient manner requires combining big data engineering, unsupervised machine learning, global threat intelligence and cybersecurity know-how. This is a rare combination, and at Websense Security Labs we are using it to chart the unexplored threat galaxy for our customers, proactively finding additional indicators of compromise (IoCs) for early detection and prevention of attacks.


Contributors: Amy Steier and Rajiv Motwani, with input from Ran Mosessco, Brandon Laux and Sindyan Bakkal


