January 28, 2014

Data Theft via USB: Combating the Insider Threat

Alexander Watson

Executive Summary

Data breaches and the theft of intellectual property and personally identifiable information (PII) are among the biggest risks that businesses face, yet very few security solutions address them. Just last week, a consultant working for the Korea Credit Bureau was arrested for allegedly stealing the credit card numbers, social security numbers, and personal details of more than 105.8 million accounts by copying them to a USB drive over an 11-month period. In another example, the LA Times reported that Edward Snowden used a USB drive to steal classified documents from the NSA.

In this blog post we discuss how organizations can start protecting their sensitive data by harnessing intelligence from applications already running on their network, and we provide source code to help. If you're ready to dive in now, go ahead and download the queries and lookups on GitHub.


How is it that an IT consultant was able to siphon account data from his company with a USB drive over an 11-month period? Most security solutions are based on the principle of perimeter defense and keeping the bad guys out. Traditional defenses such as firewalls, antivirus, intrusion prevention and sandboxing solutions do very little to protect against data theft from within a company, where an employee using valid credentials may wittingly or unwittingly steal intellectual property or other sensitive data.

A new breed of information protection solutions, such as Websense's DLP (Data Leak Prevention) products, is designed to work with existing security solutions and business policies to protect against deliberate or inadvertent transmission of a company's sensitive data from the network.

Harnessing Application Telemetry to Protect Your Network

As we will discuss in this blog post, there are a number of things that a company's security teams can do to detect suspicious activity that may be the result of data theft. In a previous blog post, we discussed how Microsoft Windows Error Reporting (WER), a.k.a. Dr. Watson, sends detailed telemetry to Microsoft each time an application crashes or fails to update, or a hardware change occurs on the network. We were surprised to learn that a USB drive insertion is considered a hardware change, and that detailed information about the USB device and the computer it was plugged into is sent to Microsoft. These logs are sent to Microsoft via HTTP URL-encoded messages. By understanding the content of these messages and how to decode them, organizations can detect USB drives and devices that could pose a risk, such as those used in the KCB and Snowden leaks, and automatically generate reports when they are plugged into a secure system.

We mentioned in the last blog post that the information sent as part of crash logs could be harnessed by organizations. Today we will demonstrate how you can harness intelligence from these crash logs to detect and monitor new USB devices being connected to the network, and hence gain insight into where your company's sensitive data is going. The best part of this? Your company can implement this monitoring for free.

How to know each time a new USB device is connected to your network

In Microsoft Windows environments, a report is sent to Microsoft each time a hardware change happens to a PC. This includes the times that a new USB device is plugged into a computer. In Windows Vista and later, these reports became automated and are part of an opt-out program that Microsoft estimates nearly 80% of PCs in the world participate in. Depending on your operating system, reports are encoded into a GET request to http://watson.microsoft.com (Win XP, Vista, 7) or https://watson.telemetry.microsoft.com in Windows 8.
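As a rough illustration of the first filtering step outside a SIEM, the Python sketch below checks whether a logged URL targets one of the two reporting hosts named above. The function name `is_wer_request` is our own, and the sketch assumes your proxy logs record full URLs including the scheme.

```python
from urllib.parse import urlsplit

# Reporting hosts taken from the paragraph above.
WER_HOSTS = {"watson.microsoft.com", "watson.telemetry.microsoft.com"}


def is_wer_request(url: str) -> bool:
    """Return True if the URL's host is one of the WER reporting hosts.

    Hypothetical helper: assumes the logged URL includes its scheme
    (http:// or https://), otherwise no hostname can be parsed.
    """
    host = urlsplit(url).hostname or ""
    return host.lower() in WER_HOSTS
```

Matching on the parsed hostname rather than on a substring avoids false positives from unrelated URLs that merely contain the hostname in their path.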

These reports can be gathered in a variety of ways: by examining outbound web proxy logs (may we shamelessly suggest Websense Triton Security Gateway), by creating an IPS rule in an open source intrusion prevention system such as Snort or Suricata, or by simply monitoring a SPAN port using a sniffer such as Wireshark. In our last blog entry, we discussed an information leak that can arise with these reports and suggested that organizations set up a group policy that sends reports to an on-premises server, which then forces encryption before forwarding to Microsoft. In this case, the reports can be processed at the organization's WER (Windows Error Reporting) collection server.

As we show below, reports from Microsoft WER (Dr. Watson) can be a bit tricky to extract information from (we provide a detailed break-out of fields and an example of how to build reports in Splunk), so let's dive in.

Dr. Watson reports such as the one below have a specific report type for inserted USB devices. We can start by filtering down to messages containing "PnPGenericDriverFound". This is followed by additional information (some of it URI-encoded) that looks cryptic, but with some lookup tables can be broken out into the following fields:

  • Date
  • USB Device Manufacturer
  • USB Device Identifier
  • USB Device Revision
  • Host computer - default language
  • Host computer - Operating system, 32/64-bit, service pack and update version
  • Host computer - Manufacturer, model and name
  • Host computer - Bios version and unique machine identifier
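The pre-filter described above can be sketched in a few lines of Python for anyone working from raw log files instead of a SIEM. This is only an illustration: `usb_reports` is a name we made up, and it assumes one report per log line.

```python
from urllib.parse import unquote

# Report type named in the text above.
MARKER = "PnPGenericDriverFound"


def usb_reports(log_lines):
    """Yield URL-decoded log lines that carry the USB report marker.

    Hypothetical helper: assumes each log line holds one complete report.
    """
    for line in log_lines:
        if MARKER in line:
            # Some values are URI-encoded (e.g. %20 for a space).
            yield unquote(line.strip())
```

Decoding at this stage makes the remaining fields easier to read by eye; the field-by-field breakout still needs the lookups described in the steps that follow.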

Step 1: Create the Vendor and Product ID Lookups in your favorite SIEM tool

It turns out the Vendor and Device ID lookups can be a little tricky, but they map exactly to the Windows and Linux driver databases. To see an example for yourself, try running "lsusb" on a Linux machine. After scraping some online driver databases, we put together lookups for vendor and device codes that you can download on GitHub. These will need to be updated periodically to stay current. Feel free to add new device codes yourself, or check back to our site for updates.

Our manufacturer lookup file is of the format:

  • vid,vid_name
  • 0001,Fry's Electronics
  • 0002,Ingram
  • 0003,Club Mac
  • 0004,Nebraska Furniture Mart
  • 0053,Planex
  • 0079,DragonRise Inc.
  • 0105,Trust International B.V.
  • 0145,Unknown
  • 017C,MLK
  • 0204,Chipsbank Microelectronics Co., Ltd
  • 0218,Hangzhou Worlde
  • 02AD,HUMAX Co., Ltd.
  • 0300,MM300 eBook Reader
  • 0324,OCZ Technology Inc
  • 0325,OCZ Technology Inc

And device IDs are of the format:

  • pid,pid_name
  • 0001142B,Arbiter Systems, Inc.
  • 00017778,Counterfeit flash drive [Kingston]
  • 00535301,GW-US54ZGL 802.11bg
  • 00790006,Generic USB Joystick
  • 00790011,Gamepad
  • 0105145F,NW-3100 802.11b/g 54Mbps Wireless Network Adapter [zd1211]
  • 01450112,Card Reader
  • 017C145F,Trust Deskset
  • 02046025,CBM2080 Flash drive controller
  • 02046026,CBM1180 Flash drive controller
  • 02180301,MIDI Port
  • 02AD138C,PVR Mass Storage
  • 0324BC06,OCZ ATV USB 2.0 Flash Drive
  • 0324BC08,OCZ Rally2/ATV USB 2.0 Flash Drive
  • 0325AC02,ATV Turbo / Rally2 Dual Channel USB 2.0 Flash Drive

Using Splunk or a similar SIEM tool, create lookups to map the vendor and product IDs that you see in the Watson logs above to the manuf_ids.csv and product_ids.csv files that have been attached. Please note that our Product ID lookup contains the VID+PID (Vendor ID and Product ID) together - this is the one you'll most likely want to use in your lookups.
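For readers without a SIEM at hand, here is a minimal Python sketch of the same lookup logic. It assumes the files contain unquoted rows exactly as in the samples above, so each row is split on the first comma only (several names, such as "HUMAX Co., Ltd.", contain commas themselves). The function names `load_lookup` and `describe` are our own.

```python
def load_lookup(text):
    """Parse 'key,name' rows into a dict, skipping the header row.

    Splits on the first comma only, so names that contain commas
    stay intact. Assumes unquoted rows as in the samples above.
    """
    table = {}
    for row in text.strip().splitlines()[1:]:
        key, _, name = row.partition(",")
        table[key.strip().upper()] = name.strip()
    return table


def describe(vid, pid, vendors, products):
    """Return (vendor name, product name) for a VID and PID pair.

    The product table is keyed on VID+PID concatenated, matching
    the product lookup format described above.
    """
    vid, pid = vid.upper(), pid.upper()
    vendor = vendors.get(vid, "Unknown vendor")
    product = products.get(vid + pid, "Unknown product")
    return vendor, product
```

If your SIEM parses these files as strict CSV, names containing commas may need quoting first; that depends on the tool, so check how the rows load before relying on the lookup.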

Step 2: Breaking out fields in the Dr. Watson reports

The next step is decoding the relatively complex-looking WER report structure. The diagram above shows popular values, and below we have included some Splunk queries that you can run to detect USB device insertions and create reports about what has been plugged into your network.

Below is a sample Splunk query with comments added to explain what is happening at each field. You'll have to remove the comments before running in Splunk. You can download the full query here.

  sourcetype="Watson" watson.microsoft.com // Set the sourcetype for your proxy log, and pre-filter on records that contain watson.microsoft.com
  | regex src_ip="^\d+\.\d+\.\d+" // Use this to make sure it is a valid record with a host IP address (dots escaped so they match literally)
  | rex field=cs_uri_path "/(?<bus>\w{3})_VID_(?<vid>\w{4})_PID_(?<pid>\w{4})" // This regular expression extracts the BUS, VID and PID variables
  | rex field=cs_uri_path "/(?<bus>\w{3})_VEN_(?<ven>\w{4})_DEV_(?<dev>\w{4})" // This regex extracts the BUS, Vendor and Device ID variables
  | eval vid=if(vid!="",vid,ven) // Make sure a valid Vendor ID is present, falling back to VEN
  | eval pid=if(pid!="",vid.pid,dev) // Make sure a valid Product ID is present; "." concatenates VID+PID to match the product lookup key
  | eval PC_Manufacturer=urldecode(SM) // Decode the PC Manufacturer from the GET request
  | eval PC_Model=urldecode(SPN) // Decode the PC Model from the GET request
  | rex field=OS "(?<os_version>\d\.\d+)" // Break out the major.minor Operating System version (e.g. 6.1)
  | rex field=OS "(?<os_version_detailed>\d\.\d+\.\d+)" // Break out the detailed Operating System version including build (e.g. 6.1.7601)
  | eval BV=urldecode(BV) // Decode the BIOS version of the computer that generated the report
  | rename BV as "BIOS Version"
  | rename MID as "Machine ID"
  | where isnotnull(os_version) AND isnotnull(bus) // Make sure we are looking at a valid record
  | lookup windowsVersion os_version OUTPUT os_description // Look up the host computer's Windows version (e.g. 6.1 is Windows 7)
  | lookup manufId vid OUTPUT vid_name // Look up the USB device vendor's name from the lookup provided
  | lookup productId pid OUTPUT pid_name // Look up the USB product name from the lookup provided
  | rename vid_name as "Device_Manufacturer"
  | table _time, src_ip, bus, vid, Device_Manufacturer, pid, pid_name, os_version_detailed, os_description, PC_Manufacturer, PC_Model, "BIOS Version", "Machine ID" // Output a nice looking table

Tweak as necessary and enjoy a detailed breakout of USB devices that have been attached to your network!
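If you want to prototype the extraction outside Splunk, the Python sketch below mirrors the main steps of the query: the two path regexes, URL-decoding, and the detailed OS version breakout. It assumes that SM, SPN, OS, BV and MID arrive as query-string parameters of the GET request, which is how we read the query above but may differ in your logs. The function name `parse_report` and the output key names are our own.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Same patterns as the two rex steps in the query above.
USB_RE = re.compile(r"/(?P<bus>\w{3})_VID_(?P<vid>\w{4})_PID_(?P<pid>\w{4})")
PCI_RE = re.compile(r"/(?P<bus>\w{3})_VEN_(?P<vid>\w{4})_DEV_(?P<pid>\w{4})")


def parse_report(url):
    """Break a WER report URL into fields, or return None if no device IDs match.

    Assumption: SM, SPN, OS, BV and MID are query-string parameters.
    """
    parts = urlsplit(url)
    match = USB_RE.search(parts.path) or PCI_RE.search(parts.path)
    if match is None:
        return None
    params = parse_qs(parts.query)  # parse_qs URL-decodes the values

    def first(name):
        return params.get(name, [""])[0]

    os_match = re.search(r"\d\.\d+\.\d+", first("OS"))
    return {
        "bus": match["bus"],
        "vid": match["vid"].upper(),
        "pid": match["pid"].upper(),
        "pc_manufacturer": first("SM"),
        "pc_model": first("SPN"),
        "bios_version": first("BV"),
        "machine_id": first("MID"),
        "os_version_detailed": os_match.group(0) if os_match else None,
    }
```

The returned `vid` and `pid` values can be concatenated to form the key used by the product lookup from Step 1.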


We recommend limiting your search to remove common USB devices that you may not be concerned about, such as keyboards and mice. For example:

  • search pid_name != "*mouse*"
  • search pid_name != "*keyboard*"
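The same exclusion can be sketched in Python for post-processing exported results. This is a hypothetical helper, not part of the downloadable queries; `is_interesting` and `IGNORED_KEYWORDS` are names we chose for illustration.

```python
# Device-name keywords to ignore, mirroring the two searches above.
IGNORED_KEYWORDS = ("mouse", "keyboard")


def is_interesting(pid_name):
    """Return False for devices whose product name contains an ignored keyword.

    Matching is case-insensitive. Devices with no product name are kept,
    since an unrecognized device may be worth a closer look.
    """
    name = (pid_name or "").lower()
    return not any(word in name for word in IGNORED_KEYWORDS)
```

Keeping unnamed devices is a deliberate choice in this sketch: a device missing from the lookup tables is arguably more suspicious than a known one, so it should not be filtered out silently.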

Next Steps

Taking this a step further, it's possible to configure your SIEM tool to trigger a report each time a certain type of device (such as mass storage, or smartphone) is connected to a computer on the network. To apply DTP (Data Theft Prevention) context to this, try limiting these reports by filtering on computer names or IP addresses from computers that have (or have access to) sensitive data, such as detecting mass storage devices being connected to, or on the same network segment as, your Hadoop HDFS cluster.
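One way to picture such a rule is the Python sketch below, which flags a report only when a storage-type device appears on a sensitive network segment. Everything specific here is an assumption for illustration: the subnet `10.20.0.0/24`, the keyword list, and the function name `should_alert` are all made up and would need to be replaced with values from your own environment.

```python
import ipaddress

# Hypothetical segment holding sensitive systems; replace with your own.
SENSITIVE_NETWORKS = [ipaddress.ip_network("10.20.0.0/24")]

# Hypothetical keywords for storage-type devices, based on product names
# like those in the sample lookup ("Flash drive", "Mass Storage", "Card Reader").
STORAGE_KEYWORDS = ("flash drive", "mass storage", "card reader")


def should_alert(src_ip, pid_name):
    """Return True when a storage device is seen on a sensitive segment."""
    addr = ipaddress.ip_address(src_ip)
    on_sensitive = any(addr in net for net in SENSITIVE_NETWORKS)
    name = (pid_name or "").lower()
    is_storage = any(keyword in name for keyword in STORAGE_KEYWORDS)
    return on_sensitive and is_storage
```

Requiring both conditions keeps alert volume low; loosening either one (for example, alerting on any unknown device on the sensitive segment) trades more noise for broader coverage.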

Editor’s note: All published links found to be broken, obsolete or otherwise inactive are subsequently removed from existing entries.
