Malicious PDF Challenges
Cyber criminals constantly devise new tricks to slip their malicious creations past security scanners. In this blog we review one of them: a challenging PDF obfuscation that we recently saw in a mass injection.
Our ThreatSeeker Network detected several hundred Web sites injected with an iframe whose payload Web site was located in Ukraine. When a user visits a legitimate Web site that was compromised by this threat, the browser automatically follows the injected iframe to a malicious site and downloads the PDF file. The PDF then exploits several vulnerabilities in the PDF viewer to infect the computer silently and download further malware to the machine. Although Websense customers were already protected from this threat by our existing real-time analytics, we found it interesting to analyze the malicious PDF file in detail.
/Names [(42a0e) 12 0 R (140) 14 0 R (57) 16 0 R]
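The /Names array above pairs string keys with indirect object references, so listing those pairs shows which objects are worth inspecting next. A minimal Python sketch of that lookup follows; the function name parse_names_array is hypothetical, and the regular expression assumes plain literal-string keys without escaped parentheses, which holds for this fragment but not for PDF files in general.

```python
import re

# One name-tree entry: a literal string key followed by an indirect
# reference of the form "<object number> <generation number> R".
ENTRY = re.compile(r"\(([^()]*)\)\s+(\d+)\s+(\d+)\s+R")


def parse_names_array(fragment: str) -> dict[str, tuple[int, int]]:
    """Map each key to the (object number, generation) it refers to.

    Hypothetical helper for illustration; it does not handle escaped
    parentheses or hexadecimal string keys.
    """
    return {
        key: (int(obj), int(gen))
        for key, obj, gen in ENTRY.findall(fragment)
    }


names = parse_names_array("/Names [(42a0e) 12 0 R (140) 14 0 R (57) 16 0 R]")
for key, (obj, gen) in names.items():
    print(f"{key!r} -> object {obj}, generation {gen}")
```

For this fragment the keys resolve to objects 12, 14 and 16, which are the objects to pull out of the file for a closer look.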
After this, all we need to do is tweak the code a bit so that a human can analyze it. We made only a few easy modifications, starting with indentation, also known as beautifying the code. That simple step made the code generally readable, but it was still hard to understand how it worked, partly because the script was padded with junk code fragments that were irrelevant to the algorithm and were there only to make the code difficult to read. After we removed them, the code shrank to half its size and was much easier to understand. Only one small step remained: analyzing each procedure and renaming the obfuscated variable and function names to meaningful ones. It took only a few minutes, and the result was this:
To understand it better, we need to examine another PDF object stream, the page content:
It might look a bit cryptic at first glance, but for the sake of simplicity it is enough to know that the stream first passes some rendering information to the PDF viewer; the text content itself follows the 'Td' operator, inside the brackets. We are interested only in the text, not in how it would be displayed. The text consists of many short words, each of which is a three- or four-digit hexadecimal number. The algorithm shown above uses only the last two digits of each number, so those are all we need for a successful de-obfuscation. It is also easy to spot that the decoder uses a lightweight decryption routine to recover the clear content of the encrypted stream.
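The extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample text and key are made up rather than taken from the analyzed PDF, and the single-byte XOR is a stand-in for the decoder's actual lightweight decryption routine, which is not reproduced here.

```python
import re


def extract_bytes(page_text: str) -> bytes:
    """Keep the last two hex digits of every three- or four-digit hex word."""
    words = re.findall(r"\b[0-9a-fA-F]{3,4}\b", page_text)
    return bytes(int(word[-2:], 16) for word in words)


def xor_decrypt(data: bytes, key: int) -> str:
    """ASSUMPTION: single-byte XOR stands in for the real decryption routine."""
    return bytes(b ^ key for b in data).decode("latin-1")


# Made-up sample text and key, not taken from the analyzed PDF.
sample_text = "a16b 366 f26f 078 9e7e"
sample_key = 0x0A

decoded = xor_decrypt(extract_bytes(sample_text), sample_key)
print(decoded)  # prints "alert"
```

The leading one or two digits of each word are simply discarded, which is why they can hold arbitrary values that make the text look like noise to a scanner.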
Now that we know enough, let's decode it! We ended up revealing this code:
From this point the malicious content is obvious. Our malicious PDF sample attempts to exploit several different vulnerabilities, with shellcode sprayed all over the heap.
At the time of writing, only four antivirus products detected the malicious PDF, according to VirusTotal:
The shellcode downloads a trojan, which also had a poor detection rate: