In this blog post, I want to cover a specific type of code obfuscation and then demonstrate how to deobfuscate the code manually, step by step. There are many automated tools and methods for deobfuscation, but I feel it's important to get down to the attacker's level to gain a more intimate understanding of attackers and their obfuscation algorithms. This understanding helps us create better signatures to identify malicious content with our Threatseeker Network. After all, the best way to protect yourself and others from attack is to understand your attacker so that you have a better chance at proactive protection. Now, on to an example of obfuscated attack code.
It's important to note that sites carrying this code are most likely legitimate sites that have fallen prey to malicious code injection. In other words, an attacker has compromised the site and inserted malicious code into its pages, and that injected code executes whenever someone visits the site. The attack code itself can live either on the compromised site or on another site to which the injected code redirects the visitor. We can think of the injected site as a vehicle for getting the attack code to run on victim computers. Below is a screenshot of the injected code that we're going to study.
Injected code on an innocent site:
For most people who see this malicious code, their eyes go crossed and they have no idea what they are looking at. This is the attacker's intent. Attackers don't want anybody viewing the source of the page to recognize that their injected code is doing something bad. So our first step is to format this script code so that it's easier for our eyes and brains to handle. You'll want to grab the code, put it into your favorite text editor and format it so that it looks like actual code. When that's done, you should feel that the code is easier to read and much less intimidating to review.
Here is the code copied from the source of the page and formatted:
Now that the code is nicely formatted, we can see that there are a number of function definitions in the script. In each of the function definitions we can see a variable declared with a peculiar string of numbers in a specific pattern. We can also see that this variable seems to be followed by a for loop. The for loop attracts my eyes straight away. Typically, a for loop that follows a peculiar variable definition is a red flag for a deobfuscation routine. For the rest of this post, we'll focus on one of the function definitions.
Here is the function definition we are going to work with:
Looking at this function, there is further work that we can do to make things easier for our eyes and brains. First, notice that the variable names are random and not meaningful. This, again, is designed to throw us off from understanding what is going on. But we are tenacious and not about to give up. So the next thing to do is to review the variable names, including where and how they are used. If there are variables that are static throughout, then let's do simple search and replace for the variable names. In this case, we can do a search and replace for CcySlu=4 and vcN=5.
We should also look for any functions that are simple wrappers and can be replaced in the same way. For example, function XKJepVPIJ(c) simply returns the string representation of the character code that is passed in. So anywhere we see a call to XKJepVPIJ, we can replace it with String.fromCharCode. Finally, in this step let's evaluate any constant mathematical operations in the function, so that we are left with a single number instead of a series of numbers and operations that we would have to think about every time we come across them in a loop.
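To make these three clean-up steps concrete, here is a minimal sketch. It is not the code from the injected page: only the names XKJepVPIJ, CcySlu and vcN come from the function above, while the encoded string, the "-" delimiter and the arithmetic are made up for illustration.

```javascript
// Hypothetical stand-in written in the style of the injected script.
// Only the names XKJepVPIJ, CcySlu and vcN are taken from the post;
// the data string, delimiter and math are invented for this example.
function XKJepVPIJ(c) {
  return String.fromCharCode(c); // wrapper that only hides the real call
}

function obfuscated() {
  var CcySlu = 4;
  var vcN = 5;
  var parts = "118-115-109".split("-");
  var out = "";
  for (var i = 0; i < parts.length; i++) {
    // constant expression that is recomputed on every pass
    out += XKJepVPIJ(parseInt(parts[i]) - (CcySlu * vcN - 19));
  }
  return out;
}

// The same logic after the manual steps:
// 1. CcySlu and vcN replaced by 4 and 5
// 2. XKJepVPIJ replaced by String.fromCharCode
// 3. (4 * 5 - 19) folded down to 1
function simplified() {
  var parts = "118-115-109".split("-");
  var out = "";
  for (var i = 0; i < parts.length; i++) {
    out += String.fromCharCode(parseInt(parts[i]) - 1);
  }
  return out;
}

console.log(obfuscated(), simplified()); // both print "url"
```

Neither version behaves differently from the other; the second one is simply easier to read, which is the whole point of this step.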
Here's a look at the function after performing the above steps:
The function still calls parseInt, which I wasn't sure about, so I pulled out my local library card and hopped on my bike to do some research. Actually, that was a middle school flashback; I'm showing my age here! I simply did a Google search for the parseInt function to learn what it does. According to my research, parseInt parses the value passed in as a string and returns an integer, and an optional second argument sets the radix (the number base). No second argument is passed into any use of parseInt in our function, so the numbers are read as plain decimal values and the calls do no necessary work here. So we can remove the parseInt calls. After parseInt is removed, we can rename some of the random variable names to friendlier names, and we're left with some readable code to step through.
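If you want to convince yourself about parseInt before deleting it, a quick check like the one below helps. This is only a sketch with sample values I picked myself (104 is the character code for "h"); none of it comes from the injected script.

```javascript
// Sample values chosen for illustration only.
var checks = {
  plainString: parseInt("104"),        // no radix, decimal digits: read as base 10
  explicitBase10: parseInt("104", 10), // same value with the radix spelled out
  hexPrefix: parseInt("0x68"),         // no radix, "0x" prefix: read as base 16
  explicitBase16: parseInt("68", 16),  // same value with the radix spelled out
  alreadyNumber: parseInt(104)         // a number is converted to a string, then parsed
};

// One thing to watch when removing parseInt around a string value:
// subtraction converts the string to a number, but + concatenates.
var minusWithout = "104" - 1;        // 103
var plusWithout = "104" + 1;         // "1041"
var plusWith = parseInt("104") + 1;  // 105

console.log(checks, minusWithout, plusWithout, plusWith);
console.log(String.fromCharCode(checks.plainString)); // "h"
```

So dropping parseInt is safe when the value is already a number, or when it is a decimal string that only feeds into operations like subtraction. If a string value is used with +, keep the conversion.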
This is the final resulting code. It's much easier to get my head around. I've also put a few comments inline:
For those of you wishing to try and step through this:
This first function is now decoded, but remember that the injected script contained multiple function definitions. As you decode the others the same way, you should begin to see the script redirection that the injected code creates. This script redirects visitors to an attack site while they are visiting the original site, which was injected with the above obfuscated code. As you can see, the attacker did a lot of work to hide the intent of the injected code. This obfuscation is an attempt to keep the injected code from being recognized and removed from a legitimate site. By understanding the deobfuscation process, we can generate more generic signatures that will help identify variations of this script injection.
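To give a rough idea of what "more generic" can mean, here is a toy heuristic. It is purely my own illustration and not an actual production signature: the function name looksLikeCharCodeDecoder, the threshold of eleven numbers and the regular expressions are all assumptions. It flags a script that combines the three traits we saw above: a long run of small numbers split by a delimiter, a for loop, and a call to fromCharCode.

```javascript
// Toy heuristic, NOT a real signature. Name, threshold and patterns
// are assumptions made up for this illustration.
function looksLikeCharCodeDecoder(script) {
  // at least eleven 1-3 digit numbers separated by one punctuation character
  var numericRun = /(?:\d{1,3}[^\w\s]){10,}\d{1,3}/;
  var hasLoop = /\bfor\s*\(/.test(script);
  var hasDecoderCall = /fromCharCode/.test(script);
  return numericRun.test(script) && hasLoop && hasDecoderCall;
}

// Made-up samples: one in the style of the injected script, one benign.
var suspicious =
  'var s="104-116-116-112-58-47-47-101-120-97-109-112-108-101".split("-");' +
  'var o="";for(var i=0;i<s.length;i++){o+=String.fromCharCode(s[i]);}';
var benign = "for (var i = 0; i < 3; i++) { total += i; }";

console.log(looksLikeCharCodeDecoder(suspicious)); // true
console.log(looksLikeCharCodeDecoder(benign));     // false
```

Because the check ignores variable names and the exact numbers, renaming variables or re-encoding the payload would not get past it. A heuristic this loose would also flag some legitimate scripts, so treat it as a starting point for thinking about the pattern rather than something to deploy.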
Security Researcher: Chris Astacio