October 3, 2018

Analyzing WebAssembly binaries: initial feel and behavioral analysis

John Bergbom, Senior Security Researcher

A few months ago, we published an introductory blog post about reverse engineering WebAssembly (Wasm) applications. Given the increased usage and coverage of Wasm since then, now seems like a good time to revisit the topic, looking at how to analyze an unknown binary and at some of the tools that have become available.

This post introduces Wasm’s memory structure and then performs a high-level behavioral analysis of an unknown(-ish) sample. A follow-up post will manually analyze the same Wasm sample in greater depth.

Note: We use several publicly available tools from GitHub throughout this series of blog posts. Please exercise caution when installing new software on your machine (or, better still, your VM) and bear in mind that Forcepoint is not responsible for these accounts or the tools discussed in this blog post.

Wasm stack machine, local variables and global memory

First, let’s look at how Wasm handles memory and data. While this is not essential for following this post, it helps in understanding how variables are passed to the sample we’re about to look at, and it is foundational for the more manual analysis we’ll be doing in the next post.

Instead of using registers, Wasm uses a stack for handling data. Essentially, when executing, it logically pushes numbers onto the stack and pops them off again, doing some simple arithmetic in between. (Under the hood, the browser converts the Wasm to something that executes more efficiently than a stack machine, but conceptually, you can think of Wasm as using a stack machine.)

For the unfamiliar, a stack is a last-in-first-out (LIFO) data structure. This means you access items on the stack in reverse order of how they were added:

Figure 1: The ‘LIFO’ Data Stack (Data stack.svg, on Wikimedia Commons User:Boivie)
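For readers who prefer code to diagrams, the same last-in-first-out behavior can be sketched with a plain JavaScript array. The variable names here are ours and purely illustrative.

```javascript
// A JavaScript array used as a stack: push adds to the top, pop removes from the top.
const lifo = [];
lifo.push(1);
lifo.push(2);
lifo.push(3);

// Items come back in the reverse order of how they were added.
const popOrder = [lifo.pop(), lifo.pop(), lifo.pop()];
console.log(popOrder.join(', ')); // 3, 2, 1
```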

Essentially, every type of Wasm instruction pushes and/or pops a 32-bit or 64-bit value to/from the stack. At the start of a function, the stack is empty. As the function executes, the stack is gradually filled up and emptied.

Parameters passed to functions, as well as local function variables, are stored in a local memory area separate from the stack. Arithmetic instructions operate on stack values, so you often need to copy a variable from local memory to the stack (get_local), or move a result from the stack to a variable in local memory (set_local). (Newer versions of the Wasm text format spell these instructions local.get and local.set.)

As an example, let’s say you want to multiply the first parameter passed to a function by two. That would be done like this in Wasm:

  1. Use the get_local instruction to push the function parameter from local memory to the stack.
  2. Use the i32.const 2 instruction to push the number two to the stack.
  3. Run the instruction i32.mul, which pops the last two values residing on the stack, multiplies them together, and pushes the result to the stack.

The return value of a function is simply the final value left on the stack.
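To make the three steps concrete, here is a toy interpreter sketch in JavaScript that traces the stack after each instruction. It is not how a browser executes Wasm: the function runToy and its instruction encoding are our own invention for illustration, and it supports only the three instructions used in the example.

```javascript
// Toy stack machine supporting get_local, i32.const and i32.mul only.
// runToy and the [opcode, argument] encoding are hypothetical, for illustration.
function runToy(instructions, locals) {
  const stack = [];
  const trace = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case 'get_local': // copy a local (here: a parameter) onto the stack
        stack.push(locals[arg]);
        break;
      case 'i32.const': // push a constant
        stack.push(arg | 0);
        break;
      case 'i32.mul': { // pop two values, push their 32-bit product
        const b = stack.pop();
        const a = stack.pop();
        stack.push(Math.imul(a, b));
        break;
      }
      default:
        throw new Error('unsupported instruction: ' + op);
    }
    const shown = arg === undefined ? op : op + ' ' + arg;
    trace.push(shown + ' -> [' + stack.join(', ') + ']');
  }
  // The return value is whatever is left on top of the stack.
  return { result: stack.pop(), trace };
}

const traced = runToy([['get_local', 0], ['i32.const', 2], ['i32.mul']], [21]);
console.log(traced.trace.join('\n'));
console.log('return value: ' + traced.result);
```

Called with 21 as the first parameter, the trace shows the stack growing to [21, 2] and then collapsing to the single value 42, which becomes the return value.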

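Finally, we have global memory (called linear memory in the Wasm specification), which is the entire memory space of a Wasm instance. JavaScript sees it as a contiguous buffer of bytes that can be read and written through typed array views, something we’ll see in just a moment.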
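As a small sketch of what this looks like from the JavaScript side, the snippet below creates a standalone one-page memory (not tied to any sample) and inspects it byte by byte. The variable names are ours. Values are written explicitly in little-endian order, which is the byte order Wasm uses for its memory.

```javascript
// A standalone Wasm memory of one page (64 KiB), independent of any module.
const demoMemory = new WebAssembly.Memory({ initial: 1 });
console.log(demoMemory.buffer.byteLength); // 65536

// Write two 32-bit integers at byte offsets 0 and 4, explicitly little-endian.
const demoView = new DataView(demoMemory.buffer);
demoView.setInt32(0, 5, true);
demoView.setInt32(4, 0x01020304, true);

// Look at the same memory as raw bytes: the least significant byte comes first.
const firstBytes = Array.from(new Uint8Array(demoMemory.buffer, 0, 8));
console.log(firstBytes.join(' ')); // 5 0 0 0 4 3 2 1
```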

Don’t worry if the internal Wasm functions here seem alien: we’ll look at those in more depth in an upcoming blog.

The unknown sample

Let’s analyze a Wasm program that can be found here. In the first phase of our analysis, we want to develop an initial feel for the sample.

Of course, we can cheat and just look at the documentation to figure out what the program does, but for the purpose of practicing reverse engineering, let’s pretend it’s an unknown sample that we want to analyze. Let’s say we found it in the wild, on a web page that looks like this:

 1:  <html>
 2:  <body>
 3:  <script>
 4:  request = new XMLHttpRequest();
 5:  request.open('GET', 'quicksort.wasm');
 6:  request.responseType = 'arraybuffer';
 7:  request.send();
 8:  request.onload = function() {
 9:    var bytes = request.response;
10:    let m = new WebAssembly.Instance(new WebAssembly.Module(bytes));
11:    var h = m.exports.memory.buffer;
12:    let intView = new Int32Array(h);
13:    var unsorted = [5,2,9,3,7,1,6,4,8,0];
14:    intView.set(unsorted,0);
15:    console.log('unsorted = ' + intView.slice(0,unsorted.length));
16:    m.exports.sort(0,unsorted.length*4);
17:    m.exports.sort(0,unsorted.length*4);
18:    console.log('sorted = ' + intView.slice(1,unsorted.length+1));
19:  }
20:  </script>
21:  </body>
22:  </html>

Note the declaration of the global WebAssembly memory buffer and the way the code places the values for ‘unsorted’ directly into the buffer between lines 11 and 14.

Based on the name of the sample, and the HTML showing a call to a function named ‘sort’, the sample certainly looks like a QuickSort implementation. Looking at the JavaScript console, which we didn’t show here, would show the output of the script, giving further indications that this is a sorting algorithm.

The WebAssembly Binary Toolkit

Sometimes you might only have the Wasm binary available, without the enclosing HTML. Therefore, let’s show an alternative way to look at functions.

The first thing we want to do is to look at the imported and exported objects of the Wasm binary. That can be done by turning the Wasm binary into its textual representation. For that, we can use the wasm2wat tool that’s part of The WebAssembly Binary Toolkit.

$ ./wasm2wat quicksort.wasm | grep -E "import|export"
  (export "memory" (memory 0))
  (export "partition" (func 0))
  (export "sort" (func 1)))

We see that it does not import anything, and that it exports its memory plus two functions to JavaScript: ‘partition’ (function number 0) and ‘sort’ (function number 1).
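If wasm2wat is not at hand, the standard JavaScript API can list imports and exports as well. The sketch below does this for a tiny hand-assembled module of our own (it is not quicksort.wasm): the module exports a memory and a single function we named ‘double’, which multiplies its parameter by two. For a real sample you would load the bytes from the file instead.

```javascript
// Hand-assembled stand-in module, NOT the quicksort sample.
const tinyBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,       // type section: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: func 0 uses type 0
  0x05, 0x03, 0x01, 0x00, 0x01,                         // memory section: one memory, min 1 page
  0x07, 0x13, 0x02,                                     // export section: two exports
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, // "memory" -> memory 0
  0x06, 0x64, 0x6f, 0x75, 0x62, 0x6c, 0x65, 0x00, 0x00, // "double" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no extra locals
  0x20, 0x00, 0x41, 0x02, 0x6c, 0x0b,                   // get_local 0; i32.const 2; i32.mul; end
]);

const tinyModule = new WebAssembly.Module(tinyBytes);

// List exports and imports without instantiating (and so without running) anything.
const exportList = WebAssembly.Module.exports(tinyModule).map((e) => e.kind + ' ' + e.name);
const importList = WebAssembly.Module.imports(tinyModule).map((i) => i.module + '.' + i.name);
console.log('exports:', exportList);
console.log('imports:', importList);

// Instantiating is only needed once we actually want to call something.
const tinyInstance = new WebAssembly.Instance(tinyModule);
const doubled = tinyInstance.exports.double(21);
console.log('double(21) =', doubled);
```

Listing exports from the compiled module, before instantiation, has the advantage that no code from the sample is executed at that point.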

Let’s see what parameters the ‘sort’ function takes, and what it returns:

$ ./wasm2wat quicksort.wasm|grep "func.*1.*type"
  (func (;1;) (type 1) (param i32 i32) (result i32)

We see that the ‘sort’ function takes two integers and returns another integer.
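It can help to know what an i32 signature means for a JavaScript caller. The sketch below uses another hand-assembled module of our own (again, not the sample) with the same shape of signature, two i32 parameters and an i32 result, exporting a function we named ‘add’. It shows that arguments are converted to 32-bit integers on the way in and that results wrap around as signed 32-bit values.

```javascript
// Hand-assembled stand-in module with signature (param i32 i32) (result i32).
const addBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no extra locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // get_local 0; get_local 1; i32.add; end
]);
const addInstance = new WebAssembly.Instance(new WebAssembly.Module(addBytes));
const add = addInstance.exports.add;

const plain = add(2, 3);            // ordinary integers
const truncated = add(1.9, 2);      // 1.9 is converted to the i32 value 1
const wrapped = add(2147483647, 1); // overflow wraps to the most negative i32
const missing = add(5);             // a missing argument is converted to 0
console.log(plain, truncated, wrapped, missing);
```

This matters for behavioral analysis: feeding fractional or very large numbers to an i32 parameter tests the conversion rules rather than the sample itself.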

Even without the benefit of the HTML wrapper, the name of the Wasm binary and the properties of the exported functions make it seem likely that we’re dealing with a sorting algorithm.

Behavioral analysis using Life

In real life, a malicious Wasm binary would likely have a less-revealing name and the function names would likely be obfuscated. Therefore, we need to dig deeper into the code to find out what it actually does. Before doing that though, it may be worth doing some behavioral analysis, which will allow us to verify or refute our initial theory, and guide us in the right direction when doing code analysis later, saving valuable analyst time.

One way to do behavioral analysis is simply to copy the HTML code shown earlier to a local file, change the values of the variable ‘unsorted’, and see what appears in the JavaScript console, adding some console.log() calls of our own if need be.

That may not always be practical, so let’s introduce a tool that will allow us to query the sort function of the Wasm sample from the command line.

We already know that the function called by the HTML (‘sort’ in this case) takes two parameters. We don’t know what those parameters are, but the JavaScript in the HTML file indicates that they might be related to the length of the input somehow. We also note from the JavaScript that the content of the variable named ‘unsorted’ isn’t passed to the sort function; instead, it is placed directly into the memory of the Wasm instance before sort is called. This leads us to suspect that the two parameters to sort are the start and end offsets of the region in memory that holds the data (most likely in bytes, given that the length is multiplied by four).
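Under that hypothesis, the arithmetic behind the call on the web page can be sketched as follows. The helper byteRange is our own illustration and is not part of the sample or its wrapper.

```javascript
// Each element of an Int32Array occupies 4 bytes.
const BYTES_PER_I32 = Int32Array.BYTES_PER_ELEMENT;

// Hypothetical helper: byte offsets covering `count` 32-bit integers
// that start at element index `startIndex`.
function byteRange(startIndex, count) {
  return {
    start: startIndex * BYTES_PER_I32,
    end: (startIndex + count) * BYTES_PER_I32,
  };
}

const pageInput = [5, 2, 9, 3, 7, 1, 6, 4, 8, 0];
const guessedArgs = byteRange(0, pageInput.length);
console.log(guessedArgs.start, guessedArgs.end); // 0 40
```

For the ten-element array on the web page this gives 0 and 40, which matches the values passed in sort(0, unsorted.length*4).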

Now let’s use the Life Wasm VM, which will allow us to interact with the Wasm sample from the command line.

Note: Life is based on Go, so you need to have Go installed first. Full instructions on how to use Life can be found here.

Using Life, we’ll make a script that initializes a new Wasm VM with our sample, puts any command line parameters into the memory of the instance, and then finally calls the sort function.

Essentially, it’s a rewrite of the earlier JavaScript in Go, with changes necessary to make it fit within the Life framework. We’ll save the script as call_sort.go, and it looks like this:

package main

import (
	"encoding/binary"
	"fmt"
	"io/ioutil"
	"os"
	"strconv"

	"github.com/perlin-network/life/exec"
)

func main() {
	// Load the Wasm binary named by the first command-line argument.
	bytes, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		panic(err)
	}
	vm, err := exec.NewVirtualMachine(bytes, exec.VMConfig{}, new(exec.NopResolver))
	if err != nil {
		panic(err)
	}

	// Put the remaining arguments into the memory of the instance
	// as 32-bit integers in little-endian format, starting at offset 0.
	args := os.Args[2:]
	for i, arg := range args {
		argInt, err := strconv.Atoi(arg)
		if err != nil {
			panic(err)
		}
		binary.LittleEndian.PutUint32(vm.Memory[i*4:], uint32(argInt))
	}

	id, ok := vm.GetFunctionExport("sort")
	if !ok {
		panic("export 'sort' not found")
	}
	// As in the JavaScript on the web page, sort is called twice.
	vm.Run(id, 0, int64(len(args))*4)
	vm.Run(id, 0, int64(len(args))*4)

	// Read the result starting at byte offset 4, mirroring
	// intView.slice(1, unsorted.length+1) in the JavaScript.
	fmt.Printf("Result: ")
	for i := range args {
		fmt.Printf("%d, ", binary.LittleEndian.Uint32(vm.Memory[4+i*4:]))
	}
	fmt.Println("")
}

Now let’s use the command line to query the sort function for different inputs and then see what we get:

$ go run call_sort.go quicksort.wasm 5 2 9 3 7 1 6 4 8 0 2> /dev/null
Result: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
$ go run call_sort.go quicksort.wasm 13392 8 4 90000 234234 333 2> /dev/null
Result: 4, 8, 333, 13392, 90000, 234234,

Even without the helpful function and filenames, behavioral analysis confirms our initial theory that this is a sorting algorithm.
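When testing many inputs, eyeballing the output quickly becomes tedious. One option is to compare each observed output against a trusted reference sort. The sketch below is our own helper rather than part of any tool mentioned here; it does this in JavaScript for the two runs shown above.

```javascript
// Returns true if `output` is exactly `input` sorted in ascending numeric order.
// matchesReferenceSort is a hypothetical helper name, for illustration only.
function matchesReferenceSort(input, output) {
  if (input.length !== output.length) return false;
  const expected = [...input].sort((a, b) => a - b);
  return expected.every((value, i) => value === output[i]);
}

// The two observed runs from the command line above.
const run1 = matchesReferenceSort(
  [5, 2, 9, 3, 7, 1, 6, 4, 8, 0],
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
);
const run2 = matchesReferenceSort(
  [13392, 8, 4, 90000, 234234, 333],
  [4, 8, 333, 13392, 90000, 234234]
);
// A deliberately wrong output, to show the check can fail.
const bogus = matchesReferenceSort([3, 1, 2], [1, 3, 2]);
console.log(run1, run2, bogus); // true true false
```

A check like this only supports the theory for the inputs tried. It says nothing about what else the sample might do, which is why code analysis is still needed.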

Of course, as we mentioned, real-world malicious samples are unlikely to be so helpful, so in our next post we will get our hands dirtier with some more ‘manual’ analysis by looking at the Wasm text format in more depth.


