woody_woodpacker: an XOR binary packer

Disclaimer: The information provided in this article is intended solely for research and educational purposes. The use of ELF binary packers, including their creation, analysis, or modification, should be conducted responsibly and ethically. Any application of this knowledge for malicious activities, unauthorized use, or violation of laws and regulations is strictly discouraged. The author assumes no responsibility for any misuse of the information presented. Readers are encouraged to adhere to all applicable laws and ethical guidelines when conducting their research.

Introduction

A packer is a type of software commonly used by malware authors to compress or encrypt a malicious executable file. Its primary purpose is to obfuscate the file so that antivirus software fails to detect it. It also makes the reverse engineering process more difficult for security researchers.

Although the term “packer” implies some form of compression, nowadays it refers to any form of obfuscation, including encryption, anti-debugging, anti-VM, etc.

In this article, we are going to explore the creation of a simple ELF binary packer. For the sake of simplicity, our protector will use a simple XOR encryption algorithm and no compression layer. Needless to say, this packer would not withstand serious analysis.

Prior knowledge of the Linux ELF executable format, x86-64 assembly, and C programming is required. This article will not cover these topics in detail.

Architecture overview

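The packer is composed of two main components, the protector and the stub. They must agree on a shared convention, here the location of the encrypted payload in the packed file, its size, and the decryption key.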

The protector

The protector is the program that applies the protection to the target binary. This is a standalone program that takes the target binary as input and outputs a freshly packed binary.

$ ./woody_woodpacker <target_binary>
$ ls
woody

The “runtime engine” or “stub”

The stub is responsible for deobfuscating the protected binary. Its sole purpose is to decrypt the binary and pass control to the decrypted code.

Usually, the stub lives inside the protected binary, and the protector has to make the necessary modifications to the executable file format. The stub does not have the luxury of using the standard C library : it must be self-sufficient. The following code examples use x86-64 assembly and follow the System V ABI. Since we are working on Linux, the ELF format will be used.

Packing and executing a target binary with `woody_woodpacker`

Our packed executable will print ....WOODY.... to signal that the binary has been packed. In the following example pack_me is a simple ELF binary that prints Hello World! to the standard output.

$ ./pack_me
Hello World!
$ ./woody_woodpacker ./pack_me
$ ls
woody   pack_me
$ ./woody
....WOODY....
Hello World!

Stub

From here on, the terms stub and parasite will be used interchangeably.

Injecting decryption routine

In Unix Viruses, Silvio Cesare, an Australian security researcher, begins his paper by describing a crude but smart form of infection :

An interesting, yet simple idea for a virus takes note, that when you append one executable to another, the original executable executes, but the latter executable is still intact and retrievable and even executable if copied to a new file and executed.

Here is a simple demonstration, in which host is appended to parasite. Simply appending host does not change the ELF file structure of parasite, so parasite still executes as normal.

$ cat host >> parasite
$ mv parasite host
$ ./host
PARASITE Executed

Now, if the parasite keeps track of its own length, it can copy the original host to a new file, then execute it like normal, making a working parasite and virus. The algorithm is as follows:

execute parasite work code.
lseek to the end of the parasite.
read the remaining portion of the file that contains the host.
write to a new file.
execute the new file.

To lseek to the end of the parasite, the parasite first needs a file descriptor on its own file. It can open itself through the /proc/self/exe symbolic link with the open syscall :

#define PARASITE_SIZE xxx

int self = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
lseek(self, PARASITE_SIZE, SEEK_SET);

What Silvio describes here is a format-agnostic way of infecting a binary. It does not rely on injecting parasite code within a particular executable format : the host is appended to the end of the parasite. It does not require parsing ELF segments (on Linux) or PE sections (on Windows) to find a place for infection. In this article, however, we are working in a Linux environment.

This method has advantages and some tradeoffs :

very reliable : the infection method does not rely on the host’s file format. Binary patching is not required.
not strip safe : since the host is appended at the end of the file, it is not described by any ELF section or segment. Consequently, running strip on the infected binary will break the infection.
hides the host ELF : running readelf, for example, on the packed binary will reveal only the stub segments, sections, and symbols.

Schematically, our packed binary file will look like this :

Packed binary overview

Passing control back to the host binary

It is the stub's responsibility to decrypt the host and load it into memory. Loading a binary without using the exec family of syscalls is commonly called userland exec, and has been documented by the grugq in this paper. The term refers to the fact that the ELF loading is done in user-land, without the kernel's help.

However, a stub can still rely on the kernel exec functions : Silvio indicates that the original binary should be written to a new file and then executed. We can easily think of an example : the stub writes the host to tmp/bin and uses the execve syscall on that temporary file.

The obvious disadvantage of this technique is that it leaves traces in the filesystem. Fortunately, the Linux kernel developers have added a few syscalls since Silvio’s article that can help us be stealthier.

execveat

execveat is a syscall that was added in Linux 3.19, released on Sunday, 8 February 2015.

The execveat man page describes the syscall :

execveat man page

With this in mind, we can refine Silvio’s algorithm :

execute parasite work code
lseek to the end of the parasite
call execveat

Could it be that easy ? The answer is no, but we are close. The problem is that the execveat syscall ignores the file offset of the file descriptor it is given.

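#define _GNU_SOURCE /* for execveat and AT_EMPTY_PATH */
#define SELF_SIZE xxx /* size of the parasite, in bytes */

#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv, char **envp) {
    (void) argc;

    /* /proc/self/exe is a symbolic link to the running executable. */
    int self = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);

    /* Move the file offset to the first byte of the host. */
    lseek(self, SELF_SIZE, SEEK_SET);

    /* execveat ignores the file offset : the parasite is executed again.
       The glibc wrapper is only available from glibc 2.34 onwards. */
    execveat(self, "", argv, envp, AT_EMPTY_PATH);
}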

This basically results in an infinite loop. execveat executes the parasite again and again, without taking into account the fact that we moved the file offset with lseek. We need a file descriptor dedicated to the host code. A file descriptor is an abstraction, and it does not necessarily refer to a file stored on the filesystem.

memfd_create

This is what the man page has to say about the memfd_create syscall :

memfd_create man page
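
To make the idea of a file descriptor without a file on disk more concrete, here is a minimal sketch that is not taken from the project. The helper name memfd_roundtrip is hypothetical, and the raw syscall interface is used, assuming a Linux system whose headers provide SYS_memfd_create and <linux/memfd.h>.

```c
#include <linux/memfd.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `len` bytes to an anonymous in-memory file,
 * rewinds it, and reads the bytes back into `out`.
 * Returns 0 on success, -1 on any failure. */
int memfd_roundtrip(const void *data, void *out, size_t len)
{
    /* No path on any filesystem is involved: the file lives in memory. */
    int fd = (int)syscall(SYS_memfd_create, "demo", MFD_CLOEXEC);
    int ok;

    if (fd < 0)
        return -1;
    ok = write(fd, data, len) == (ssize_t)len
      && lseek(fd, 0, SEEK_SET) == 0
      && read(fd, out, len) == (ssize_t)len;
    close(fd);
    return ok ? 0 : -1;
}
```

The descriptor behaves like a regular file for write, lseek and read, which is what lets the stub treat it as a private copy of the host.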

Our algorithm ends up as follows, including the host decryption :

execute parasite work code
lseek to the end of the parasite
create a memory file descriptor
copy host from parasite to memory file descriptor
decrypt host code
call execveat on the memory file descriptor

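#define _GNU_SOURCE /* for memfd_create, execveat and AT_EMPTY_PATH */
#define SELF_SIZE xxx    /* size of the stub (parasite), in bytes */
#define PAYLOAD_SIZE xxx /* size of the encrypted host, in bytes */

#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <unistd.h>
#include <fcntl.h>

const uint8_t key[] = {...};

void xor_cypher(void *mem, size_t mem_sz, const uint8_t *key);

int main(int argc, char **argv, char **envp) {
    (void) argc;

    int self = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
    /* Skip the stub : the encrypted host starts right after it. */
    lseek(self, SELF_SIZE, SEEK_SET);
    int memfd = memfd_create("woody", 0);
    /* With a NULL offset, sendfile reads from the current offset of self. */
    sendfile(memfd, self, NULL, PAYLOAD_SIZE);
    /* MAP_SHARED, so that the decrypted bytes are written back to memfd. */
    void *host_mptr = mmap(NULL, PAYLOAD_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
    xor_cypher(host_mptr, PAYLOAD_SIZE, key);
    execveat(memfd, "", argv, envp, AT_EMPTY_PATH);
}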

The sendfile syscall performs the copy in kernel space. This is more efficient than a read and write combo, which transfers the data between kernel space and user space multiple times. Our stub is almost finished : since the host is encrypted, we need to decrypt it prior to execution. We can access the contents of the memory file descriptor by mapping it with mmap.
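
The listing only declares xor_cypher. As a rough illustration, such a routine might look like the sketch below. It assumes a repeating key whose length is passed explicitly : the key_sz parameter is an addition that does not appear in the prototype above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of xor_cypher: XORs every byte of `mem` with a
 * repeating key of `key_sz` bytes (assumed extra parameter).
 * Applying it twice with the same key restores the original bytes. */
void xor_cypher(void *mem, size_t mem_sz, const uint8_t *key, size_t key_sz)
{
    uint8_t *bytes = mem;

    if (key_sz == 0)
        return; /* nothing to do without a key */
    for (size_t i = 0; i < mem_sz; i++)
        bytes[i] ^= key[i % key_sz];
}
```

Because XOR is its own inverse, the protector and the stub can share the same routine : one call encrypts, a second call with the same key decrypts.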

Stub overview diagram

Putting it together

Instead of using C code, we will write the stub in x86-64 assembly, without using the libc syscall wrappers. This keeps the size of the final stub executable as small as possible.

Note that the protector will have to patch key and payload_size in the .rodata section. stub_size must be determined manually, with ls -l stub, once the stub is assembled and linked.

Once the stub is assembled and linked, we can dump each byte of the stub into a C array with the following command :

$ xxd -C -i stub > stub.c

This array will be used by the protector to write the stub into the packed binary.

Finding offsets of `key` and `payload_size` in the stub

First, assemble and link the stub with nasm -f elf64 woody.s -o woody.o && ld woody.o -znoseparate-code -o woody.

To know the offset of the key and payload_size variables in the .rodata section, we can use readelf --symbols ./woody and look for the symbol addresses.


Symbol table '.symtab' contains 15 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS woody.s
     2: 0000000000400198     0 NOTYPE  LOCAL  DEFAULT    2 message
     3: 000000000000000f     0 NOTYPE  LOCAL  DEFAULT  ABS message_len
     4: 00000000004001a7     0 NOTYPE  LOCAL  DEFAULT    2 self_path
     5: 00000000004001b6     0 NOTYPE  LOCAL  DEFAULT    2 empty_str
     6: 00000000004001b7     0 NOTYPE  LOCAL  DEFAULT    2 woody_size
     7: 00000000004001bf     0 NOTYPE  LOCAL  DEFAULT    2 payload_size
     8: 00000000004001c7     0 NOTYPE  LOCAL  DEFAULT    2 key
     9: 000000000040012c     0 NOTYPE  LOCAL  DEFAULT    1 decrypt
    10: 000000000040018a     0 NOTYPE  LOCAL  DEFAULT    1 _exit
    11: 0000000000400080     0 NOTYPE  GLOBAL DEFAULT    1 _start
    12: 00000000004011e7     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start
    13: 00000000004011e7     0 NOTYPE  GLOBAL DEFAULT    2 _edata
    14: 00000000004011e8     0 NOTYPE  GLOBAL DEFAULT    2 _end

The Value column indicates the virtual address of each symbol. To get the file offset of the .rodata section, we look at the program headers, for instance with readelf --program-headers ./woody :

Elf file type is EXEC (Executable file)
Entry point 0x400080
There is 1 program header, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000001e7 0x00000000000001e7  R E    0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .text .rodata 

There is a single segment in our stub binary, that loads both the .text and .rodata sections. The segment starts at offset 0x0 in file, thus the offset of each symbol in the file is simply its virtual address minus the segment virtual address 0x400000.

For the key : 0x4001c7 - 0x400000 = 0x1c7

For the payload_size : 0x4001bf - 0x400000 = 0x1bf
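
The same arithmetic can be written as a small helper. This is only a sketch : the function name vaddr_to_file_offset is hypothetical, and it assumes the symbol lies inside the LOAD segment whose virtual address and file offset are passed in.

```c
#include <stdint.h>

/* Converts the virtual address of a symbol into an offset in the file,
 * given the virtual address and the file offset of the LOAD segment
 * that contains the symbol. */
uint64_t vaddr_to_file_offset(uint64_t sym_vaddr, uint64_t seg_vaddr,
                              uint64_t seg_offset)
{
    return sym_vaddr - seg_vaddr + seg_offset;
}
```

With the values from the listings, vaddr_to_file_offset(0x4001c7, 0x400000, 0x0) gives 0x1c7 for key, and vaddr_to_file_offset(0x4001bf, 0x400000, 0x0) gives 0x1bf for payload_size.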

Protector

The protector algorithm is as follows :

open the binary to pack
patch the stub shellcode to set the key for decryption and the size of the payload to protect.
encrypt the target binary with the same key.
write the stub followed by the encrypted target binary to the output file.
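
The last two steps can be sketched in memory as follows. This is an illustration under stated assumptions, not the project's code : build_packed and its parameters are hypothetical names, the cipher is a repeating-key XOR, and the stub passed in is assumed to have been patched already.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated buffer that contains
 * the stub followed by the XOR-encrypted target, and stores the total
 * size in `out_sz`. Returns NULL on failure. The caller frees the buffer. */
uint8_t *build_packed(const uint8_t *stub, size_t stub_sz,
                      const uint8_t *target, size_t target_sz,
                      const uint8_t *key, size_t key_sz, size_t *out_sz)
{
    uint8_t *out;

    if (key_sz == 0)
        return NULL;
    out = malloc(stub_sz + target_sz);
    if (out == NULL)
        return NULL;
    /* The stub comes first, so that it is what the kernel executes. */
    memcpy(out, stub, stub_sz);
    /* The encrypted target is appended right after the stub. */
    for (size_t i = 0; i < target_sz; i++)
        out[stub_sz + i] = target[i] ^ key[i % key_sz];
    *out_sz = stub_sz + target_sz;
    return out;
}
```

Writing the returned buffer to the output file in a single call keeps the layout identical to the one the stub expects : stub bytes, then payload bytes.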

Our protector takes the target binary as argument and outputs the packed binary, named woody. Optionally, a second argument can be provided to set the XOR key used for encryption/decryption. If no key is provided, a key is randomly generated from /dev/urandom.
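
Generating the default key could look like the sketch below. The function name generate_key is hypothetical, and the key length is left to the caller since the article does not state it explicitly.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fills `key` with `key_sz` random bytes read from
 * /dev/urandom. Returns 0 on success, -1 on failure. */
int generate_key(uint8_t *key, size_t key_sz)
{
    size_t done = 0;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd < 0)
        return -1;
    /* read may return fewer bytes than requested, so loop until done. */
    while (done < key_sz) {
        ssize_t n = read(fd, key + done, key_sz - done);

        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}
```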

The protector relies on a header file that contains the offsets of the symbols previously discussed.

Patching the stub with the key and payload size is just a matter of writing at the right offset.
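
As a rough illustration of that step, here is a minimal sketch. The names patch_stub and STUB_* are hypothetical. The offsets are the ones computed earlier, and the sizes are assumptions inferred from the symbol table : payload_size is taken to be an 8-byte value in host byte order, and key is taken to be 32 bytes long because it starts at 0x1c7 and the segment ends at 0x1e7.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets computed from the symbol table; the key size is an assumption. */
#define STUB_PAYLOAD_SIZE_OFFSET 0x1bf
#define STUB_KEY_OFFSET          0x1c7
#define STUB_KEY_SIZE            32

/* Hypothetical helper: writes the payload size and the key into an
 * in-memory copy of the stub. Returns 0 on success, -1 if the buffer
 * is too small to contain both fields. */
int patch_stub(uint8_t *stub, size_t stub_sz, uint64_t payload_size,
               const uint8_t *key)
{
    if (stub_sz < STUB_KEY_OFFSET + STUB_KEY_SIZE)
        return -1;
    /* memcpy avoids unaligned stores: neither offset is 8-byte aligned. */
    memcpy(stub + STUB_PAYLOAD_SIZE_OFFSET, &payload_size, sizeof payload_size);
    memcpy(stub + STUB_KEY_OFFSET, key, STUB_KEY_SIZE);
    return 0;
}
```

Patching a copy of the stub array, rather than the array itself, keeps the original bytes available if several binaries are packed in one run.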

This is the complete protector code :

Building and testing

The full project can be found on GitHub.

$ make all

gcc -MMD -c -g3 -Wall -Wextra -Werror sources/main.c -o objects/main.o
gcc -MMD -c -g3 -Wall -Wextra -Werror sources/woody.c -o objects/woody.o
gcc -Wall -Wextra -Werror ./objects/main.o ./objects/woody.o -o woody_woodpacker

$ ./woody_woodpacker /bin/ls

114754534D9A13A1413F8AC389A0A2338171463F3585F8AB279E7F74683DAF1C

$ ls -la .

total 284
drwxrwxr-x  5 plouvel plouvel    200 Oct 17 12:57 .
drwxrwxrwt 25 root    root       640 Oct 17 12:58 ..
drwxrwxr-x  8 plouvel plouvel    260 Oct 17 12:33 .git
-rw-rw-r--  1 plouvel plouvel    476 Oct 17 12:33 .gitignore
-rw-rw-r--  1 plouvel plouvel  35149 Oct 17 12:33 LICENSE
-rw-rw-r--  1 plouvel plouvel    563 Oct 17 12:33 Makefile
drwxrwxr-x  2 plouvel plouvel    120 Oct 17 12:57 objects
drwxrwxr-x  2 plouvel plouvel    160 Oct 17 12:35 sources
-rwxr-xr-x  1 plouvel plouvel 159400 Oct 17 12:58 woody
-rwxrwxr-x  1 plouvel plouvel  82248 Oct 17 12:57 woody_woodpacker

Now, woody should act as a packed version of /bin/ls, and print ....WOODY.... prior to executing the real /bin/ls code.

$ ./woody -la .
....WOODY....
total 284
drwxrwxr-x  5 plouvel plouvel    200 Oct 17 12:57 .
drwxrwxrwt 25 root    root       640 Oct 17 13:00 ..
drwxrwxr-x  8 plouvel plouvel    260 Oct 17 12:33 .git
-rw-rw-r--  1 plouvel plouvel    476 Oct 17 12:33 .gitignore
-rw-rw-r--  1 plouvel plouvel  35149 Oct 17 12:33 LICENSE
-rw-rw-r--  1 plouvel plouvel    563 Oct 17 12:33 Makefile
drwxrwxr-x  2 plouvel plouvel    120 Oct 17 12:57 objects
drwxrwxr-x  2 plouvel plouvel    160 Oct 17 12:35 sources
-rwxr-xr-x  1 plouvel plouvel 159400 Oct 17 12:58 woody
-rwxrwxr-x  1 plouvel plouvel  82248 Oct 17 12:57 woody_woodpacker

We can even see the syscalls used by the stub with strace :

$ strace ./woody -la | head
execve("./woody", ["./woody", "-la"], 0x7ffc156d9c28 /* 81 vars */) = 0
open("/proc/self/exe", O_RDONLY|O_CLOEXEC) = 3
lseek(3, 768, SEEK_SET)                 = 768
memfd_create("", MFD_CLOEXEC)           = 4
sendfile(4, 3, NULL, 158632)            = 158632
mmap(NULL, 158632, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7ff3b5626000
write(1, "....WOODY....\n\0", 15)       = 15
....WOODY....
execveat(4, "", ["./woody", "-la"], 0x7ffd0bcd9e50 /* 81 vars */, AT_EMPTY_PATH) = 0

From this point on, the rest of the execution is done by the host binary, which is /bin/ls in our case.

brk(NULL)                               = 0x55dbca2b9000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbb90319000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
...

Conclusion

In this article, we have explored the creation of a simple ELF binary packer using XOR encryption. We have discussed the architecture of the packer, including the protector and the stub components. The protector is responsible for applying the protection to the target binary, while the stub is responsible for decrypting and executing the protected binary.

Packers can be far more complex than what we have seen here. The technique presented here would not last a few hours against a determined reverse engineer.