Linux code injection paint-by-numbers. Can we launch a pro…

The most common pattern for launching a process is fork() → execve(): the fork() syscall creates a duplicate of the running process context, and execve() overlays a copy of the target program onto that context.

After calling fork(), we’ll have two processes: the original one and a new, duplicated one (with its own pid).

Control returns from fork() in both process instances. In the child, the return value is simply 0; in the parent, it holds the pid of the child.

Thus we can determine whether we are running in the child context and call execv() (a libc wrapper around execve()) accordingly, while allowing the parent to continue.
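A minimal sketch of the fork() → execv() pattern in C, assuming a POSIX system; the helper name spawn_and_wait() is hypothetical. The return value of fork() decides which branch each process takes:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch `path` with `argv` via fork() -> execv() and wait for it.
   Returns the child's exit status, or -1 on error. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: fork() returned 0. Replace this image with the target. */
        execv(path, argv);
        _exit(127);  /* only reached if execv() failed */
    }
    /* Parent: fork() returned the child's pid, so we can keep going. */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Using 127 as the exit code for a failed execv() is just the common shell convention, not something the kernel enforces.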

Now, let’s take a look at where the auditing hooks lie. From calling execve(), we’ll eventually end up in exec_binprm().

Without delving too deeply, this function locates the binary-format handler for the target we’re trying to execute. (Here we’re executing an ELF binary, so we’ll get the ELF handler.) But that is a topic for another day.

Before exec_binprm() returns, the audit hooks are called.

In our scenario, we *want* these hooks to be called so the original target executable is identified, but we don’t want the target to actually execute.

On Windows, a process’s initial thread is always created suspended; calling CreateProcess with the CREATE_SUSPENDED flag instructs Kernel32.dll not to ask the kernel to resume it once process setup is complete.

On Linux, execution continues immediately after execv(), so we must do something different: we can use ptrace() to control execution of the child target.

This is set up by having the child call ptrace(PTRACE_TRACEME) to tell the kernel that it wants to be traced by its parent, and then having the parent wait for the first trap.

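By default, this happens on exit from the execve() syscall: a successful execve() sends the tracee a SIGTRAP, so it stops before executing a single instruction of the new program.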
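Adding PTRACE_TRACEME to the fork/exec pattern gives the “suspended” launch described above. This is a sketch assuming Linux; spawn_traced() is a hypothetical helper name. On success the child is parked at the post-execve() SIGTRAP, waiting for the parent:

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch `path` under ptrace so that it stops right after execv().
   Returns the stopped child's pid, or -1 on error. */
pid_t spawn_traced(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: ask the kernel to let our parent trace us. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execv(path, argv);  /* on success the tracee receives SIGTRAP */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP) {
        /* execv() failed (child exited) or the stop was unexpected. */
        if (WIFSTOPPED(status)) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
        }
        return -1;
    }
    return pid;  /* stopped, not a single target instruction executed */
}
```

From here the parent can inspect or modify the child with further ptrace() requests, or simply let it run with ptrace(PTRACE_CONT, pid, NULL, NULL).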

Now that we have a form of suspended process creation, we need to decide what to do with it.

The options here are numerous. In this example, we want to choose a strategy that doesn’t require us to do any image/relocation fix-up foo.

We can use dlopen() to do all the heavy lifting.

The first issue is that we have suspended execution in a state prior to libc being mapped in and made available to the target process. As (almost?) all dynamically linked processes will require libc, we can let this happen naturally by letting the target run until this is done.

It is important to ensure that (1) execution has progressed to a point where libc is mapped, (2) the process is in a state where we can safely hijack execution flow, and (3) the target’s own code has not yet run (while staying generic enough to support various targets).

I initially tried trapping target execution right after libc is mapped (i.e. running until the relevant close() call).

Naturally this failed, as the process was not yet set up well enough to support stable execution. For example, the stack-protector canary load mov rax,QWORD PTR fs:0x28 in function prologues would fault, because fs (the thread-local storage base) was not yet sane.

To achieve a state where (1), (2), and (3) are all honoured, we can trap a little further down. I chose to target brk().
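One way to trap a specific syscall such as brk() is to step the tracee from syscall-stop to syscall-stop with PTRACE_SYSCALL, as sketched below. This assumes x86_64 (it reads orig_rax from struct user_regs_struct), run_until_syscall() is a hypothetical helper, and it relies on PTRACE_O_TRACESYSGOOD so that syscall-stops are reported as SIGTRAP | 0x80. Which occurrence of brk() to stop at is left to the caller: with glibc the dynamic loader typically issues a brk() before mapping any libraries, so checking the target’s syscall sequence with strace first is worthwhile.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>  /* SYS_brk, SYS_exit_group, ... */
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a stopped tracee until it enters syscall `nr` for the `nth` time.
   Returns 0 with the tracee parked at that syscall-entry stop, or -1 if
   it exited or a ptrace()/waitpid() call failed. x86_64 only. */
int run_until_syscall(pid_t pid, long nr, int nth)
{
    struct user_regs_struct regs;
    int status, in_syscall = 0, seen = 0;

    /* Report syscall-stops as SIGTRAP | 0x80, distinct from a real SIGTRAP. */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == -1)
        return -1;

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1)
            return -1;
        if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
            return -1;
        if (WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue;             /* other stop: its signal is suppressed */
        in_syscall = !in_syscall; /* syscall-stops alternate entry/exit */
        if (!in_syscall)
            continue;             /* this was a syscall-exit stop */
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
            return -1;
        if ((long)regs.orig_rax == nr && ++seen == nth)
            return 0;
    }
}
```

For example, run_until_syscall(pid, SYS_brk, 2) would leave the tracee at the entry of its second brk() call.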

To recap:

We’ve created a child process and halted execution prior to anything too process-specific having been run but after basic setup has taken place.

Now we need to inject our code.

As mentioned earlier, the plan is to use bog-standard dlopen() to get the code staged in the target.
But how to locate dlopen()?

A cursory glance shows that dlopen() is exported by libdl. But alas, this library is not loaded into our target’s address space (at least with glibc releases before 2.34, which merged libdl into libc).

Ultimately, however, the work is done by __libc_dlopen_mode(), which lives in libc itself and is therefore available to us.

First, we’re going to need to get the offset of the __libc_dlopen_mode() function within libc.

The easiest way I could think of to do this programmatically was simply to use the dynamic linker in the parent process context and calculate the offset from the loaded library address:

dlopen(libc) → dlsym(__libc_dlopen_mode)

The library address can be obtained from the link_map structure returned by dlopen().

A caveat to keep in mind here: fcn_addr - lm->l_addr gives the function’s address relative to the library’s ELF base (its virtual address as laid out in the binary), not relative to any particular mapping, so it does not line up directly with the start of libc’s executable mapping in memory.

We will account for this offset skew shortly.

Next, we’ll obtain the address of the libc instance that is mapped in our target process.

Procfs exposes mapping info in /proc/<pid>/maps. We can look up the mapped address of libc’s executable segment, correct for that segment’s offset from the ELF base, and calculate a final address for __libc_dlopen_mode() in the target.

My implementation of this bit is, regrettably, quick & dirty.

Finally, we need to set the necessary arguments for __libc_dlopen_mode() and call the function within the target process context.

Since we don’t care about continuing the original target flow at any point, we can hijack execution by pointing rip at the address of __libc_dlopen_mode() that we just calculated.

The function signature of __libc_dlopen_mode() matches that of dlopen(), with the addition of an explicit *dl_caller, which I just set to NULL.

The x86_64 (System V) calling convention dictates that we pass the arguments in registers rdi (library path), rsi (mode) and rdx (dl_caller).

rdi holds a pointer to the library path. We need somewhere writeable to put it.

The easy choice here is just to dump it somewhere on the stack (we’re not interested in a sane return from __libc_dlopen_mode() after all).
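The register and stack setup can be sketched as two helpers (both hypothetical names), assuming x86_64 and a tracee currently stopped under ptrace. poke_string() copies the library path into scratch stack space with PTRACE_POKEDATA; setup_call() fills in the argument registers and rip. Setting orig_rax to -1 is an addition of this sketch: if the tracee is sitting at the syscall-entry stop for the trapped brk(), it keeps the kernel from carrying out that syscall with our modified registers.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Copy the NUL-terminated string `s` into the tracee at `addr`, one word
   at a time. The last word is zero-padded, so up to 7 bytes past the
   terminator are clobbered (fine for scratch stack). Returns 0 or -1. */
int poke_string(pid_t pid, uintptr_t addr, const char *s)
{
    size_t len = strlen(s) + 1;
    for (size_t i = 0; i < len; i += sizeof(long)) {
        long word = 0;
        size_t n = len - i < sizeof(long) ? len - i : sizeof(long);
        memcpy(&word, s + i, n);
        if (ptrace(PTRACE_POKEDATA, pid, (void *)(addr + i), (void *)word) == -1)
            return -1;
    }
    return 0;
}

/* Rewrite `regs` so that resuming the tracee calls fn(path, mode, NULL)
   under the System V x86_64 ABI. */
void setup_call(struct user_regs_struct *regs, uintptr_t fn,
                uintptr_t path_addr, int mode)
{
    regs->rdi = path_addr;                 /* arg 1: library path */
    regs->rsi = (unsigned long long)mode;  /* arg 2: dlopen-style mode */
    regs->rdx = 0;                         /* arg 3: dl_caller = NULL */
    regs->rip = fn;
    /* At a syscall-entry stop, -1 makes the kernel skip the pending call. */
    regs->orig_rax = (unsigned long long)-1;
    /* rsp below the string, 16-byte aligned minus 8 as if `call` had pushed
       a return address. That slot stays garbage: no clean return expected. */
    regs->rsp = ((path_addr - 256) & ~(uintptr_t)0xF) - 8;
}
```

A plausible sequence: PTRACE_GETREGS into regs, pick path_addr = (regs.rsp - 512) & ~0xF, call poke_string(pid, path_addr, "/path/to/lib.so"), then setup_call(&regs, target_fn, path_addr, RTLD_NOW) and PTRACE_SETREGS. The 512-byte gap and the path are placeholders.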

Everything is now set up. Releasing target execution will result in our code being loaded into the target address space via __libc_dlopen_mode().

At some point, something will break in the target application (remember, we have hijacked rip and corrupted the stack).

This is a great outcome: the resulting fault stops the tracee and traps back into the parent process, allowing us to redirect control to our injected code.

(I did initially mess around with getting better control over the return from libc but honestly it didn’t seem worth the bother.)

Calling our injected code is as simple as pointing rip at it and resuming execution. (We discover its loaded address in the very same way that we discovered that of __libc_dlopen_mode() previously.)
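Waiting for the crash and then redirecting to the injected entry point might look like this. Again a sketch assuming x86_64; continue_until_signal() and jump_and_detach() are hypothetical helpers, and the entry address would come from the same maps + offset lookup used for __libc_dlopen_mode(). The final detach is folded into the redirect, since PTRACE_DETACH also resumes the tracee.

```c
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Resume the tracee until it stops with signal `sig` (e.g. the SIGSEGV
   raised when the hijacked call returns through the garbage slot).
   Other signals are suppressed. Returns 0, or -1 if the tracee exited. */
int continue_until_signal(pid_t pid, int sig)
{
    int status;
    for (;;) {
        if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
            return -1;
        if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
            return -1;
        if (WSTOPSIG(status) == sig)
            return 0;
    }
}

/* Point rip at `entry`, re-align rsp for a function entry, and detach.
   Detaching with data = 0 discards the pending fault signal, so the
   tracee resumes at `entry` instead of dying. */
int jump_and_detach(pid_t pid, uintptr_t entry)
{
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    regs.rip = entry;
    regs.rsp = (regs.rsp & ~0xFULL) - 8;
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1)
        return -1;
    return ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1 ? -1 : 0;
}
```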

Finally, the parent can detach from the target.

And we’re done.

Now in terms of forensics:

Auditing ptrace() is the obvious go-to for detecting process injection in real time.

Beyond that, there are process memory anomalies (in this example, the injected code will appear as a mapped image) plus the usual gamut of behavioural analysis opportunities.
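On the defensive side, one cheap point-in-time check is the TracerPid field in /proc/<pid>/status, which is non-zero while a process is being traced. Below is a minimal sketch (tracer_pid() is a hypothetical helper). In the flow above the target is only traced for a short window before the parent detaches, so this complements auditing ptrace() calls rather than replacing it.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* TracerPid from /proc/<pid>/status: 0 if untraced, the tracer's pid if
   traced, or -1 if the file cannot be read or the field is missing. */
long tracer_pid(pid_t pid)
{
    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    long tracer = -1;
    while (fgets(line, sizeof line, f)) {
        /* The line looks like "TracerPid:\t1234". */
        if (sscanf(line, "TracerPid: %ld", &tracer) == 1)
            break;
    }
    fclose(f);
    return tracer;
}
```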