Preloading with multiple symbol versions
This post describes the process of preloading an application using the LD_PRELOAD functionality of the dynamic linker and the difficulties encountered when intercepting symbols that have multiple versions. It walks through the steps to correctly intercept such symbols and map them to the correct underlying library versions.
What is preloading?
Preloading is a method by which a user can force the dynamic linker to load an additional dynamic shared object (DSO) when launching a particular executable. Preloading a process forces the DSO into the address space of the executing process and executes the DSO’s constructor on startup and its destructor on shutdown. Preloading can be done with the LD_PRELOAD environment variable on a process-by-process basis or can be enabled system-wide with the /etc/ld.so.preload configuration file.
Preloading a process allows new code to be inserted and executed within a process without recompiling or relinking the original program. When a DSO is preloaded, the dynamic linker places the preloaded DSO before the system libraries (libc, libpthread, etc.) in the symbol name search order. When an application or one of the application’s dynamic libraries invokes a function within the system libraries, the function is therefore searched for in the preloaded DSO first. If the preloaded DSO exports a function with the same name as one in the system libraries, the preloaded DSO’s function will be called when the application makes a call to that function. This provides a powerful capability to intercept function calls made from the application.
Preloading Example
Preloading is typically used to interpose new functionality between the application and the system. This model could be used for Aspect-oriented Programming to weave new cross-cutting behavior into existing function calls at run-time without modification to the existing source code or binary. One example where this is common is in debugging utilities. A debugging DSO can be constructed that intercepts system library calls invoked by the application and logs or tracks statistics on the frequency and types of calls made. The debugging DSO can provide valuable information that could track down coding errors or uncover performance bottlenecks.
A common use case for this facility is tracking dynamic memory operations performed by an application during run-time. Tracking application memory allocations and deallocations can locate memory leaks and identify poor memory allocation patterns. Take the following example of a simple DSO that wraps the malloc(3) call:
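A minimal sketch of such a wrapper, assuming glibc and its RTLD_NEXT extension. The file names in the build comment and the real_malloc variable are illustrative choices, not taken from any particular library:

```c
#define _GNU_SOURCE               /* for RTLD_NEXT (glibc extension) */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Build and run (file names are assumptions):
 *   gcc -shared -fPIC -o libmallocwrap.so mallocwrap.c -ldl
 *   LD_PRELOAD=./libmallocwrap.so uname
 */

typedef void *(*malloc_fn)(size_t);

/* The next malloc in the search order (normally libc's), resolved lazily. */
static malloc_fn real_malloc;

void *malloc(size_t size)
{
    if (real_malloc == NULL) {
        /* RTLD_NEXT searches the objects that come after this one. */
        real_malloc = (malloc_fn)dlsym(RTLD_NEXT, "malloc");
        if (real_malloc == NULL)
            return NULL;
    }

    void *ptr = real_malloc(size);

    /* stderr is unbuffered, so this call is not expected to allocate;
     * a production wrapper would still guard against re-entry. */
    fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
    return ptr;
}
```

Only malloc is wrapped here; free, calloc and realloc still go straight to libc, which is fine because the memory itself always comes from the real allocator.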
Compiling this wrapper and preloading it with the simple uname command results in two allocation printouts:
While this is a very simple example, a more advanced debugging library could track both allocations and deallocations and print full reports on exit.
Preloading Example #2
Let’s try another preloading example, but this time let’s override the pthread_cond_wait call from the POSIX threads library (NPTL). Once again, here is our wrapper code:
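A sketch of what this wrapper could look like, again assuming glibc. The condwait_internal pointer is the one referred to later in the post; the lookup_condwait helper and the ENOSYS fallback are illustrative additions:

```c
#define _GNU_SOURCE               /* for RTLD_NEXT (glibc extension) */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

typedef int (*condwait_fn)(pthread_cond_t *, pthread_mutex_t *);

/* The "real" pthread_cond_wait, resolved on first use. */
static condwait_fn condwait_internal;

/* Unversioned lookup of the next pthread_cond_wait in the search order. */
static condwait_fn lookup_condwait(void)
{
    return (condwait_fn)dlsym(RTLD_NEXT, "pthread_cond_wait");
}

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    if (condwait_internal == NULL)
        condwait_internal = lookup_condwait();
    if (condwait_internal == NULL)
        return ENOSYS;            /* nothing to forward to */

    fprintf(stderr, "pthread_cond_wait(%p, %p)\n",
            (void *)cond, (void *)mutex);
    return condwait_internal(cond, mutex);
}
```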
This time we’ll write a simple threaded example program to test with. The test program fires off a single thread that will wait on a condition variable for a signal. The main thread will sleep for two seconds and then signal the condition variable, awaking the blocked thread. The main thread will wait for the thread to get the signal and exit, then the main thread will exit the entire program.
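The program described above can be sketched as follows. The delay is a parameter so the logic can be exercised without waiting, and the signalled flag is an added predicate that guards against spurious wakeups. All names are illustrative:

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_cond = PTHREAD_COND_INITIALIZER;
static int signalled;             /* predicate, protected by demo_lock */

/* Worker thread: block on the condition variable until signalled. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    while (!signalled)
        pthread_cond_wait(&demo_cond, &demo_lock);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Starts the waiter, sleeps, signals it, and joins it.
 * Returns 0 on success or the error code from pthread_create/join. */
static int run_condwait_demo(unsigned int delay_seconds)
{
    pthread_t thread;

    pthread_mutex_lock(&demo_lock);
    signalled = 0;
    pthread_mutex_unlock(&demo_lock);

    int rc = pthread_create(&thread, NULL, waiter, NULL);
    if (rc != 0)
        return rc;

    sleep(delay_seconds);         /* give the waiter time to block */

    pthread_mutex_lock(&demo_lock);
    signalled = 1;
    pthread_cond_signal(&demo_cond);
    pthread_mutex_unlock(&demo_lock);

    return pthread_join(thread, NULL);
}
```

Adding `int main(void) { return run_condwait_demo(2); }` turns this into the stand-alone test program with the two-second sleep.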
Now let’s compile/link the wrapper and test program and test the normal and preloaded cases:
In the non-preloaded case our test program exited fine after two seconds. However, when the test program was preloaded with our wrapper it hung and had to be killed with a Control-C from the keyboard. Why did this hang?
What happened?
Why did the test program work fine when executed without a preload, but hang when preloaded with our wrapper? To figure this out, let’s fire up GDB and step through what happens when the test program calls pthread_cond_wait with and without our preload.
First, let’s run the program without our preload:
We set up a breakpoint on the call to pthread_cond_wait and verify that it calls pthread_cond_wait within NPTL. Now let’s try the same, but with our wrapper preloaded:
Once again we break on the call to pthread_cond_wait, but this time when we step into it, it takes us to our wrapper function. This was expected. However, when we then step into the call to pthread_cond_wait via the condwait_internal pointer, it takes us to __pthread_cond_wait_2_0 instead of the pthread_cond_wait@@GLIBC_2.3.2 function that was called without our wrapper. What is __pthread_cond_wait_2_0 and why did it take us there? Let’s look at the source for __pthread_cond_wait_2_0:
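The following is not the glibc source but a simplified model of the behavior discussed next, written under the assumption that the old ABI treats the caller’s condition variable as little more than a pointer slot. The old_cond_t type and compat_wait_target function are hypothetical names:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the old condition variable layout:
 * a slot pointing at a separately allocated, modern one. */
typedef struct {
    pthread_cond_t *cond;
} old_cond_t;

/* Returns the condition variable an old-ABI wait would actually block on,
 * allocating it on first use. The real code has to do this atomically;
 * this model is not thread-safe. */
static pthread_cond_t *compat_wait_target(old_cond_t *old)
{
    if (old->cond == NULL) {
        pthread_cond_t *fresh = calloc(1, sizeof *fresh);
        if (fresh == NULL)
            return NULL;
        pthread_cond_init(fresh, NULL);
        old->cond = fresh;
    }
    return old->cond;
}
```

The point of the model is that the waiter ends up blocked on a heap object, while any code using the new ABI keeps signalling the caller’s original object.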
This code is allocating another condition variable within the original condition variable structure and passing the new condition variable to the actual pthread_cond_wait function. As you can see from the backtrace, the pthread_cond_wait function is now blocking on the condition variable located at address 0x7ffff00008c0 rather than the original one at 0x600d80. However, the main thread still thinks the condition variable is located at 0x600d80.
Why two different code paths?
To understand why the preloaded version invoked a different pthread_cond_wait function than the non-preloaded one, let’s take a quick look at the output of objdump for the NPTL (http://en.wikipedia.org/wiki/Nptl) library.
Searching for the pthread_cond_wait symbol shows that there are two different versions of pthread_cond_wait in libpthread.so.0 – one located at address 0x3cf900b8f0 and the other located at 0x3cf900b240. Also, you will notice that each symbol is followed by a special “GLIBC_X.Y.Z” string. So what do these strings mean?
To understand why there are two different pthread_cond_wait symbols with different string suffixes, we first must understand library versioning. In order to support changing the ABI definition of a symbol, the standard practice used to be that the SONAME of a shared library would be updated each time the ABI changed. This practice was referred to as “bumping the major number” because the major number of the library would be incremented by one. For example, the SONAME for a library named libfoobar may be changed from libfoobar.so.1 to libfoobar.so.2 to isolate an ABI change in the .1 version of the library. Programs that required the old ABI would link against the old “.1” SONAME while new programs would get linked against the new “.2” SONAME. This method was not very practical as it required a new library name each time some small ABI change was made, which led to a proliferation of library versions required for old executables. Therefore, a new method was developed that versions the individual symbols in the library instead of the entire library. This allows ABI changes to be made while maintaining the same SONAME and single library file [2].
In this example, there are two different versions of the pthread_cond_wait symbol. The older version, 2.2.5, is from the old LinuxThreads implementation, while the new version, 2.3.2, refers to the NPTL implementation. The double at sign (“@@”) before the 2.3.2 string indicates that it is the default version of the symbol. Programs that are linked against libpthread will be linked against the default version. This agrees with the output from our NPTL test program:
It should now be clear that the problem with our wrapper is that while the test program is linked against and requesting version 2.3.2 of pthread_cond_wait, our wrapper is mapping that request to version 2.2.5. This is because of the use of dlsym in our wrapper code to look up the real pthread_cond_wait function. So how do we fix this?
Versioned symbol lookup
The dlsym call makes an unversioned lookup for the named symbol. By default, this unversioned symbol lookup will match the oldest symbol version in the DSO. If dlsym mapped to the latest symbol in the DSO, then if a new version of the symbol is added in the future, existing programs would map automatically to this symbol. However, if the new symbol version changed the behavior of the function, then existing programs may misbehave when they are mapped to the latest version. It is important to note that even though the “@@” symbol indicates the “default symbol”, this is only true for programs that are linking directly with the DSO – not for lookups performed with dlsym.
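Which version an unversioned lookup lands on can be checked directly rather than assumed. The sketch below compares the address returned by dlsym with the addresses returned by dlvsym (a glibc extension) for each candidate version. The helper name dlsym_resolves_to is hypothetical, and the result may differ between glibc releases and between lookup handles:

```c
#define _GNU_SOURCE               /* for dlvsym (glibc extension) */
#include <dlfcn.h>
#include <stddef.h>

/* Returns the entry of `versions` whose versioned address equals the
 * unversioned dlsym() result, or NULL if the symbol is missing or none
 * of the listed versions matches. */
static const char *dlsym_resolves_to(void *handle, const char *symbol,
                                     const char *const versions[],
                                     size_t count)
{
    void *plain = dlsym(handle, symbol);
    if (plain == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        void *versioned = dlvsym(handle, symbol, versions[i]);
        if (versioned != NULL && versioned == plain)
            return versions[i];
    }
    return NULL;
}
```

Calling it with RTLD_NEXT from inside the wrapper, the symbol name "pthread_cond_wait" and the two version strings from the objdump listing reports which of the two the wrapper’s dlsym call actually picked.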
Luckily, we can use the dlvsym function to perform a versioned lookup for the symbol. The dlvsym call is identical to dlsym except that it accepts a third argument that defines the version of the symbol to search for. The version is the string to the right of the “@” signs – so in our example, GLIBC_2.3.2 would find the latest pthread_cond_wait symbol. Let’s try updating our wrapper:
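A sketch of the updated wrapper, assuming glibc on x86_64 where the default version is GLIBC_2.3.2. Only the lookup changes: dlsym is replaced with dlvsym. The lookup_condwait helper and the ENOSYS fallback are illustrative:

```c
#define _GNU_SOURCE               /* for RTLD_NEXT and dlvsym (glibc) */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

typedef int (*condwait_fn)(pthread_cond_t *, pthread_mutex_t *);

static condwait_fn condwait_internal;

/* Versioned lookup; returns NULL if that version does not exist. */
static condwait_fn lookup_condwait(const char *version)
{
    return (condwait_fn)dlvsym(RTLD_NEXT, "pthread_cond_wait", version);
}

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    if (condwait_internal == NULL)
        condwait_internal = lookup_condwait("GLIBC_2.3.2");
    if (condwait_internal == NULL)
        return ENOSYS;            /* version not present on this system */

    fprintf(stderr, "pthread_cond_wait(%p, %p)\n",
            (void *)cond, (void *)mutex);
    return condwait_internal(cond, mutex);
}
```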
This time success:
Our wrapper is now mapping the pthread_cond_wait call to the correct library version, but are we done?
Creating versioned overrides
We started off by mapping requests for pthread_cond_wait to version 2.2.5, which caused our test program to hang when it was looking for 2.3.2. We successfully fixed this problem by mapping our internal wrapper to version 2.3.2 using dlvsym. This will work fine for any executables compiled and linked on a recent version of the OS (post-2.2.5). But what happens when we preload our wrapper on an executable linked on a 2.2.5 (LinuxThreads) OS? In this scenario we would have the opposite problem from before – we would take requests for version 2.2.5 and map them to version 2.3.2.
To correctly support both older and newer executables we would like to intercept calls to pthread_cond_wait versions 2.2.5 and 2.3.2 and map them to the same libpthread versions, respectively. Obviously we can create two individual overrides for pthread_cond_wait in our wrapper, but how do we indicate which symbol they override and prevent symbol name collision? For that, we look to the assembler pseudo opcode “.symver” [3]. With .symver we can define two different pthread_cond_wait override functions – with different names – and map each of them to a corresponding versioned override for the real pthread_cond_wait. Let’s update our wrapper code once more using “.symver”:
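A sketch of the two-version wrapper, assuming glibc on x86_64 and the GNU assembler. The function names with version suffixes and the forward helper are illustrative choices:

```c
#define _GNU_SOURCE               /* for RTLD_NEXT and dlvsym (glibc) */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

typedef int (*condwait_fn)(pthread_cond_t *, pthread_mutex_t *);

/* One cached pointer per symbol version we forward to. */
static condwait_fn condwait_2_2_5;
static condwait_fn condwait_2_3_2;

static condwait_fn lookup_condwait(const char *version)
{
    return (condwait_fn)dlvsym(RTLD_NEXT, "pthread_cond_wait", version);
}

/* Resolve (once) and call the real function of the given version. */
static int forward(condwait_fn *cache, const char *version,
                   pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    if (*cache == NULL)
        *cache = lookup_condwait(version);
    if (*cache == NULL)
        return ENOSYS;            /* version not present on this system */

    fprintf(stderr, "pthread_cond_wait@%s(%p, %p)\n",
            version, (void *)cond, (void *)mutex);
    return (*cache)(cond, mutex);
}

int pthread_cond_wait_2_2_5(pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    return forward(&condwait_2_2_5, "GLIBC_2.2.5", cond, mutex);
}

int pthread_cond_wait_2_3_2(pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    return forward(&condwait_2_3_2, "GLIBC_2.3.2", cond, mutex);
}

/* "@" marks a non-default version, "@@" the default one. */
__asm__(".symver pthread_cond_wait_2_2_5, pthread_cond_wait@GLIBC_2.2.5");
__asm__(".symver pthread_cond_wait_2_3_2, pthread_cond_wait@@GLIBC_2.3.2");
```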
What we’ve done is add a second pthread_cond_wait override and suffix both it and the existing one with the version number they are overriding. Similarly, we’ve added a second call to dlvsym to look up the 2.2.5 version of the pthread_cond_wait symbol. Finally, we have added two inline assembly statements using the “.symver” directive that connect our overrides to the respective symbol version names they are overriding. The .symver directives will redirect any 2.2.5 or 2.3.2 pthread_cond_wait lookups to the respective overrides. Now let’s compile this and test it out:
Wait, this now compiles but is failing to link?
The linker is complaining that it needs to know which symbol versions are exported. You specify these mappings to the linker using a version script. The version script is passed to the linker with the --version-script option. Each .symver opcode must correspond to an entry in the version script. For further explanation of the version script, I suggest reading [2].
For our pthread_cond_wait wrapper, the corresponding version script looks like:
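A sketch of such a script, assuming the two version nodes named in the objdump output. The file name condwrap.map is hypothetical, and the script would be passed through the compiler driver as -Wl,--version-script=condwrap.map:

```
GLIBC_2.2.5 {
  global:
    pthread_cond_wait;
  local:
    *;
};

GLIBC_2.3.2 {
  global:
    pthread_cond_wait;
} GLIBC_2.2.5;
```

The `local: *;` entry keeps everything else, including the suffixed helper function names, out of the DSO’s exported symbols, and the trailing `GLIBC_2.2.5` on the second node declares that the newer version inherits from the older one.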
Now let’s try building and testing the wrapper again:
Success! We have produced a pthread_cond_wait wrapper that overrides both versions 2.2.5 and 2.3.2, as illustrated by the objdump output, and which correctly invokes the underlying pthread_cond_wait functions from libpthread. We should now be able to repeat this pattern for any additional symbols we must override.
Concluding Remarks
I strongly recommend reading Ulrich Drepper’s guide to writing shared libraries 2 if you are planning on doing any advanced library work – especially writing DSO preloads.
The examples here were only illustrated for the x86_64 architecture. There’s no guarantee that the available symbol versions for the 32-bit and 64-bit libraries will be the same. If you are building a library which will be compiled for both 32-bit and 64-bit, make sure to check the symbol versions for each underlying 32/64-bit library. This may also require a different version script for each build.
These override functions were clearly very simple. If your overrides are more complex than mine (can’t see how they wouldn’t be) then you must make sure to respect the ABI differences between the multiple versions you are overriding. There was obviously a reason for providing multiple symbol versions, so make sure you respect that difference or you will break unsuspecting applications.
If your library requires using a symbol that you are also overriding, make sure that you internally use the symbol version marked as the default with the “@@” string.
References
[1] Drepper, Ulrich. ELF Symbol Versioning. http://people.redhat.com/drepper/symbol-versioning
[2] Drepper, Ulrich. How to Write Shared Libraries. http://people.redhat.com/drepper/dsohowto.pdf
[3] .symver. http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/gnu-assembler/symver.html