Cleanup old EBS snapshots

posted: April 16th, 2010

At Librato we rely on Amazon EBS as part of our Silverline infrastructure. Accordingly, we periodically snapshot our EBS volumes to guard against data loss. I wanted a script that would clean up the sprawl of snapshots that accumulates over time.

A quick search turned up a number of PHP solutions, but we have stuck mostly with BASH and Ruby for infrastructure scripting so far, so we didn't want another dependency. Anyway, I stumbled across this Ruby script over at ElastDream, which did most of what I was looking for. However, I wanted to be able to specify a minimum number of snapshots to keep, regardless of age, so that you can't accidentally delete all snapshots for a particular volume.

This script supports the following configuration options:

Usage: cleanup_snapshots [options] <volume>

    -h, --help           Display this screen
        --key KEY        Amazon access key
        --secret KEY     Amazon secret key
        --days DAYS      How many days back to keep (default: 15)
        --min KEEP       Minimum number of snapshots to keep (default: 5)
        --verbose        Enable verbose output
        --uri URI        Use this EC2 URI instead of default (e.g. EU-West)

Find the source below; hope this is helpful for others. ;-)

Mixing HR timers with itimers

posted: September 4th, 2009

While attempting to come up with an example of how difficult it is to differentiate High Resolution (HR) timers from user timers – for EINTR purposes – I came across a slightly different problem. On newer kernels (at least >= 2.6.25, but not 2.6.18), a periodically firing HR timer appears to prevent an itimer from generating an EINTR at all for a blocking system call.

My test program creates an HR timer with a one-second period and sets an itimer for three seconds. The program then immediately blocks on a file lock using the flock system call. The itimer should fire and interrupt the flock with an EINTR, but most of the time the itimer fires without interrupting the flock. Offsetting the HR timer by 0.5 seconds makes the itimer interrupt the flock every time.
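A minimal sketch of this kind of test follows. It is not the exact program from the post; the lock-file path, signal choices, and SA_RESTART handling are illustrative.

    /*
     * hrtest.c (sketch) -- build roughly as: gcc -o hrtest hrtest.c -lrt
     */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <signal.h>
    #include <time.h>
    #include <fcntl.h>
    #include <sys/time.h>
    #include <sys/file.h>

    static void on_signal(int sig) { (void)sig; }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_signal;

        /* HR timer signal: SA_RESTART so its periodic firing alone does not
         * break the blocking flock() out with EINTR. */
        sa.sa_flags = SA_RESTART;
        sigaction(SIGRTMIN, &sa, NULL);

        /* itimer signal: no SA_RESTART, so SIGALRM should interrupt flock(). */
        sa.sa_flags = 0;
        sigaction(SIGALRM, &sa, NULL);

        /* Periodic hrtimer-backed POSIX timer with a 1s period. Setting
         * it_value to 1.5s gives the 0.5s offset mentioned above. */
        timer_t hrt;
        struct sigevent sev;
        memset(&sev, 0, sizeof(sev));
        sev.sigev_notify = SIGEV_SIGNAL;
        sev.sigev_signo = SIGRTMIN;
        timer_create(CLOCK_MONOTONIC, &sev, &hrt);
        struct itimerspec hspec = {
            .it_value    = { 1, 0 },
            .it_interval = { 1, 0 },
        };
        timer_settime(hrt, 0, &hspec, NULL);

        /* One-shot itimer: should interrupt the blocking flock() after 3s. */
        struct itimerval ival = { .it_value = { 3, 0 } };
        setitimer(ITIMER_REAL, &ival, NULL);

        /* Hold the lock on one descriptor, then block trying to take it on a
         * second descriptor so flock() never succeeds on its own. */
        int fd1 = open("/tmp/hrtest.lock", O_CREAT | O_RDWR, 0644);
        int fd2 = open("/tmp/hrtest.lock", O_RDWR);
        flock(fd1, LOCK_EX);

        if (flock(fd2, LOCK_EX) == -1 && errno == EINTR)
            printf("flock interrupted: EINTR\n");
        else
            printf("flock was not interrupted\n");
        return 0;
    }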

The test program is on GitHub. I'm currently determining which kernels demonstrate this behavior.

Update: The LKML thread here includes follow-ups from Oleg and Roland explaining the observed behavior.

Emulating tcsh’s %c in bash

posted: August 26th, 2009

With tcsh you can set your prompt to include the %c[[0]n] escape. The %c escape prints the current working directory with at most 'n' trailing components. This is similar to PROMPT_DIRTRIM in recent versions of bash but, IMO, better: in tcsh, the %c escape can optionally include a count of how many directories were trimmed off the beginning of the path.

For example, if you were in the directory /etc/sysconfig/networking/profiles/default, the escape %c03 would expand to: /<2>networking/profiles/default. It also respects $HOME as a starting point, so the directory $HOME/work/svn/cdc/scripts would expand to: ~/<1>svn/cdc/scripts.

Jeremie Le Hen has created an excellent emulation of this functionality for bash using only builtins. Below is an expanded version of Jeremie’s that should maintain the “~” when under $HOME and handle directory names with spaces.

Use it as: PS1="\$(traildir 3 \"\$PWD\")" to get a maximum of three trailing directories.

Preloading with multiple symbol versions

posted: August 25th, 2009

This post describes how to preload an application using the LD_PRELOAD functionality of the dynamic linker and the difficulties encountered when using multiple symbol versions. It walks through the steps needed to correctly intercept symbols with multiple versions and map them to the correct underlying library versions.

What is preloading?

Preloading is a method by which a user can force the dynamic linker to load an additional dynamic shared object (DSO) when launching a particular executable. Preloading forces the DSO into the address space of the executing process and runs the DSO's constructor on startup and its destructor on shutdown. It can be done with the LD_PRELOAD environment variable on a process-by-process basis or enabled system-wide with the /etc/ld.so.preload configuration file.

Preloading allows new code to be inserted and executed within a process without recompiling or relinking the original program. When a DSO is preloaded, the dynamic linker places it before the system libraries (libc, libpthread, etc.) in the symbol search order. When the application, or one of its dynamic libraries, invokes a function from the system libraries, the symbol is searched for in the preloaded DSO first. So if the preloaded DSO exports a function with the same name as one in the system libraries, the preloaded DSO's version is the one called when the application invokes that function. This provides a powerful capability to intercept function calls made by the application.

Preloading Example

Preloading is typically used to interpose new functionality between the application and the system. This model could be used for Aspect-Oriented Programming to weave new cross-cutting behavior into existing function calls at run-time, without modifying the existing source code or binary. One common example is debugging utilities: a debugging DSO can be constructed that intercepts the system library calls invoked by the application and logs them or tracks statistics on the frequency and types of calls made. Such a DSO can provide valuable information for tracking down coding errors or uncovering performance bottlenecks.

A common use case for this facility is tracking dynamic memory operations performed by an application during run-time. Tracking application memory allocations and deallocations can locate memory leaks and identify poor memory allocation patterns. Take the following example of a simple DSO that wraps the malloc(3) call:
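A minimal wrapper along these lines looks roughly like the following; the file name, build command, and printout format are illustrative.

    /*
     * mallocwrap.c -- build roughly as:
     *   gcc -shared -fPIC -o libmallocwrap.so mallocwrap.c -ldl
     * then preload it:
     *   LD_PRELOAD=./libmallocwrap.so uname
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stddef.h>
    #include <dlfcn.h>

    static void *(*real_malloc)(size_t) = NULL;

    void *malloc(size_t size)
    {
        /* Resolve the real malloc from the next DSO in the search order. */
        if (real_malloc == NULL)
            real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

        void *ptr = real_malloc(size);
        /* Note: fprintf may itself allocate; a robust wrapper would guard
         * against re-entry. Kept simple here for illustration. */
        fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
        return ptr;
    }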

Compiling this wrapper and preloading it with the simple uname command results in two allocation printouts.

While this is a very simple example, a more advanced debugging library could track both allocations and deallocations and print full reports on exit.

Preloading Example #2

Let's try another preloading example, but this time we'll override the pthread_cond_wait call from the Pthreads library (NPTL). Once again, here is our wrapper code:
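A sketch of that wrapper follows – an override that looks up the real function with dlsym and forwards to it through the condwait_internal pointer; the build command is illustrative.

    /*
     * condwrap.c -- build roughly as:
     *   gcc -shared -fPIC -o libcondwrap.so condwrap.c -ldl -lpthread
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <dlfcn.h>
    #include <pthread.h>

    static int (*condwait_internal)(pthread_cond_t *, pthread_mutex_t *) = NULL;

    int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)
    {
        /* Resolve the real pthread_cond_wait on first use. */
        if (condwait_internal == NULL)
            condwait_internal = (int (*)(pthread_cond_t *, pthread_mutex_t *))
                dlsym(RTLD_NEXT, "pthread_cond_wait");

        fprintf(stderr, "pthread_cond_wait(%p, %p)\n",
                (void *)cond, (void *)mutex);
        return condwait_internal(cond, mutex);
    }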

This time we'll write a simple threaded test program. It fires off a single thread that waits on a condition variable for a signal. The main thread sleeps for two seconds and then signals the condition variable, waking the blocked thread. The main thread waits for that thread to receive the signal and exit, and then exits the program.
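A minimal version of such a test program might look like this; the names and synchronization details are illustrative.

    /* condtest.c -- build roughly as: gcc -o condtest condtest.c -lpthread */
    #include <stdio.h>
    #include <unistd.h>
    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int signalled = 0;

    static void *waiter(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!signalled)
            pthread_cond_wait(&cond, &lock);    /* blocks until signalled */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, waiter, NULL);

        sleep(2);                               /* let the waiter block */

        pthread_mutex_lock(&lock);
        signalled = 1;
        pthread_cond_signal(&cond);             /* wake the waiter */
        pthread_mutex_unlock(&lock);

        pthread_join(tid, NULL);                /* wait for it to exit */
        return 0;
    }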

Now let's compile and link the wrapper and the test program, and try both the normal and the preloaded case.

In the non-preloaded case our test program exited fine after two seconds. However, when the test program was preloaded with our wrapper it hung and had to be killed with a Control-C from the keyboard. Why did this hang?

What happened?

Why did the test program work fine when executed without a preload, but hang when preloaded with our wrapper? To figure this out, let’s fire up GDB and step through what happens when the test program calls pthread_cond_wait with and without our preload.

First, let's run the program without our preload.

We set up a breakpoint on the call to pthread_cond_wait and verify that it calls pthread_cond_wait within NPTL. Now let's try the same thing, but with our wrapper preloaded.

Once again we break on the call to pthread_cond_wait, but this time stepping into it takes us to our wrapper function. That was expected; however, when we then step into the call made through the condwait_internal pointer, it takes us to __pthread_cond_wait_2_0 instead of the pthread_cond_wait@@GLIBC_2.3.2 function we reached without our wrapper. What is __pthread_cond_wait_2_0, and why did it take us there? Let's look at what __pthread_cond_wait_2_0 does.

That compatibility function allocates another condition variable inside the original condition variable structure and passes the new condition variable to the actual pthread_cond_wait function. In the GDB backtrace, pthread_cond_wait is now blocking on the condition variable located at address 0x7ffff00008c0 rather than the original one at 0x600d80. The main thread, however, still thinks the condition variable is located at 0x600d80.

Why two different code paths?

To understand why the preloaded version invoked a different pthread_cond_wait function than the non-preloaded one, let's take a quick look at the objdump output for the NPTL (http://en.wikipedia.org/wiki/Nptl) library.

Searching for the pthread_cond_wait symbol shows that there are two different versions of pthread_cond_wait in libpthread.so.0 – one located at address 0x3cf900b8f0 and the other located at 0x3cf900b240. Also, you will notice that each symbol is followed by a special “GLIBC_X.Y.Z” string. So what do these strings mean?

To understand why there are two different pthread_cond_wait's with different string suffixes, we first must understand library versioning. In order to support changing the ABI definition of a symbol, the standard practice used to be to update the SONAME of a shared library each time the ABI changed. This was referred to as "bumping the major number" because the major number of the library would be incremented by one. For example, the SONAME for a library named libfoobar might be changed from libfoobar.so.1 to libfoobar.so.2 to isolate an ABI change from the .1 version of the library. Programs that required the old ABI would link against the old ".1" SONAME, while new programs would be linked against the new ".2" SONAME. This method was not very practical: it required a new library name each time some small ABI change was made, which led to a proliferation of library versions required by old executables. Therefore, a new method was developed that versions the individual symbols in the library instead of the entire library. This allows ABI changes to be made while maintaining the same SONAME and a single library file [2].

In this example, there are two different versions of the pthread_cond_wait symbol. The older version, 2.2.5, is from the old LinuxThreads implementation, while the newer version, 2.3.2, refers to the NPTL implementation. The double at sign ("@@") before the 2.3.2 string indicates that it is the default version of the symbol. Programs that link against libpthread are bound to the default version, which agrees with the versioned reference recorded in our NPTL test program's dynamic symbol table.

It should now be clear what the problem with our wrapper is: while the test program is linked against, and requesting, version 2.3.2 of pthread_cond_wait, our wrapper maps that request to version 2.2.5. This happens because our wrapper uses dlsym to look up the real pthread_cond_wait function. So how do we fix this?

Versioned symbol lookup

The dlsym call performs an unversioned lookup for the named symbol, and by default an unversioned lookup matches the oldest symbol version in the DSO. If dlsym mapped to the latest version instead, then whenever a new version of a symbol was added, existing programs using dlsym would automatically be re-mapped to it; if the new version changed the function's behavior, those programs could misbehave. It is important to note that even though "@@" marks the default version, that default applies only to programs linking directly against the DSO – not to lookups performed with dlsym.

Luckily, we can use the dlvsym function to perform a versioned lookup for the symbol. The dlvsym call is identical to dlsym except that it accepts a third argument naming the version of the symbol to search for. The version string is the part to the right of the "@" signs – so in our example, GLIBC_2.3.2 finds the latest pthread_cond_wait symbol. Let's try updating our wrapper:
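The change is confined to the lookup itself – ask for the 2.3.2 version explicitly with dlvsym. A sketch of the changed portion; the rest of the wrapper stays the same.

    /* Only the lookup changes: request the 2.3.2 version explicitly. */
    if (condwait_internal == NULL)
        condwait_internal = (int (*)(pthread_cond_t *, pthread_mutex_t *))
            dlvsym(RTLD_NEXT, "pthread_cond_wait", "GLIBC_2.3.2");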

This time, success.

Our wrapper is now mapping the pthread_cond_wait call to the correct library version, but are we done?

Creating versioned overrides

We started off by mapping requests for pthread_cond_wait to version 2.2.5, which caused our test program to hang because it was expecting 2.3.2. We fixed this by mapping our internal wrapper to version 2.3.2 using dlvsym. This works fine for any executable compiled and linked on a recent (post-2.2.5) version of the OS, but what happens when we preload our wrapper on an executable linked on a 2.2.5 (LinuxThreads) OS? In that scenario we would have the opposite problem from before – we would take requests for version 2.2.5 and map them to version 2.3.2.

To correctly support both older and newer executables we would like to intercept calls to pthread_cond_wait versions 2.2.5 and 2.3.2 and map each to the same libpthread version, respectively. Obviously we can create two individual overrides for pthread_cond_wait in our wrapper, but how do we indicate which symbol version each one overrides and avoid a symbol name collision? For that, we turn to the assembler pseudo-op ".symver" [3]. With .symver we can define two different pthread_cond_wait override functions – with different names – and map each of them to a corresponding versioned override of the real pthread_cond_wait. Let's update our wrapper code once more using ".symver":
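A sketch of the dual-version wrapper follows. The override function names (pthread_cond_wait_232 and pthread_cond_wait_225) are arbitrary placeholders; what matters is the .symver binding at the bottom.

    /*
     * condwrap.c (versioned) -- build roughly as:
     *   gcc -shared -fPIC -o libcondwrap.so condwrap.c -ldl -lpthread
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <dlfcn.h>
    #include <pthread.h>

    static int (*condwait_2_3_2)(pthread_cond_t *, pthread_mutex_t *) = NULL;
    static int (*condwait_2_2_5)(pthread_cond_t *, pthread_mutex_t *) = NULL;

    /* Override bound to the NPTL (2.3.2) version of the symbol. */
    int pthread_cond_wait_232(pthread_cond_t *cond, pthread_mutex_t *mutex)
    {
        if (condwait_2_3_2 == NULL)
            condwait_2_3_2 = (int (*)(pthread_cond_t *, pthread_mutex_t *))
                dlvsym(RTLD_NEXT, "pthread_cond_wait", "GLIBC_2.3.2");

        fprintf(stderr, "pthread_cond_wait@GLIBC_2.3.2(%p, %p)\n",
                (void *)cond, (void *)mutex);
        return condwait_2_3_2(cond, mutex);
    }

    /* Override bound to the old LinuxThreads (2.2.5) version of the symbol. */
    int pthread_cond_wait_225(pthread_cond_t *cond, pthread_mutex_t *mutex)
    {
        if (condwait_2_2_5 == NULL)
            condwait_2_2_5 = (int (*)(pthread_cond_t *, pthread_mutex_t *))
                dlvsym(RTLD_NEXT, "pthread_cond_wait", "GLIBC_2.2.5");

        fprintf(stderr, "pthread_cond_wait@GLIBC_2.2.5(%p, %p)\n",
                (void *)cond, (void *)mutex);
        return condwait_2_2_5(cond, mutex);
    }

    /* Map each override onto the symbol version it replaces; "@@" marks the
     * default version. */
    __asm__(".symver pthread_cond_wait_232, pthread_cond_wait@@GLIBC_2.3.2");
    __asm__(".symver pthread_cond_wait_225, pthread_cond_wait@GLIBC_2.2.5");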

What we've done is add a second pthread_cond_wait override and suffix both functions with the version number they override. Similarly, we've added a second call to dlvsym to look up the 2.2.5 version of the pthread_cond_wait symbol. Finally, we've added two inline assembly ".symver" directives that connect our overrides to the respective symbol version names they override. The .symver directives redirect any 2.2.5 or 2.3.2 pthread_cond_wait lookups to the respective overrides. Now let's compile this and test it out.

Wait – it now compiles, but fails to link?

The linker is complaining that it needs to know which symbol versions are exported. You specify these mappings to the linker using a version script, passed with the --version-script option. Each .symver pseudo-op must correspond to an entry in the version script. For a further explanation of version scripts, I suggest reading [2].

For our pthread_cond_wait wrapper, the corresponding version script looks like:
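A minimal form simply declares the two version nodes, with the 2.3.2 node inheriting from 2.2.5; exported symbol names could also be listed explicitly under each node.

    GLIBC_2.2.5 {
    };

    GLIBC_2.3.2 {
    } GLIBC_2.2.5;

Save it to a file and pass it at link time, for example with -Wl,--version-script=condwrap.map when building the wrapper DSO.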

Now let's try building and testing the wrapper again.

Success! We have produced a pthread_cond_wait wrapper that overrides both versions 2.2.5 and 2.3.2 – both overrides are visible in the DSO's dynamic symbol table – and that correctly invokes the underlying pthread_cond_wait functions from libpthread. We should now be able to repeat this pattern for any additional symbols we must override.

Concluding Remarks

  • I strongly recommend reading Ulrich Drepper's guide to writing shared libraries [2] if you plan to do any advanced library work – especially writing DSO preloads.

  • The examples here were only illustrated for the x86_64 architecture. There is no guarantee that the available symbol versions for the 32-bit and 64-bit libraries will be the same. If you are building a library that will be compiled for both 32-bit and 64-bit, make sure to check the symbol versions of each underlying 32/64-bit library. This may also require a different version script for each build.

  • These override functions were clearly very simple. If your overrides are more complex than mine (can't see how they wouldn't be), then you must respect the ABI differences between the multiple versions you are overriding. There was a reason for providing multiple symbol versions, so make sure you honor that difference or you will break unsuspecting applications.

  • If your library itself needs to call a symbol that you are also overriding, make sure that internally you use the symbol version marked as the default with the "@@" string.

References

[1] Drepper, Ulrich. ELF Symbol Versioning. http://people.redhat.com/drepper/symbol-versioning

[2] Drepper, Ulrich. How to Write Shared Libraries. http://people.redhat.com/drepper/dsohowto.pdf

[3] .symver. http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/gnu-assembler/symver.html

Hiding what’s exposed in a shared library

posted: August 19th, 2009

This post illustrates how to hide what gets exposed when you distribute a shared library. Specifically, we will focus on hiding all unnecessary symbol names in the distributed library.

What are we talking about?

So what do we mean when we say we'd like to hide the internals of a library? Let's take a very simple example to illustrate what can be exposed from a shared library. For our example we have built a very small library named "libab" which provides a single API call, ab_get_string, as seen in the library include file:
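Something along these lines; the exact prototype and return type are illustrative.

    /* ab.h -- the library's public interface (sketch) */
    #ifndef AB_H
    #define AB_H

    /* Returns a string describing the library's internal A and B values. */
    const char *ab_get_string(void);

    #endif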

This call returns a single string value. To see what the string actually contains, we must look at the library source. Being the good software engineers that we are, we've broken the library into two separate components, filea.c and fileb.c:
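A sketch of the two files follows; the internal function names and values here are illustrative, while ab_get_string and b_getval_super_secret are the names discussed below.

    /* filea.c */
    #include <stdio.h>
    #include "ab.h"

    int b_getval_super_secret(void);        /* implemented in fileb.c */

    static int a_getval_internal(void)      /* internal to filea.c */
    {
        return 42;
    }

    const char *ab_get_string(void)
    {
        static char buf[64];
        snprintf(buf, sizeof(buf), "A=%d B=%d",
                 a_getval_internal(), b_getval_super_secret());
        return buf;
    }

    /* fileb.c */
    static int b_getval_internal(void)      /* internal to fileb.c */
    {
        return 13;
    }

    int b_getval_super_secret(void)
    {
        return b_getval_internal();
    }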

As you can see, the ab_get_string function returns a string containing two integer values, "A" and "B". The value for "A" comes from an internal function in filea.c, and the value for "B" is returned by b_getval_super_secret in fileb.c – which in turn calls an internal function in fileb.c. The "internal" functions in filea.c and fileb.c are both defined static, since their scope is limited to their own file. However, b_getval_super_secret cannot be declared static because it must be accessible from filea.c.

Let's build the library and see which symbols are exposed.

At first glance, all four function names appear in the library's symbol table: the static functions, the intra-library function, and the exposed API function. Before we investigate how to hide some of these symbols, let's explore why we would want to hide anything.

Why Hide?

For most shared libraries that are distributed, you won’t mind if some amount of visibility into the library is exposed. The library itself may be openly built from free/open source code or there may simply be no pressing desire to hide the internals of a library.

However, there are also many cases where you do want to hide the internals as much as possible. For example, if the library contains significant intellectual property, internal symbol names may expose some of that IP. You may also want to hide internal functionality to prevent unintended use. For example, if a library exposed a symbol named perform_operation_foobar, a curious programmer might believe they can directly invoke this function to perform operation "foobar", even though the function is not exposed in the header file. If calling it directly had unintended consequences, the enterprising programmer might hassle your support channels despite having operated outside the supported interface. Additionally, future versions of the library may remove or change interfaces the programmer was never supposed to depend on.

In any case, let’s see what we can do to limit the unintended exposure.

The strip command

The strip command removes components from an object file, including symbols and debugging segments. Let's perform a basic strip of our library to see how much we can pull out.

After the strip, the symbol table is gone from the shared library, which removes the exposure of the "internal" static functions. Clearly, any function symbol that does not need to be visible outside the file it's defined in should be declared static. However, the dynamic symbol table still exists and contains a reference to b_getval_super_secret from fileb.c. Wait a minute – there are two symbol tables?

ELF files contain two symbol tables: .symtab and .dynsym (the dynamic symbol table). In simple terms, the symtab table contains all local and global symbol names, primarily to support debugging. When you load GDB and inspect a backtrace, it uses this table to print the names of the functions in the backtrace; without it, the backtrace would show only hex offsets into the file. The dynsym table contains the list of global symbols required for run-time dynamic linking and hence cannot be removed – the dynamic linker consults this section to find the symbols an executable requires. For a full explanation of the ELF symbol tables, please read [1].

So we have removed our internal function names from the file's symbol table – but can we do better?

Defining visibility

We can't declare the b_getval_super_secret function static because we must be able to access it from filea.c. Fortunately, we can use symbol visibility instead. The -fvisibility= compiler option defines the default visibility for all symbols exported from the library. The option supports four modes, but we'll limit our discussion to the two practical ones: default and hidden. The default mode exports all global symbols from the library, making them available in the dynamic symbol table; unsurprisingly, if -fvisibility= is not specified, the default is default. The hidden mode, on the other hand, exports no symbol that is not explicitly marked otherwise (more on that below).

Let's try switching the visibility to hidden in our build.

With -fvisibility=hidden we have removed the b_getval_super_secret symbol from the dynamic symbol table. However, this removed the ab_get_string symbol from the table as well. Since the exported API symbol no longer exists in the dynamic symbol table, our test program fails to run with an undefined symbol error. How can we hide the internal global symbols but still export the API symbols?

Defining per-symbol visibility

Luckily, we can also define symbol visibility at per-symbol granularity. GCC supports a visibility attribute that we can use to annotate functions with their appropriate visibility. Since we want the ab_get_string symbol to have global visibility, let's give it default visibility. We'll update filea.c to use the attribute:
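Roughly like this; only the added prototype matters, and the function definition itself is unchanged. The build command is illustrative.

    /* Added near the top of filea.c: a prototype that carries the visibility
     * attribute. Build with, e.g.:
     *   gcc -shared -fPIC -fvisibility=hidden -o libab.so filea.c fileb.c
     */
    const char *ab_get_string(void) __attribute__((visibility("default")));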

We added a function prototype and suffixed it with the default visibility attribute. Let's see if this works.

Success! We now have a library that exports only the single API symbol it supports. All other internal symbol names are completely stripped from the binary file.

Concluding Remarks

  • We could just as easily have annotated the b_getval_super_secret function with hidden visibility and left the default visibility as default. However, by making the default hidden you guarantee that only functions explicitly given global visibility are exported; with the other approach, if tomorrow we added a function to fileb.c we would have to remember to mark it hidden as well.

  • You can also use export maps to define exported symbols in a map file that is parsed by the linker. See [2] for more information.

  • Reducing the number of symbols in the dynamic symbol table can also improve load-time and runtime performance: the larger the table, the more entries the dynamic linker must load and search through.

  • I suggest reading [2] for a full explanation of symbol visibility and all possible options.

References

[1] Bahrami, Ali. Inside ELF Symbol Tables. http://blogs.sun.com/ali/entry/inside_elf_symbol_tables

[2] Drepper, Ulrich. How to Write Shared Libraries. http://people.redhat.com/drepper/dsohowto.pdf