On this page:
Starting Implementation
Assumptions
Evaluation
Decoding Machine Code
Tips
Test Cases and Test Harness

Linking Assignment

Due: Thursday, October 25, 11:59pm

This assignment is about understanding how the ELF format supports run-time linking for shared libraries on x86_64, including the way that references to global variables and functions are implemented and represented.Less directly, this assignment is also about manipulating pointers and numbers and moving between them. That practice should help with assignments later in the semester, too.

Starting with linklab-handout.zip, your task is to implement enforce, which copies an ELF-format shared library and

The intent is that your enforce starts processing a function in closed mode, switches from closed to open when it sees open_it (or replaces with a crash if the current mode is already open), and switches from open mode to closed mode when it sees close_it (or replaces with a crash if the current mode is closed). An access to a protected_ variable is replaced with a crash if the current mode is closed, and a return is replaced with a crash if the current mode is open.

For example, given a shared library that is the compiled form of

  int other_v1 = 4;

  int other_v2 = 5;

  int protected_v3 = 3;

  int protected_v4 = 4;

  

  int f(int sel) {

    if (sel > 0) {

      return protected_v4; /* oops - access not between open & close */

    } else if (sel == 0) {

      int v;

      open_it();

      v = protected_v3;

      close_it();

      return v;

    } else {

      open_it();

      if (sel < -1) {

        close_it();

        return other_v1;

      } else

        return other_v2;  /* oops - return without close */

    }

  }

then the shared library has the following initial behaviors:

A copy of the shared library produced by enforce, however, will have the following behaviors:

You can find additional and simpler examples in "ex0.c", "ex1.c", "ex2.c", "ex3.c", and "ex4.c", which are included with the starting code.

Starting Implementation

The starting code "enforce.c" handles copying a shared-library file to a destination file based on the command-line arguments. The enforce function—which doesn’t do anything in the starting code—receives a pointer to the in-memory copy of the destination file. Your job is to fill in the implementation of enforce so that it changes the destination file.

To determine function and variable information, your enforce function will read the destination ELF file as it exists in memory. To replace certain variable accesses and function calls with a crash, your enforce function will modify the in-memory copy of the ELF file. Through the magic of mmap (which we will cover later in the semester), those in-memory changes will be reflected in the destination file.

A key step in the implementation of enforce is to decode the machine code that implements a function. Since decoding x86_64 machine code is tedious, the starting implementation includes "decode.c" and "decode.h", where "decode.c" implements a decode function. The decode function is sufficient for functions in the shared libraries that we will use to test your implementation. In addition, "decode.c" provides replace_with_crash, which can replace an instruction with one to provoke a crash through a trace/breakpoint interrupt. See Decoding Machine Code for more information on using decode and replace_with_crash.

Your submitted solution will take the form of a single C file, "enforce.c", which must be implemented in ANSI C with GNU extensions. You must not modify "decode.c" or "decode.h", since you submit only your "enforce.c" file. We’ll compile and link enforce.c in the same way as in "Makefile", so it must be self-contained other than its uses of "decode.h", "decode.c", "elf.h", limited Unix headers and libraries for open and mmap, and ANSI C headers and libraries.

Assumptions

To simplify your task, you can make several assumptions:

Evaluation

To earn points, your implementation must correctly install crashes in the form of a trace/breakpoint interrupt.

When grading, we will infer a level of completion based on your program’s behavior on the tests that are included with linklab-handout.zip. We will then scale your grade for each level based on the number of our test cases that pass.

Your enforce can print out any information that you find useful. Our tests will look only at the shared library produced by enforce.

Decoding Machine Code

The decode function provided by "decode.c" has the prototype

  void decode(instruction_t *ins, code_t *code_ptr, Elf64_Addr code_addr);

where code_ptr refers to the machine code to decode, and code_addr is the address where that code will reside at run time. (The code_addr must be provided so that decode can tell the destination of relative references in the machine code.) The type code_t is an alias for unsigned char. Use code_t* for a pointer to machine code, so that pointer arithmetic works in terms of bytes, since decode reports an instruction length in bytes.

The ins argument receives the result of decoding one instruction, so pass the address of a locally declared instruction_t to decode. The instruction_t struct type has three fields: op, length, and addr. The decode function always fills in ins->op and ins->length, and it fills in ins->addr for some values of ins->op.

The possible values for op are (as defined in "decode.h"):

For all operations, the ins->length field is set to the length of the machine-code instruction in bytes, so code_ptr + ins->length is a pointer to the next instruction, if any. The assumptions allowed for enforce means that both branches of a conditional (i.e., the target of a MAYBE_JMP_TO_ADDR_OP instruction and the immediately following instruction) can be explored by using a recursive call to an instruction-traversal procedure for one or both of the branches. Don’t try to use a loop instead of recursion to handle jumps. Branches generate a tree shape instead of a sequence shape, so recursion is the right tool (even in C, if the tree is shallow). A loop is the right choice for traversing instructions up to a branch, though.

If decode does not recognize the content at code_ptr as a machine-code instruction, it reports an error and exits. We will use test inputs where public functions conform to the constraints of decode, so you do not need to handle machine code that decode does not recognize (when used properly).

The "decode.c" library also provides replace_with_crash:

  void replace_with_crash(code_t *code_ptr, instruction_t *ins);

Pass code_ptr for an instruction that you want to replace with a trace/breakpoint interrupt. Also pass ins as filled with information about the existing instruction, so that replace_with_crash knows how many bytes to replace (filling with “no-op” instruction bytes as necessary to fully replace the original instruction).

Tips

The information about ELF that you need to complete this assignment is mostly covered by Videos: ELF. The map slides should be helpful to find your way through a file; see also a video explaining the map. More generally, you can use "/usr/include/elf.h" as a reference to find relevant structures, fields, and macros. You might also consult any number of other ELF references on the web.

Use readelf to get a human-readable form of the content of an ELF file to get an idea of what your program should find. Use objdump -d to disassemble functions to get an idea of what your program will recognize using decode and to compare your output shared library to the input shared library.

All of the information that you need from the ELF file can be found via section headers and section content, so you will not need to use program headers. You’ll need to consult the .dynsym, .dynstr, .rela.dyn, .plt, and .rela.plt sections, at least.

When working with ELF content, you have to keep track of which references are in terms of file offsets and which are in terms of (tentative) run-time addresses where the library will eventually run. When working with ELF content that is mmapped into memory (as in the starting "enforce.c"), then you have one more way of referencing things: a pointer in memory at present. Be careful to keep in mind which kind of reference you have at any time. Again, the maps should help.

Don’t confuse “symbol” with “string,” and keep in mind that they are referenced in different ways. Symbols are referenced by an index that corresponds to the symbol’s position in an array of symbols. Strings are referenced by an offset in bytes within the relevant string section. Every symbol has a string for its name.

Take small steps. Start out by printing the symbol index for every function that is provided by the shared library. Then, print the name instead of the symbol index. Then, print the address where each function’s implementation is found, and so on.

Avoid parsing ELF information into your own set of data structures. An ELF file in memory can be viewed as a data structure already, based on the types that "elf.h" defines. Following different kinds of references is not always trivial, but it’s not difficult, and you can write little functions to make access more convenient and readable. The task and examples are small enough that there’s no need for extra tables or data structures to speed up searches.

As a lower bound, a complete solution can fit in about 250 lines of C code, including the 120 lines in the provided in the starting "enforce.c". Depending on whitespace (fine), comments (good), error checking (commendable), and duplicated code instead abstracting into a function (bad), many solutions will be the range of 300-400 lines.

Test Cases and Test Harness

The archive linklab-handout.zip provides a "Makefile" where the default target builds

For example, after running make with the initial files, you can check that calling "ex0.so"’s always_ok with the argument 1 produces the result 1 and never_ok with the argument 1 produces the result 3:

  $ ./call ex0.so always_ok 1

  1

  $ ./call ex0.so never_ok 1

  3

After you have completed enforce for enforcing paired open_it and close_it, then

  $ ./enforce ex0.so dest.so

  $ ./call dest.so always_ok 1

  1

  $ ./call dest.so never_ok 1

  Trace/BPT trap

Alternatively, you can use objdump -d dest.so and compare to objdump -d ex0.so to check whether your enforce has replaced the return in never_ok with an int3 instruction that triggers a trace/breakpoint interrupt.

The test makefile target runs your enforce on the "exn.so" shared libraries (overwriting "dest.so" for each attempt) and checks that the result behaves as predicted at the top of the "exn.c" source. You can use The make test80 to run only tests for the 80-point level of completeness, which is "ex0.so". You can make test90 to run only tests for the 90-point level of completeness, which is "ex0.so" and "ex1.so".

For grading, we will run your program on additional shared-library files. We may also compile your program with different optimization or debugging options; as always, your program must build with gcc on CADE machines with no language-adjusting command-line flags.