Demistifying AMD64 ABI with some examples, part I


Sources:

Introduction

The AMD64 architecture is an extension of the x86 architecture and has previously be called x86-64. AMD64 ABI represents the “long” mode.

The specification defines following data representations for the objects:

termobject size
byte8 bits
twobyte16 bits
fourbyte32 bits
eightbyte64 bits
sixteenbyte128 bits

The following figure shows the correspondence between ISO C’s scalar types and the processor types:

Function calling sequence

The AMD64 architecture provides 16 general purpose 64-bit registers. In addition
the architecture provides 16 SSE registers, each 128 bits wide and 8 x87 floating
point registers, each 80 bits wide. Each of the x87 floating point registers may be
referred to in MMX/3DNow! mode as a 64-bit register. All of these registers are
global to all procedures active for a given thread.

Registers rbp, rbx and
r12 through r15 “belong” to the calling function and the called function is
required to preserve their values. In other words, a called function must preserve
these registers’ values for its caller. Remaining registers “belong” to the called
function.5
If a calling function wants to preserve such a register value across a
function call, it must save the value in its local stack frame

In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. Following figure shows the stack organization:

The 128-byte area beyond the location pointed to by %rsp is considered to
be reserved and shall not be modified by signal or interrupt handlers.8 Therefore,
functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

After the argument values have been computed, they are placed either in registers or pushed on the stack.

The size of each argument gets rounded up to eightbytes. Therefore the stack will always be eightbyte aligned.

The classification of arguments is described in the psABI-x86_64.

Let’s just dive in and examine the function calling sequence according to the AMD64 ABI. As an example let’s take the following function:

int add_int(int a, int b) {
    int r = a + b;
    return r;
}

Regarding passing the arguments of class INTEGER the spec says following:

If the class is INTEGER, the next available register of the sequence %rdi,
%rsi, %rdx, %rcx, %r8 and %r9 is used

The function add takes 2 arguments of type int and returns the sum of the same type. Let’s examine the generated assembly code:

add_int(int, int):
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
        movl    -4(%rbp), %eax
        addl    -8(%rbp), %eax
        movl    %eax, -12(%rbp)
        movl    -12(%rbp), %eax
        popq    %rbp
        retq

rbp is the stack frame base pointer.

According to the AMD64 ABI spec these arguments correspond to the classification type INTEGER.

Line 4: The parameter a is passed via the rdi register. Because the sizeof(int) = 4 only the lower half of the rdi is used (edi). The parameter is copied into the memory location within the stack frame at rbp-4.

Line 5: Similarly, the parameter b is passed via the rsi register by using only the lower part esi. The parameter is copied into the memory location within the stack frame at rbp-8.

The end of the input argument area shall be aligned on a 16 (32, if __m256 is
passed on stack) byte boundary. In other words, the value (%rsp + 8) is always
a multiple of 16 (32) when control is transferred to the function entry point. The
stack pointer, %rsp, always points to the end of the latest allocated stack frame .

The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available. E.g. this can be achieved by passing compiler argument -fomit-frame-pointer (clang, gcc). The resulting assembly becomes in that case:

add_int(int, int):
        movl    %edi, -4(%rsp)
        movl    %esi, -8(%rsp)
        movl    -4(%rsp), %eax
        addl    -8(%rsp), %eax
        movl    %eax, -12(%rsp)
        movl    -12(%rsp), %eax
        retq

Let’s know examine the same example, but with arguments of type float. According to the AMD64 ABI spec we expect to see the arguments passed via SSE registers (frame pointer omitted):

float add_float(float a, float b) {
    float r = a + b;
    return r;
}
add_float(float, float):
        movss   %xmm0, -4(%rsp)
        movss   %xmm1, -8(%rsp)
        movss   -4(%rsp), %xmm0
        addss   -8(%rsp), %xmm0
        movss   %xmm0, -12(%rsp)
        movss   -12(%rsp), %xmm0
        retq

According to the spec, these arguents correspond to the classification type SSE.

Let’s look at some more interesting cases, e.g. C++ objects with non-trivial copy constructor or a non-trivial destructor. According to the spec:

If a C++ object has either a non-trivial copy constructor or a non-trivial
destructor, it is passed by invisible reference (the object is replaced in the
parameter list by a pointer that has class INTEGER).

An object with either a non-trivial copy constructor or a non-trivial destructor cannot be passed by value because such objects must have well defined addresses. Similar issues apply when returning an object from a function.

As an exmaple, let’s take the following C++ source:

struct Foo {
    int x;
    ~Foo() {} // non-trivial destructor
};

Foo add_foo_nontrivial(Foo a, Foo b) {
    Foo r;
    r.x = a.x + b.x;
    return r;
}

and look at the generated assembly (optimized, frame pointer omitted):

add_foo_nontrivial(Foo, Foo):
        movq    %rdi, %rax
        movl    (%rdx), %ecx
        addl    (%rsi), %ecx
        movl    %ecx, (%rdi)
        retq

Now let’s look at the same example but by having the trivial destructor:

add_foo(Foo, Foo):
        movq    %rdi, %rax
        movl    (%rdx), %ecx
        addl    (%rsi), %ecx
        movl    %ecx, (%rdi)
        retq
add_foo_trivial(Foo, Foo):
        leal    (%rdi,%rsi), %eax
        retq

We conclude:

  • in the case of passing objects of classes with non-trivial destructor (or non-trivial copy constructor) the (invisible) object reference is passed via rdi, the first parameter via rsi, the second parameter via rdx and so on
  • in the case of trivial destructor (or trivial copy constructor) no object reference is passed.

There is a nice blog about this case in https://quuxplusone.github.io/blog/2018/05/02/trivial-abi-101/.

The summary of passing function arguments:


Leave a Reply

Your email address will not be published. Required fields are marked *