What is meant by “memory alignment”?

The CPU (more precisely, the memory controller) accesses memory one word at a time.

The following examples are a simplified illustration of the underlying problem:

Suppose the data fits within a single word and is aligned to the word boundary, i.e. it starts exactly where a word starts.

In this case the data can be fetched with a single read operation.

In the next example the data is still one word in size, but its offset is not aligned with the word boundaries, so it straddles two adjacent words.

Now two words have to be read, and extra work (shifting and combining the two parts) is needed to obtain the data.

A memory access is said to be aligned when the memory address is a multiple of the size of the data being accessed. For example, for an aligned access to an int32_t (which is 4 bytes in size) the accessed address has to be a multiple of 4. Otherwise the memory access is said to be misaligned.

A structure is aligned if every one of its data members is properly aligned for its type.

Padding can be added explicitly (by the programmer) or implicitly (by the compiler) to “fix” alignment.

The following example illustrates how the order of the members within a structure affects member alignment, padding and the final size of the structure.
The structures MyStruct1 and MyStruct2 have the same members, but in a different order.

#include <cstdint>
#include <iostream>

struct MyStruct1 {
  char c1;        // starts at offset 0
  char c2;        // starts at offset 1
  int16_t x1;     // starts at offset 2
  int32_t x2;     // starts at offset 4
};

struct MyStruct2 {
  char c1;        // starts at offset 0
  // char Padding[1];
  int16_t x1;     // starts at offset 2
  char c2;        // starts at offset 4
  // char Padding[3];
  int32_t x2;     // starts at offset 8
};


int main()
{
  std::cout << "sizeof(MyStruct1) = " << sizeof(MyStruct1) << " bytes" << std::endl;
  std::cout << "sizeof(MyStruct2) = " << sizeof(MyStruct2) << " bytes" << std::endl;
}

The output:

sizeof(MyStruct1) = 8 bytes
sizeof(MyStruct2) = 12 bytes

Explanation:
The data types in this example have the following sizes (in bytes):

sizeof(char) = 1
sizeof(int16_t) = 2
sizeof(int32_t) = 4

The members of MyStruct1 are ordered in such a way that natural alignment is preserved without any padding.
Each data member in MyStruct1 starts at an offset that is a multiple of the size of its type.
Because sizeof(char) = 1, char members are always aligned.
sizeof(int16_t) is 2 bytes and the x1 member (of type int16_t) starts at offset 2, which is a multiple of 2, so its alignment is fine.
sizeof(int32_t) is 4 bytes and the x2 member (of type int32_t) starts at offset 4, which is a multiple of 4, so its alignment is fine as well.

In MyStruct2 the situation is different.
Without padding, the member x1 (of type int16_t) would start at offset 1 (directly after c1). But because an offset of 1 byte is not a multiple of x1’s size (2 bytes), x1 would be misaligned. So the compiler adds 1 byte of padding after c1. Now x1 starts at offset 2 and is properly aligned.
The situation is the same for x2. Without padding it would start at offset 5 (directly after c2), but because sizeof(int32_t) is 4 bytes, the compiler adds 3 bytes of padding after c2, so that x2 starts at offset 8.
As a result, MyStruct2 is 4 bytes larger than MyStruct1 (12 instead of 8 bytes), purely because of padding.

__attribute__((packed))

As the opposite of padding, it is also possible to tell the compiler to “pack” the structure members so that no padding is added. GCC, for example, provides the __attribute__ ((packed)) type attribute for this. The downside is that extra CPU work may be required to read or write the packed (and now possibly misaligned) data.

Consider the following two examples.

In the first example we define MyStruct without any extra compiler attributes.
MyStruct has 2 data members:
* one char (size 1 byte) and
* one short (size 2 bytes on this platform)
The function func takes a MyStruct argument by value, increments both of its members and returns the modified copy by value.

struct MyStruct {
    char c;             // offset 0
    // char Padding[1]  // offset 1
    short n;            // offset 2
};

MyStruct func(MyStruct s) {
    s.c++;
    s.n++;
    return s;
}

GCC generates the following assembly code for func (compiled with the -m32 option for 32-bit code). The numbers in the first column are line numbers added for reference:

func(MyStruct):
 1  push ebp
 2  mov ebp, esp
 3  movzx eax, BYTE PTR [ebp+12]
 4  add eax, 1
 5  mov BYTE PTR [ebp+12], al
 6  movzx eax, WORD PTR [ebp+14]
 7  add eax, 1
 8  mov WORD PTR [ebp+14], ax
 9  mov eax, DWORD PTR [ebp+8]
 10 mov edx, DWORD PTR [ebp+12]
 11 mov DWORD PTR [eax], edx
 12 mov eax, DWORD PTR [ebp+8]
 13 pop ebp
 14 ret 4

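From the assembly code we can see that the char member is accessed at [ebp+12] (line 3) and the short member at [ebp+14] (line 6).

So the offset between the char and the short member is 2 bytes, which means the compiler has added 1 byte of padding between them, as expected.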

In the second example we use the GCC-specific __attribute__((packed)) to “pack” the structure's data members without padding.

struct MyStruct {
    char c;             // offset 0
    // no padding here because of __attribute__ ((packed)) option
    short n;            // offset 1
} __attribute__ ((packed));

MyStruct func(MyStruct s) {
    s.c++;
    s.n++;
    return s;
}

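If we now look at the generated assembly for func, lines 3 and 6 show that the offset between the char and the short member is 1 byte. In other words, the short member comes right after the char member without any padding. Note also that copying the result takes more instructions here: the padded structure was copied with a single 4-byte move (lines 10 and 11 of the first listing), whereas the 3-byte packed structure is copied with a 2-byte move followed by a 1-byte move (lines 10 to 13).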

func(MyStruct):
 1  push ebp
 2  mov ebp, esp
 3  movzx eax, BYTE PTR [ebp+12]
 4  add eax, 1
 5  mov BYTE PTR [ebp+12], al
 6  movzx eax, WORD PTR [ebp+13]
 7  add eax, 1
 8  mov WORD PTR [ebp+13], ax
 9  mov eax, DWORD PTR [ebp+8]
 10 movzx edx, WORD PTR [ebp+12]
 11 mov WORD PTR [eax], dx
 12 movzx edx, BYTE PTR [ebp+14]
 13 mov BYTE PTR [eax+2], dl
 14 mov eax, DWORD PTR [ebp+8]
 15 pop ebp
 16 ret 4

Why are memory alignment and padding important?

One aspect is increased memory consumption, which matters in memory-critical applications such as long-running services and server applications. Imagine a large array of such structures: in the MyStruct1/MyStruct2 example above, an array of MyStruct2 would take about 50% more memory than an array of MyStruct1 (12 instead of 8 bytes per element).

The second point is that some architectures raise a bus error when memory is accessed at a misaligned address. Examples are CPUs with a reduced instruction set (such as SPARC and, depending on the core and its configuration, ARM), which are widely used in mobile devices because of their speed and low energy consumption. Highly portable software therefore has to take such restrictions into account.

What tools do we have to enforce alignment?

To query the alignment of a type there is the alignof(type) operator (since C++11). It returns a value of type std::size_t: the alignment, in bytes, required for every instance of that type.

To change the alignment of a type or object there is the alignas specifier (since C++11). It can only make the alignment stricter, not weaker.
There are also compiler-specific options to change alignment, for example __attribute__ ((aligned (<new alignment>))) in GCC.

Some examples:

#include <iostream>

// A)
// default alignment is 4 bytes
struct S0 {
    float f[3];
};

// B)
// Increase alignment from 4 to 8 bytes by using
// the GCC-specific __attribute__ ((aligned)) type attribute
struct  S1 {
    float f[3];
} __attribute__ (( aligned(8) ));

// C)
// Same as B) but using the C++11 alignas specifier
struct alignas(8) S2 {
    float f[3];
};


int main()
{
    std::cout << "alignof(S0) = " << alignof(S0) << std::endl
              << "alignof(S1) = " << alignof(S1) << std::endl
              << "alignof(S2) = " << alignof(S2) << std::endl;
}

Output (compiled with g++):

alignof(S0) = 4
alignof(S1) = 8
alignof(S2) = 8

 
