> In C99 it is allowed to declare the last member of a structure as an array with no number of elements specified. The size of the struct will then be as if the last member did not exist.
There's one gotcha here, which is that the alignment requirements of the flexible array member can change the size of the struct. For example the following fails on x86_64-linux-gnu:
1. The article points out you should compare structs field by field, but it doesn't explain why memcmp wouldn't work. The reason is that the padding between the fields isn't necessarily zeroed in all cases. Field-by-field comparison is resilient to this (see the sketch at the end of this comment).
2. For dynamic allocation, the article proposes putting the type name inside sizeof, i.e. malloc(sizeof(struct Vector2D)).
I think it's better to use the variable name inside sizeof, so like this:
struct Vector2D *vec = malloc(sizeof(*vec));
This helps you in the case where you change the type of the variable to a different kind of struct. If you change the variable name, you're probably doing a find/replace anyway, and it will almost certainly fail to compile even if you miss it.
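Re point 1, a minimal sketch of the field-by-field comparison (assuming the article's Vector2D has float members x and y):

    struct Vector2D { float x, y; };

    /* Compares only the named members, so any garbage in padding bytes
       (not an issue for this particular struct, but common in bigger ones)
       can't affect the result the way it can with memcmp. */
    static int vec2_equal(const struct Vector2D *a, const struct Vector2D *b)
    {
        return a->x == b->x && a->y == b->y;
    }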
But if you change the type name, you are probably also doing a search and replace.
In the end, the only justification for sizeof *vec rather than sizeof (struct Vector2D) is that the pointer name is often a short name like vec, and so sizeof *vec is shorter than the alternative.
That has advantages.
If there is a typo in a short name, you're more likely to see it.
This is a fair point. If we have *foo = malloc(sizeof *foo), we can search and replace for all occurrences of foo in that scope and replace them; that's always something which potentially makes sense.
However with *foo = malloc(sizeof (type)), it doesn't always make sense to be searching and replacing all occurrences of type, even if restricted to that scope. And, when type is a built-in type like char *, we certainly cannot be replacing all occurrences of it, even in that same scope where foo is active.
Types are embroiled in declaring multiple entities in the program, whereas a variable name declares exactly one thing; all other occurrences of the variable are references to that thing. Sometimes it makes sense to replace some of those occurrences but not others, but that's neither here nor there: in *foo = malloc(sizeof *foo) you would never replace one foo without the other, even if some references to foo elsewhere in scope remain unedited for good reasons.
Basically you'd have to do something extremely absent-minded or silly to wreck *foo = malloc(sizeof *foo), whereas accidentally wrecking the correctness of *foo = malloc(sizeof (type)) can happen with only a small lapse of mindfulness. Since the two sides refer to different identifiers, it makes sense to edit them separately: if you're renaming foo, you don't touch (type); if you're changing (type), you don't touch foo (but you do have to update its declaration elsewhere). There is no obvious, trivial, easy-to-maintain consistency that is localized in just that assignment; things go wrong because of material in separate places elsewhere.
I’ve been wondering about (1) recently—is there a way to memset the entire stack frame at the start of a function such that memcmp works as expected? Also, what are the performance implications of comparing a padded struct member-by-member vs a single big memcmp? Is member-by-member faster because you’re comparing less in total, or is memcmp faster because it’s one big contiguous compare? Or is it more complicated?
Regarding (2), sizeof(*x) doesn’t actually dereference x, right? The dereference isn’t evaluated—it’s all calculated at compile time, right?
> is there a way to memset the entire stack frame at the start of a function such that memcmp works as expected?
I'd argue no simply because the value of the padding bytes is always unspecified. A compiler that sees such a `memset` is (IMO) perfectly free to not zero known padding bytes since it knows their value should not matter to the program. Compilers might not currently do that, but you can already see this kind of behavior in other situations - C compilers will happily throw out `memset` calls if they know the result won't be used.
But also beyond that, it probably doesn't matter anyway because there's no way to _use_ the `struct` which won't leave the padding bytes with unspecified values. `memset` might reliably zero the padding bytes for you, but writing to the struct will randomly screw up the padding bytes depending on what the compiler feels is the best way to do things, so then you're back at `memcmp` no longer working. The only real way to make `memcmp` work is to ensure you have no padding bytes to begin with.
Compilers are free to not zero padding bytes if a struct is passed to memset, but there are some situations where padding is not unspecified, for example if you do partial initialization of the struct.
(1) If the pointers you pass to memcmp are known to the compiler, then memcmp will be treated as an intrinsic, and generally the compiler generates the optimal set of instructions for a given struct.
(2) Correct. The operand of sizeof isn't evaluated; the size is computed at compile time (the one exception is variable-length arrays).
You can have memset/memcmp work for individual structs by setting the padding yourself, and to make sure you do it right, on GCC/clang you can use -Wpadded. It's probably best to do that on a case-by-case basis, unless you're prepared to deal with a lot of compiler warnings/errors though.
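A sketch of what that looks like (struct and member names made up):

    #include <stdint.h>

    /* All bytes are named members, so the compiler inserts no padding
       (-Wpadded stays quiet) and memset/memcmp see every byte. */
    struct msg {
        uint8_t  kind;
        uint8_t  pad[3];   /* explicit padding; always keep it zeroed */
        uint32_t value;
    };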
> But what if we want a struct to feel like a real C type?
Remove this; it's already a real type. You could go over the two namespaces and why you need to type 'struct' otherwise, but I wouldn't present a false dichotomy of real and "unreal" types.
> Declaring a Struct as a New Type
It's not really a new type. It's a type alias, and the two names can be used interchangeably. I would just call it a type alias.
> If we declare a structure variable without initializing it, like any other variable in C, it will be uninitialized at first and may contain random values.
I also see no mention of = {} or = {0} (I forget which is Standard C and which is C++). Since you already talk about "random values", you might as well explain how to 0-initialize a struct.
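For example (using the article's Vector2D):

    struct Vector2D v1 = {0};   /* first member 0, remaining members zero-initialized */
    struct Vector2D v2 = {};    /* empty braces: C++ and, since C23, also C */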
Yes, it's a bit frustrating, especially for headers with inline/macro code. And for headers, requiring C23 doesn't seem sensible for quite some time. I define a macro:
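Presumably something along these lines (the macro itself isn't shown here, so the name and exact form are guesses):

    /* Hypothetical: a zero-initializer that works in headers compiled as
       pre-C23 C or as C++. */
    #ifdef __cplusplus
    #  define ZERO_INIT {}
    #else
    #  define ZERO_INIT {0}
    #endif

    /* usage: struct Vector2D v = ZERO_INIT; */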
> The only good reason to use packed structures is when you need to map some memory (e.g. hardware registers exposed to memory) bit by bit to a structure.
Another common reason is when two CPUs of different architecture need to access the same structure in memory. E.g. you have a RiscV and an Arm64 processor in the same system, sharing memory. Or you read structured binary data from disk and need to specify an exact layout.
The word opinionated was coined and adopted in English to describe a certain attitude. It has functioned fine for (probably) centuries (who knows, and I can't be bothered to research too far). Then came the age of IT and blow me, are we not opinionated to the point of ridiculousness.
A sentence construction along the lines of "The only good reason to" [do x] "is" [y], seems to invite a negation, quite aggressively. You might as well stand in the rain, wearing steel armour, and holding a long copper rod ... and shout "All Gods are bastards" (as a Knight of the Realm, sadly deceased, from hereabouts suggested might be an unwise life shortening decision).
I'm pretty sure packed structures have other uses.
Good call. Such a statement implies that there is no other reason, as obvious as that might sound. Just say "One reason to" to avoid implying that's the only one.
> I'm pretty sure packed structures have other uses.
Unless you're using weird C compilers, you shouldn't need to pack structures in order to have RiscV and Arm64 access the same structure. Packing structures will be detrimental in a situation where any of the processors have alignment requirements for accessing words. Extra instructions may be required to access that four byte member at offset 13, by using two accesses and some shifting and masking. Atomicity has gone out the window.
If we look at, say, GCC, the structure layout rules are very consistent across the targets. They are not absolutely the same, but consistent enough to work with.
Packing won't take care of byte order, though it's not super common to have a system where hosts of opposite byte order are accessing the same buses, and sharing memory. (I've worked on systems where a DSP had "weird order", like opposite endian from the host, but in 16-bit units. Not DCBA but CDBA or something like that.)
You may be tempted to use a packed structure for conforming to some layout in a network packet or on disk, but it leaves the code nonportable and slow. In between the time the packed structure is read from the file and written again, it might be accessed many times. All those times may be slowed down due to the packing. You may be better off writing proper serializing and deserializing. That can also help if there are multiple versions of the format, or some optional fields and other crap. You may be able to map all the variants onto a single in-memory structure, dealing with the differences only in the serializing and deserializing routines.
I'm skeptical of how common it is to need packing in order to map hardware registers bit-by-bit. Hardware registers tend to be very regularly sized and spaced. If you're literally mapping bit-by-bit, the language extension that is most beneficial is the ability to declare a bitfield to be of any integer type. E.g. if registers are consecutive 32 bit words:
    struct regs {
        // Register AREG (32 bits)
        uint32_t AREG_foo_field : 13;
        uint32_t AREG_bar_bit   : 1;
        uint32_t                : 0;  // done with this cell: go to next
        // Register BREG
        uint32_t BREG_bozo_bit  : 1;
    };
Bitfield allocation is endian-dependent. In the end, for utmost portability, you want to access the 32 bit word as a word, and do the shifting and masking, where you can add a byte swap, if necessary.
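A rough sketch of that word-at-a-time approach (register and field names match the made-up AREG example above):

    #include <stdint.h>

    #define AREG_FOO_SHIFT 0
    #define AREG_FOO_MASK  0x1FFFu              /* the 13-bit field */
    #define AREG_BAR_BIT   (UINT32_C(1) << 13)

    static inline uint32_t areg_get_foo(const volatile uint32_t *areg)
    {
        uint32_t w = *areg;                     /* one aligned 32-bit access */
        /* w = byteswap32(w);  only if the device's byte order differs from the host's */
        return (w >> AREG_FOO_SHIFT) & AREG_FOO_MASK;
    }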
> Another common reason is when two CPUs of different architecture need to access the same structure in memory.
That sounds like mapping memory and something you could avoid with the right compiler flags.
> Or you read structured binary data from disk and need to specify an exact layout.
If you need it to stay in the same form it has on the disk, I think it's fair to call that a concern in the category of "map some memory". Edit: And other spots in the text explicitly include file formats.
Basically I think you're giving examples rather than exceptions.
All of these sound weird to me—most non-stupid (hello 802.2) protocols and hardware are going to have natural-aligned structure fields, so basically any mainstream (8-bit-byte, two’s complement, etc.) ABI is going to lay them out the same way, packed or not.
As for RV64 and Arm64, the layout rules for same-size scalar types in their common ABIs are outright identical aren’t they?
We’re (most of us) a long way away from the time where each DOS compiler had its own opinions on whether long double should be 8-, 16-, 32-, or 64-bit aligned and 80 or 128 bits long.
> All of these sound weird to me—most non-stupid (hello 802.2) protocols and hardware are going to have natural-aligned structure fields, so basically any mainstream (8-bit-byte, two’s complement, etc.) ABI is going to lay them out the same way, packed or not.
In the long ago year of 2015 I worked on a project where the same binary packet was:
1. Generated by an 8 bit micro controller
2. Consumed by a 32-bit Cortex M3
3. Passed onto iPhones, Androids, Windows Phones, and Windows PCs running ObjC, Java, C#, and C++ respectively
4. Uploaded to a cloud provider
The phrase "natural aligned" has no meaning in that context.
> The phrase "natural aligned" has no meaning in that context.
The phrase “naturally aligned” as I’m accustomed to seeing it used refers to the alignment of a power-of-two-sized type (usually a scalar one) being equal to its size. Unless you’re working with, say, 18-bit or 24-bit integers (that do exist in obscure places), it does have a meaning, and unless you’re using non-eight-bit bytes that meaning is fairly universal (and if you’re not, your I/O is probably screwed up in hard-to-predict ways[1]).
At least for your items 2, 3, and 4—excluding Java and C# which are not relevant to TFA about C and are likely to use manual packing code—you have, let’s see,
- The bytes are eight bits wide, and ASCII byte strings have their usual meaning;
- The integer types are wraparound unsigned and two’s complement signed, little-endian, with no padding bits or trap representations, and come in 8-bit, 16-bit, 32-bit, and 64-bit sizes with identical alignments;
- The floating-point types are IEEE 754 single and double precision floats, little endian, respectively 32 bits and 64 bits in size and of identical alignment, though you should probably avoid relying on subnormals or the exact choice of NaNs;
- Structures and unions have the alignment requirement of their most strictly aligned member;
- The members of a structure are laid out at increasing offsets, with each member starting at the earliest offset permitted by its alignment (while the members of a union all start at offset zero as the standard requires);
- The structure or union is then padded at the end so that its alignment divides its size.
If you avoid extended precision and SIMD types, the default ABI settings should get you completely compatible layouts here. (On an earlier ARM you might’ve run into mixed-endian floats, but not on any Cortex flavour.) Even bitfields would be entirely fine, except Microsoft bloody Windows had to be stupid there.
Honestly the only potential problem is 1, an unspecified 8-bit controller, and that only because the implicit integer promotions of standard C make getting decent performance out of those a bit of a crapshoot, leading to noncompliant hacks like 8-bit ints or 48-bit long longs. Still, if the usual complement of 8/16/32/64-bit integers is available, the worst you’re likely to have to do is spell out any structure padding explicitly.
I do my current work (embedded) on an architecture with the following properties:
- 8-bit bytes
- 16-bit aligned accesses to 32-bit types
- 32-bit aligned accesses to 64-bit types.
- Struct alignment depends on the size of the struct (32-bit aligned for >= 64-bit structs)
It's a pretty common architecture in the automotive industry, though probably would be considered esoteric for other applications.
This is not the first platform I've encountered with "unnatural" alignment rules in the embedded space, and I'm sure it won't be the last. (The extra packing this allows is actually quite handy.)
I think we agree that this makes sense on some metaphysical level. The problem is that there are definitely platforms where the normal alignment isn't what you describe above. And there isn't to my knowledge a switch in GCC to force it to follow these rules on any given platform. There isn't __attribute__((natural_alignment)). But there is __attribute__((packed)).
Since C11 there is _Alignas(sizeof(T)), forcing one of the proposed meanings for alignment, and _Alignof(T), which queries the actual (i.e. natural, per another meaning) alignment. But, yeah, the argument upthread seems more about the implicit meaning of natural than anything else.
On that note, something that caught me off guard once is that C11 _Alignof and GCC __alignof__ can differ: for example in 32-bit x86 __alignof__(double) == 8 but _Alignof(double) == 4; however __alignof__(struct { double d; }) == 4. Apparently __alignof__ gives the preferred alignment whereas _Alignof gives the alignment required by ABI.
> the default ABI settings should get you completely compatible layouts here
That's not true! You must not assume that the alignment always equals the size of a type. For example, the SysV i386 ABI uses 32-bit alignment for 64-bit types (double, int64_t). The Microsoft x86 ABI, however, uses 64-bit alignment, as do all 64-bit ABIs (See https://stackoverflow.com/a/11110283.)
If you want to share structs directly between different machines, you should use appropriate struct packing directives - unless you really know what you are doing.
What's worse, MSVC's 32-bit x86 ABI reports an 8-byte alignment requirement (via __alignof) for 64-bit integer types, and its struct layout algorithm uses that alignment to determine padding, but those integers and structs are only aligned to 4 bytes when allocated on the stack! This has caused issues with Rust code trying to link with MSVC code [0], since Rust's standard library documentation asserts that properly aligned pointers have addresses that are always a multiple of the alignment used for struct layout.
This was just a placeholder, perhaps a bad example. I program a proprietary CPU architecture which does not require alignment. And for which the compiler naturally prefers to pack structs. Getting it to mimic Arm style struct padding is much harder and more error prone than just having the Arm pack everything.
Maybe you are right and we are heading for a One True Struct Layout in the future. Today I think it is still too scary to pass the same unpacked struct declaration to various compiler archs and hope they come up with the same interpretation.
I don’t think RV32 actually differs re alignment or struct layout, it’s just that with RV64 and Arm64 even the non-fixed-width names for the integer types are the same (LP64) except for Windows-on-ARM.
Some targets also don't use natural alignment. AIX, for example, uses 4-byte alignment for doubles that aren't the first field in the struct. GCC has the -malign-natural and -malign-power options for dealing with this.
It is common for small CPUs to be embedded inside of various hardware devices. For instance, your GPU might have one or more control CPUs embedded inside. These CPUs would have either direct or at least DMA access to main memory. If you have heard about "firmware" being necessary for a hardware device to function, that "firmware" is really just software that runs on one of these auxiliary CPUs.
Suppose you want to send a command from the main CPU to this subprocessor. For efficiency and simplicity, that command might be defined as a C struct, in a common header. In that case it can be good to use packed alignment so you don't have to worry about possible layout differences between CPUs.
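A sketch of the kind of shared definition that implies (names and fields are made up):

    #include <stdint.h>

    /* Shared header between the host driver and the device firmware;
       packing removes any dependence on either compiler's padding rules. */
    struct __attribute__((packed)) fw_cmd {
        uint8_t  opcode;
        uint32_t dma_addr;   /* would otherwise be padded out to offset 4 */
        uint16_t length;
    };                       /* sizeof == 7 packed; typically 12 unpacked */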
1. You mention that passing by value may be faster. The Vector2D function would be a good example of this, because the floats may be passed in registers instead of memory in certain ABIs. Not doing so is a common mistake in linear algebra libraries. It also encourages pure functions, which leads to nicer-to-use APIs.
2. memset is strictly speaking not correct due to padding and null pointers not necessarily being 0. The newer = {} syntax solves this.
I would encourage the placement of attributes before the objects they apply to. This has been allowed by GCC since the standardized attribute syntax was added to C++. It is a more natural fit for code that will eventually be upgraded to C23 attribute syntax in the future and is more prominently visible to a reader. It also avoids the sometimes awkward GCC rules for postfix attributes.
> One might assume that the compiler puts all its members directly behind each other in memory with no gaps in between and that the size of each structure is exactly the same as the sum of the sizes of all of its members. Such a structure is called a packed structure:
> This is NOT how a compiler usually lays out a structure in memory.
Just explain how it works instead of assuming that the reader is making an (incorrect) assumption. I think you ended up framing it around that assumption because you've introduced the topics in a non-optimal order.
I would split up "Element Order and Addressing" and remove the picture there because it's misleading. You can talk about addressing without explaining how the struct is laid out in memory. You don't even need structs to explain addressing ("& takes the address of an object"). Then, once you've introduced basic addressing and struct alignment/padding, you could introduce addressing of struct fields. But I suspect it won't have much value at that point because the logistics will be a natural consequence of the earlier topics.
Also: put function pointers inside a struct for simple object-oriented programming in C (sketch below).
A flexible array member is handy since you do one malloc for everything, but a pointer inside the struct is more 'flexible': for example, you can store a void * and cast it to various data types, while with a flexible array the element type must be chosen up front.
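A minimal sketch of the function-pointer idea mentioned above (names made up):

    /* One "virtual method" held in the struct itself. */
    struct shape {
        double (*area)(const struct shape *self);
    };

    struct square {
        struct shape base;   /* "inherit" by embedding the base as the first member */
        double side;
    };

    static double square_area(const struct shape *self)
    {
        const struct square *sq = (const struct square *)self;  /* ok: base is first */
        return sq->side * sq->side;
    }

    /* usage:
           struct square s = { { square_area }, 3.0 };
           double a = s.base.area(&s.base);   */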
The cache subsystem might be smart enough to cache the first line (the size of the array, etc.), then pull the rest of the array contents from the heap and cache them as needed. That seems fine to me: yes, two cache misses instead of one at the beginning.
I support a C codebase that, over the years, has been supported on 4 different processor architectures (Alpha, SPARC, x86, Itanium).
It's rather simple to maintain structures with the exact same size and alignment on all of the platforms. Use dummy variables, group all structure members by type, and put character fields last.
Put a comment to the right of every member that contains the expected byte offset of that member. At the end of the structure, #define the total expected structure size in bytes, because as part of your test suite you have test cases that assert structure sizes (and member alignments). It's a little extra work at the start, but once you have that habit it's really simple and quick.
    typedef struct example {
        int    mbr1;    // 0
        int    dummy;   // 4 - dummy variable, unused
        double mbr2;    // 8
        double mbr3;    // 16
    } example;

    #define S_EXAMPLE_BYTES 24
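The assertions can even be done at compile time (a sketch; the comment above describes runtime test cases, which amount to the same checks):

    #include <stddef.h>

    _Static_assert(offsetof(example, mbr2) == 8, "unexpected padding before mbr2");
    _Static_assert(sizeof(example) == S_EXAMPLE_BYTES, "struct example changed size");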
> The only good reason to use packed structures is when you need to map some memory (e.g. hardware registers exposed to memory) bit by bit to a structure.
Although unaligned access isn't fast, especially if it isn't directly supported by the hardware, it's still much faster than any of the options where there just isn't enough memory.
It's... unfortunate that C doesn't standardize any of this.
> The great thing about [passing structs by copy] is that it allows us to omit to allocate structs with malloc on the heap. Instead, we could create a struct on the stack and pass it around by copying it. This opens the door to new approaches for writing safer C code.
You don’t have to heap-allocate structs to pass by pointer; you can pass a pointer to a struct living on the stack. Even if you are talking about returning a struct, you can just take a pointer to memory allocated in some previous stack frame and mutate it—this avoids the allocation and thus preserves safety and performance. But yes, being able to pass structs by copy is nice too.
> If we declare a structure variable without initializing it, like any other variable in C, it will be uninitialized at first and may contain random values.
Really, it's important to stress that as far as you are concerned, uninitialized things don't contain values, they only contain undefined behavior.
> But in complex programs, a structure can easily have 20 members and more.
Mostly in poorly designed programs.
> The only reason not to use it is if you are forced to work with a C89 compiler and can’t upgrade.
Or if you have a Vector2D and don't need to be constantly reminded what comes after x.
> But what if we want a struct to feel like a real C type?
It's already a real type. The reason people typedef structs is to save typing 7 characters.
I recommend using sizeof *vec here, instead of sizeof (struct Vector2D); it's much harder to screw up and mix up types in my experience.
> The memory is uninitialized so it is a good idea to initialize it to zero bytes. We can do this by calling the memset function with the pointer to our new struct, the initialization value 0, and the size of our structure:
Don't do this by calling memset. There's no guarantee that memsetting a pointer, float or double to all bytes zero will actually produce things which equal zero. On implementations where null pointers are not all bits zero (rare but they do exist) you will not get a null pointer.
> The great thing about this is that it allows us to omit to allocate structs with malloc on the heap.
Nothing stops you from using pointers to struct typed objects with automatic storage duration when calling functions:
struct Vector2D vec;
foo(&vec);
> This macro is useful to check if the compiler added any padding in between the members of the structure.
I would say this is hardly a use, more of a curiosity. Really, you shouldn't write code which relies on the presence, absence, or width of padding; you can't even reliably store information in padding (the value is free to change when you write to an unrelated member). The only situation where this makes any sense is when using packed structs, and those almost never make sense except in some very specific circumstances.
> It can also be used to get the memory address of the structure if you only have the address of one of its members and know what type of struct it is a member of.
I haven't been able to gather agreement on whether this is something actually allowed by the C standard. There are two main schools of thought about this and one allows it, the other prohibits it, it's an extreme example of a total grey area in the C standard.
> This is used in some advanced code e.g. the OOP implementation of the Linux Kernel.
I think calling what Linux does OOP is misleading. The kernel just has lots of vtables; I disagree that OOP is just about vtables (or the effect they give).
> The most general reason is that one of the design goals of C was to be a language that can be implemented on as many hardware platforms as possible. Therefore the standard needs to be flexible to allow compilers to adapt the actual implementation to the specialties and quirks of their respective hardware platforms.
I mean, it's actually more about ABI than hardware. You can have two ABIs with different padding requirements on the same hardware platform. It just so happens that ABIs are themselves usually designed with hardware in mind.
> And here we already see the solution. A structure has to get enough trailing padding to align with its biggest data type.
While this is not a bad way of thinking about it, again, really, it's important to stress that while developing code in C, you should NOT be relying on this for anything other than performance optimisations for a particular platform.
> (e.g. because you want to map a file format or some hardware registers exposed in memory)
While using struct packing to deal with hardware registers is forgivable (although rare, given that hardware registers will often be aligned the same as in the ABI), you really shouldn't use it for any file format you want to be portable outside a single machine. With modern compilers there's effectively no penalty to doing this properly (i.e. defining functions like uint_least32_t read32le(void *p) which read byte by byte and de-serialize the number using shifts and ORs). Yes, I have tested this. Not only will your code not be cryptic and broken the moment you find yourself on a big endian machine, it also won't be needlessly unportable.
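A sketch of such a helper (written here with a const pointer):

    #include <stdint.h>

    /* Deserialize a 32-bit little-endian value byte by byte: independent of
       host byte order and of any alignment or padding concerns. */
    static uint_least32_t read32le(const void *p)
    {
        const unsigned char *b = p;
        return  (uint_least32_t)b[0]
             |  (uint_least32_t)b[1] << 8
             |  (uint_least32_t)b[2] << 16
             |  (uint_least32_t)b[3] << 24;
    }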
> Thankfully, C supports so-called bitfields.
You make it sound like an array/struct of bools or a bitfield are the only two options.
> The memory is uninitialized so it is a good idea to initialize it to zero bytes. We can do this by calling the memset function with the pointer to our new struct, the initialization value 0, and the size of our structure:
Or just calloc(), which also takes a count and an element size, convenient when you're allocating arrays.
I thought the same thing. Why is calloc not widely used?
That said, when initializing structs, `= { 0 }` can be used to set all members to 0. And at least in more recent compilers you can use the designated initializer with dynamic allocation by casting:
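Presumably something like this (Vector2D members assumed to be float x and y):

    struct Vector2D *vec = malloc(sizeof *vec);
    if (vec)
        *vec = (struct Vector2D){ .x = 1.0f, .y = 2.0f };  /* the compound literal is the "cast" */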
There's one gotcha here, which is that the alignment requirements of the flexible array member can change the size of the struct. For example the following fails on x86_64-linux-gnu:
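Roughly (a sketch; the member names are guesses, the struct tags match the explanation below):

    struct flexible_char { char count; char data[]; };
    struct flexible_int  { char count; int  data[]; };

    /* Holds: the size is as if `data` did not exist. */
    _Static_assert(sizeof(struct flexible_char) == 1, "ok");

    /* Fails on x86_64-linux-gnu: the size is 4, not 1, because the struct is
       padded up to the alignment of its int flexible array member. */
    _Static_assert(sizeof(struct flexible_int) == 1, "fails");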
Because `struct flexible_int` needs to be 4-byte aligned but `struct flexible_char` only needs to be 1-byte aligned.