Requiring the compiler to infer inner dimensions from the initializers would force it to work retroactively in a way the standard avoids.
The standard allows objects being initialized to refer to themselves. For example:
struct foo { struct foo *next; int value; } head = { &head, 0 };
This defines a node of a linked list that points to itself initially. (Presumably, more nodes would be inserted later.) This is valid because C 2011 [N1570] 6.2.1 7 says the identifier head “has scope that begins just after the completion of its declarator.” A declarator is the part of the grammar of a declaration that includes the identifier name along with the array, function, and/or pointer parts of the declaration (for example, f(int, float) and *a[3] are declarators, in declarations such as float f(int, float) or int *a[3]).
Because of 6.2.1 7, a programmer could write this definition:
void *p[][1] = { { p[1] }, { p[0] } };
Consider the initializer p[1]. This is an array, so it is automatically converted to a pointer to its first element, p[1][0]. The compiler knows that address because it knows p[i] is an array of 1 void * (for any value of i). If the compiler did not know how big p[i] was, it could not calculate this address. So, if the C standard allowed us to write:
void *p[][] = { { p[1] }, { p[0] } };
then the compiler would have to continue scanning past p[1] so it can count the number of initializers given for the second dimension (just one in this case, but we have to scan at least to the } to see that, and it could be many more), then go back and calculate the value of p[1].
The standard avoids forcing compilers to do this sort of multiple-pass work. Requiring compilers to infer the inner dimensions would violate this goal, so the standard does not do it.
(In fact, I think the standard might not require the compiler to do any more than a finite amount of look-ahead, possibly just a few characters during tokenization and a single token while parsing the grammar, but I am not sure. Some things have values not known until link time, such as void (*p)(void) = &SomeFunction;, but those are filled in by the linker.)
Additionally, consider a definition such as:
char x[][] =
{
{ 0, 1 },
{ 10, 11 },
{ 20, 21, 22 }
};
As the compiler reads the first two lines of initial values, it may want to prepare a copy of the array in memory. So, when it reads the first line, it will store two values. Then it sees the line end, so it can assume for the moment that the inner dimension is 2, forming char x[][2]. When it sees the second line, it allocates more memory (as with realloc) and continues, storing the next two values, 10 and 11, in their appropriate places.
When it reads the third line and sees 22, it realizes the inner dimension is at least three. Now the compiler cannot simply allocate more memory. It has to rearrange where 10 and 11 are in memory relative to 0 and 1, because there is a new element between them; x[0][2] now exists and has a value of 0 (so far). So requiring the compiler to infer the inner dimensions while also allowing different numbers of initializers in each subarray (and inferring the inner dimension from the maximum number of initializers seen throughout the entire list) could burden the compiler with a lot of memory motion.