You seem to assume that a necessary trait of a "true string literal"
is that the compiler bakes it into the static storage of the executable.
This is not actually true. The C and C++ standards guarantee us that
a string literal shall have static storage duration, so it must exist for the
life of the program, but if a compiler can arrange this without placing
the literal in static storage, it is free to do so, and some compilers sometimes
do.
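To make concrete what the standards do promise, here is a minimal sketch. The names greeting and check_greeting are mine, purely for illustration. It relies only on the storage-duration guarantee and says nothing about which section, if any, the bytes end up in.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns a pointer to a string literal.
// This is well-defined because the literal has static storage duration,
// so the array it names outlives the call. Where the bytes physically
// live is left to the implementation.
const char *greeting()
{
    return "hello";
}

// Usage sketch: the pointer is still valid after greeting() has returned.
void check_greeting()
{
    const char *p = greeting();
    assert(std::strcmp(p, "hello") == 0);
}
```

By contrast, returning the address of a local char array initialized from the same literal would dangle, because that array has automatic storage duration.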
However, it's clear that the property you want to test, for a given string
literal, is whether it is in fact in static storage. And since it need not
be in static storage, as far as the language standards guarantee, there
can't be any solution to your problem founded solely on portable C/C++.
Whether a given string literal is in fact in static storage is the question
of whether the address of the string literal lies within one of the
address ranges that get assigned to linkage sections that qualify as
static storage, in the nomenclature of your particular toolchain, when
your program is built by that toolchain.
So the solution I suggest is that you enable your program to know the
address ranges of those of its own linkage sections that qualify as
static storage, and then it can test whether a given string literal
is in static storage by obvious code.
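The "obvious code" amounts to a half-open range check. The following is only a sketch of that idea: AddressRange, contains and in_static_storage are names I have made up for it, and the pointer-to-integer cast assumes that unsigned long is wide enough to hold an address, which holds on x86_64 GNU/Linux but is not promised by the standards.

```cpp
#include <cassert>
#include <climits>

// Hypothetical description of one linkage section: [start, start + size).
struct AddressRange {
    unsigned long start;
    unsigned long size;
};

// True if addr lies in the half-open interval [start, start + size).
bool contains(const AddressRange &r, unsigned long addr)
{
    assert(r.size <= ULONG_MAX - r.start);  // start + size must not wrap
    return addr >= r.start && addr < r.start + r.size;
}

// True if p points into any of the n ranges we deem static storage.
// Assumption: unsigned long can represent any object address.
bool in_static_storage(const void *p, const AddressRange *ranges, unsigned n)
{
    unsigned long addr = (unsigned long)p;
    for (unsigned i = 0; i < n; ++i) {
        if (contains(ranges[i], addr)) {
            return true;
        }
    }
    return false;
}
```

The end of each range is exclusive, so an address equal to start + size belongs to whatever follows the section, not to the section itself.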
Here is an illustration of this solution for a toy C++ project, prog,
built with the GNU/Linux x86_64 toolchain (C++98 or better will do, and the
approach is only slightly more fiddly for C). In this setting, we link in ELF
format, and the linkage sections we will deem static storage are .bss
(0-initialized static data), .rodata (read-only static data) and .data
(read/write static data).
Here are our source files:
section_bounds.h
#ifndef SECTION_BOUNDS_H
#define SECTION_BOUNDS_H
// Export delimiting values for our `.bss`, `.rodata` and `.data` sections
extern unsigned long const section_bss_start;
extern unsigned long const section_bss_size;
extern unsigned long const section_bss_end;
extern unsigned long const section_rodata_start;
extern unsigned long const section_rodata_size;
extern unsigned long const section_rodata_end;
extern unsigned long const section_data_start;
extern unsigned long const section_data_size;
extern unsigned long const section_data_end;
#endif
section_bounds.cpp
// Assign either placeholder or pre-defined values to
// the section delimiting globals.
#ifndef BSS_START
#define BSS_START 0x0
#endif
#ifndef BSS_SIZE
#define BSS_SIZE 0xffff
#endif
#ifndef RODATA_START
#define RODATA_START 0x0
#endif
#ifndef RODATA_SIZE
#define RODATA_SIZE 0xffff
#endif
#ifndef DATA_START
#define DATA_START 0x0
#endif
#ifndef DATA_SIZE
#define DATA_SIZE 0xffff
#endif
extern unsigned long const section_bss_start = BSS_START;
extern unsigned long const section_bss_size = BSS_SIZE;
extern unsigned long const section_bss_end = section_bss_start + section_bss_size;
extern unsigned long const section_rodata_start = RODATA_START;
extern unsigned long const section_rodata_size = RODATA_SIZE;
extern unsigned long const section_rodata_end = section_rodata_start + section_rodata_size;
extern unsigned long const section_data_start = DATA_START;
extern unsigned long const section_data_size = DATA_SIZE;
extern unsigned long const section_data_end = section_data_start + section_data_size;
cstr_storage_triage.h
#ifndef CSTR_STORAGE_TRIAGE_H
#define CSTR_STORAGE_TRIAGE_H
// Classify the storage type addressed by `s` and print it on `cout`
extern void cstr_storage_triage(const char *s);
#endif
cstr_storage_triage.cpp
#include "cstr_storage_triage.h"
#include "section_bounds.h"
#include <iostream>
using namespace std;
void cstr_storage_triage(const char *s)
{
    // Assumes unsigned long can hold an address (true for x86_64 GNU/Linux)
    unsigned long addr = (unsigned long)s;
    cout << "When s = " << (void*)s << " -> \"" << s << '"' << endl;
    if (addr >= section_bss_start && addr < section_bss_end) {
        cout << "then s is in static 0-initialized data\n";
    } else if (addr >= section_rodata_start && addr < section_rodata_end) {
        cout << "then s is in static read-only data\n";
    } else if (addr >= section_data_start && addr < section_data_end) {
        cout << "then s is in static read/write data\n";
    } else {
        cout << "then s is on the stack/heap\n";
    }
}
main.cpp
// Demonstrate storage classification of various arrays of char
#include "cstr_storage_triage.h"
static char in_bss[1];
static char const * in_rodata = "In static read-only data";
static char in_rwdata[] = "In static read/write data";
int main()
{
    char on_stack[] = "On stack";
    cstr_storage_triage(in_bss);
    cstr_storage_triage(in_rodata);
    cstr_storage_triage(in_rwdata);
    cstr_storage_triage(on_stack);
    cstr_storage_triage("Where am I?");
    return 0;
}
Here is our makefile:
.PHONY: all clean
SRCS = main.cpp cstr_storage_triage.cpp section_bounds.cpp
OBJS = $(SRCS:.cpp=.o)
TARG = prog
MAP_FILE = $(TARG).map
ifdef AGAIN
BSS_BOUNDS := $(shell grep -m 1 '^.bss ' $(MAP_FILE))
BSS_START := $(word 2,$(BSS_BOUNDS))
BSS_SIZE := $(word 3,$(BSS_BOUNDS))
RODATA_BOUNDS := $(shell grep -m 1 '^.rodata ' $(MAP_FILE))
RODATA_START := $(word 2,$(RODATA_BOUNDS))
RODATA_SIZE := $(word 3,$(RODATA_BOUNDS))
DATA_BOUNDS := $(shell grep -m 1 '^.data ' $(MAP_FILE))
DATA_START := $(word 2,$(DATA_BOUNDS))
DATA_SIZE := $(word 3,$(DATA_BOUNDS))
CPPFLAGS += \
	-DBSS_START=$(BSS_START) \
	-DBSS_SIZE=$(BSS_SIZE) \
	-DRODATA_START=$(RODATA_START) \
	-DRODATA_SIZE=$(RODATA_SIZE) \
	-DDATA_START=$(DATA_START) \
	-DDATA_SIZE=$(DATA_SIZE)
endif
all: $(TARG)
clean:
	rm -f $(OBJS) $(MAP_FILE) $(TARG)
ifndef AGAIN
$(MAP_FILE): $(OBJS)
	g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
	touch section_bounds.cpp
$(TARG): $(MAP_FILE)
	$(MAKE) AGAIN=1
else
$(TARG): $(OBJS)
	g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
endif
Here is what a run of make looks like:
$ make
g++ -c -o main.o main.cpp
g++ -c -o cstr_storage_triage.o cstr_storage_triage.cpp
g++ -c -o section_bounds.o section_bounds.cpp
g++ -o prog -Wl,-Map=prog.map main.o cstr_storage_triage.o section_bounds.o
touch section_bounds.cpp
make AGAIN=1
make[1]: Entering directory `/home/imk/develop/SO/string_lit_only'
g++ -DBSS_START=0x00000000006020c0 -DBSS_SIZE=0x118 -DRODATA_START=0x0000000000400bf0
-DRODATA_SIZE=0x120 -DDATA_START=0x0000000000602070 -DDATA_SIZE=0x3a
-c -o section_bounds.o section_bounds.cpp
g++ -o prog main.o cstr_storage_triage.o section_bounds.o
And lastly, what prog does:
$ ./prog
When s = 0x6021d1 -> ""
then s is in static 0-initialized data
When s = 0x400bf4 -> "In static read-only data"
then s is in static read-only data
When s = 0x602090 -> "In static read/write data"
then s is in static read/write data
When s = 0x7fffa1b053a0 -> "On stack"
then s is on the stack/heap
When s = 0x400c0d -> "Where am I?"
then s is in static read-only data
If it's obvious how this works, you need read no further.
The program will compile and link even before we know the addresses and
sizes of its static storage sections. It would need to, wouldn't it!? In
that case, the global section_* variables that ought to hold these values
all get built with place-holder values.
When make is run, the recipes:
$(TARG): $(MAP_FILE)
	$(MAKE) AGAIN=1
and
$(MAP_FILE): $(OBJS)
	g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
	touch section_bounds.cpp
are operative, because AGAIN is undefined. They tell make that in order
to build prog it must first build the linker map file of prog, as per
the second recipe, and then re-timestamp section_bounds.cpp. After that,
make is to call itself again, with AGAIN defined = 1.
Executing the makefile again, with AGAIN defined, make now finds that it
must compute all the variables:
BSS_BOUNDS
BSS_START
BSS_SIZE
RODATA_BOUNDS
RODATA_START
RODATA_SIZE
DATA_BOUNDS
DATA_START
DATA_SIZE
For each static storage section S, it computes S_BOUNDS by grepping
the linker map file for the line that reports the address and size of S.
From that line, it assigns the 2nd word ( = the section address) to S_START,
and the 3rd word ( = the size of the section) to S_SIZE. All the section
delimiting values are then appended, via -D options, to the CPPFLAGS
that will automatically be passed to compilations.
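If it helps to see the grep-and-pick-words step spelled out, here is a rough C++ equivalent. It is not part of the project above: parse_section_line is a hypothetical helper of mine, and it assumes the map-file line has the shape the makefile relies on, namely the section name at the start of the line, then the address, then the size.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the makefile's grep + $(word N,...) logic.
// Given one line of a linker map file such as
//     .bss            0x00000000006020c0      0x118
// it succeeds only if the line starts with `name` followed by a space,
// and then stores the 2nd word in `start` and the 3rd word in `size`.
bool parse_section_line(const std::string &line, const std::string &name,
                        unsigned long &start, unsigned long &size)
{
    assert(!name.empty());
    const std::string prefix = name + " ";
    if (line.compare(0, prefix.size(), prefix) != 0) {
        return false;  // like grep '^NAME ': must match at column 0
    }
    std::istringstream in(line);
    std::string first, second, third;
    if (!(in >> first >> second >> third)) {
        return false;  // fewer than three words on the line
    }
    char *end = 0;
    // Base 0 lets strtoul accept the 0x prefix used in the map file.
    unsigned long a = std::strtoul(second.c_str(), &end, 0);
    if (end == second.c_str() || *end != '\0') {
        return false;
    }
    unsigned long s = std::strtoul(third.c_str(), &end, 0);
    if (end == third.c_str() || *end != '\0') {
        return false;
    }
    start = a;
    size = s;
    return true;
}
```

The makefile does the same job with grep and $(word ...) because the values are needed at build time, as -D options, before the program exists.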
Because AGAIN is defined, the operative recipe for $(TARG) is now the customary:
$(TARG): $(OBJS)
	g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
But we touched section_bounds.cpp in the parent make; so it has to be
recompiled, and therefore prog has to be relinked. This time, when
section_bounds.cpp is compiled, all the section-delimiting macros:
BSS_START
BSS_SIZE
RODATA_START
RODATA_SIZE
DATA_START
DATA_SIZE
will have pre-defined values and will not assume their place-holder values.
And those predefined values will be correct because the second linkage
adds no symbols to the linkage and removes none, and does not alter the
size or storage class of any symbol. It just assigns different values to
symbols that were present in the first linkage. Consequently, the
addresses and sizes of the static storage sections will be unaltered and are now known to your program.