Consider the following test program:

    #include <iostream>
    #include <string>

    int main()
    {
        std::cout << sizeof(std::string) << '\n';
    }

Summary: it only looks as if libstdc++ uses a single char*. In fact, it allocates additional memory on the heap for each string.
So you should not be concerned that Clang's libc++ implementation is memory-inefficient.
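From the libstdc++ documentation of std::string (under "Detailed Description"; this describes the reference-counted implementation that was libstdc++'s default before GCC 5):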
A string looks like this:

                                    [_Rep]
                                    _M_length
    [basic_string]                  _M_capacity
    _M_dataplus                     _M_refcount
    _M_p ---------------->          unnamed array of char_type

Where the _M_p points to the first character in the string, and you cast it to a pointer-to-_Rep and subtract 1 to get a pointer to the header.
This approach has the enormous advantage that a string object requires only one allocation. All the ugliness is confined within a single pair of inline functions, which each compile to a single add instruction: _Rep::_M_data(), and string::_M_rep(); and the allocation function which gets a block of raw bytes and with room enough and constructs a _Rep object at the front.
The reason you want _M_data pointing to the character array and not the _Rep is so that the debugger can see the string contents. (Probably we should add a non-inline member to get the _Rep for the debugger to use, so users can check the actual string length.)
So the string object itself is just one char*, but that is misleading in terms of memory usage: the heap block it points into also starts with a header, which libstdc++ defines essentially as:

    struct _Rep_base
    {
        size_type     _M_length;
        size_type     _M_capacity;
        _Atomic_word  _M_refcount;
    };

Adding this header to the pointer stored in the object gives a total that is much closer to the sizeof result from libc++.
libc++ uses "short string optimization" (SSO): short strings are stored inside the string object itself instead of on the heap. The exact layout depends on whether _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT is defined; if it is, the characters of a short string start at offset 0 of the object, which improves their alignment. For details, see the source code.
Because SSO keeps short strings inside the object instead of on the heap, the object itself has to be larger, so libc++ looks more costly than libstdc++ if you only consider the string object (for example, the part on the stack). sizeof(std::string) reports only the size of that object, not the overall memory usage (object plus heap).