Why Doesn't string::data() Provide a Mutable char*?

Question

In C++11, array, string, and vector all got the data method, which:

Returns pointer to the underlying array serving as element storage. The pointer is such that range [data(); data() + size()) is always a valid range, even if the container is empty. [Source]

This method is provided in a mutable and const version for all applicable containers, for example:

T* vector<T>::data();
const T* vector<T>::data() const;

All applicable containers, that is, except string, which only provides the const version:

const char* string::data() const;

What happened here? Why did string get shortchanged, when char* string::data() would be so helpful?

Answer 1:

I think this restriction comes from the (pre-2011) days when std::basic_string didn't have to store its internal buffer as a contiguous byte array.

All the others (std::vector and such) already had to store their elements as a contiguous sequence per the 2003 standard, so data could easily return a mutable T*, because there was no problem with iterations, etc.

If std::basic_string were to return a mutable char*, that would imply that you can treat that char* as a valid C-string and perform C-string operations like strcpy on it, which would easily turn into undefined behavior were the string not allocated contiguously.

The C++11 standard added the rule that basic_string has to be implemented as a contiguous byte array. Needless to say, you can work around the missing overload by using the old trick of &str[0].
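
As a minimal sketch of that workaround, assuming a C-style routine that edits a character buffer in place (upper_in_place and shout are hypothetical names used only for illustration):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical C-style routine: upper-cases len characters starting at buf.
void upper_in_place(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        buf[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(buf[i])));
    }
}

// Hands the string's own storage to the C-style routine via &s[0].
// This relies on the C++11 guarantee that the characters are contiguous.
std::string shout(std::string s) {
    if (!s.empty()) {
        upper_in_place(&s[0], s.size());
    }
    return s;
}
```

The routine writes only within [0, size()), so it never touches the terminating null character.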

Answer 2:

The short answer is that C++17 does provide the char* string::data() method. This is vital for the similarly new C++17 data function (std::data); thus, to gain mutable access to the underlying C-string, I can now do this:

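using namespace std::string_literals; // enables the "..."s literal

auto foo = "lorem ipsum"s;

// data(foo) resolves to std::data via argument-dependent lookup (C++17)
for(auto i = data(foo); *i != '\0'; ++i) ++(*i);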

For historical purposes it's worth chronicling the development of string that C++17 builds upon. In C++11, access to string's underlying buffer is made possible by a new requirement that its elements are stored contiguously, such that for any given string s:

&*(s.begin() + n) == &*s.begin() + n for any n in [0, s.size()), or, equivalently, a pointer to s[0] can be passed to functions that expect a pointer to the first element of a CharT[] array.
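
The identity quoted above can be checked directly. The following is only an illustrative sketch (is_contiguous is a hypothetical helper, not a standard function); on a conforming C++11 implementation it should return true for every string:

```cpp
#include <string>

// Checks &*(s.begin() + n) == &*s.begin() + n for every n in [0, s.size()).
bool is_contiguous(const std::string& s) {
    using diff_t = std::string::difference_type;
    for (diff_t n = 0; n < static_cast<diff_t>(s.size()); ++n) {
        if (&*(s.begin() + n) != &*s.begin() + n) {
            return false;
        }
    }
    return true; // an empty string passes trivially: nothing is dereferenced
}
```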

Mutable access to this newly required underlying C-string was obtainable by various methods, for example: &s.front(), &s[0], or &*s.begin(). But back to the original question, whose resolution would remove the burden of using one of these options: why hasn't access to string's underlying buffer been provided in the form of char* string::data()?

To answer that, it is important to note that T* array<T>::data() and T* vector<T>::data() were additions required by C++11. C++11 imposed no such additional requirement on other containers such as deque (which is not contiguous). And there certainly wasn't an additional requirement for string; in fact, the requirement that string be contiguous was itself new to C++11. Before this, const char* string::data() had already existed. Though it was explicitly not guaranteed to point to any underlying buffer, it was the way to obtain a const char* from a string without the null-termination guarantee of c_str():

The returned array is not required to be null-terminated.

This means that string was not "shortchanged" in C++11's transition to data accessors; it simply was not included, so only the const data accessor that string previously possessed persisted. There are naturally occurring examples in C++11 code which necessitate writing directly to the underlying buffer of a string.
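
One such situation, sketched here under the assumption that the data comes from a std::istream (read_up_to is a hypothetical helper, not part of the standard library), is reading bytes straight into a string's storage instead of going through a temporary buffer:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads at most n characters from the stream directly into the string's buffer.
std::string read_up_to(std::istream& in, std::size_t n) {
    std::string out(n, '\0');
    if (n > 0) {
        // Pre-C++17 there is no mutable data(), so &out[0] is used instead.
        in.read(&out[0], static_cast<std::streamsize>(n));
        // Shrink to the number of characters actually read.
        out.resize(static_cast<std::size_t>(in.gcount()));
    }
    return out;
}
```

With C++17, &out[0] could be written as out.data() with the same effect.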

Source: https://stackoverflow.com/questions/34155390/why-doesnt-stringdata-provide-a-mutable-char

Tags

c++

string

c++11

containers

c-strings