I have discovered a disturbing inconsistency between std::string
and string literals in C++0x:
#include <iostream>
#include <string>

int main()
{
    int i = 0;
    for (auto e : "hello")
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    i = 0;
    for (auto e : std::string("hello"))
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    return 0;
}
The output is:
Number of elements: 6
Number of elements: 5
I understand the mechanics of why this is happening: the string literal is really an array of characters that includes the null character, and when the range-based for loop calls std::end()
on the character array, it gets a pointer past the end of the array; since the null character is part of the array, it thus gets a pointer past the null character.
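For reference, a quick standalone check along these lines makes the extra element visible:

#include <iostream>
#include <iterator>
#include <string>

int main()
{
    // "hello" is a const char[6]: five letters plus the terminating '\0',
    // so the array bound (and therefore the iterated range) is 6.
    std::cout << sizeof("hello") << '\n';                          // 6
    std::cout << std::end("hello") - std::begin("hello") << '\n';  // 6
    std::cout << std::string("hello").size() << '\n';              // 5
}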
However, I think this is very undesirable: surely std::string
and string literals should behave the same when it comes to properties as basic as their length?
Is there a way to resolve this inconsistency? For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?
EDIT: To justify my indignation a bit more to those who have said that I'm just suffering the consequences of using C-style strings which are a "legacy feature", consider code like the following:
template <typename Range>
void f(Range&& r)
{
    for (auto e : r)
    {
        ...
    }
}
Would you expect f("hello") and f(std::string("hello")) to do something different?
If we overloaded std::begin() and std::end() for const char arrays so that the range they delimit covered one element fewer than the array's size, then the following code would output 4 instead of the expected 5:
#include <iostream>

int main()
{
    const char s[5] = {'h', 'e', 'l', 'l', 'o'};
    int i = 0;
    for (auto e : s)
        ++i;
    std::cout << "Number of elements: " << i << '\n';
}
However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?
String literals by definition have a (hidden) null character at the end of the string. std::string objects do not: because a std::string stores its own length, that null character would be superfluous, and the standard's section on the string library explicitly allows strings that are not null-terminated.
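As a small sketch of this point, a std::string can even carry embedded null characters, precisely because its length is stored rather than discovered by scanning for a terminator:

#include <iostream>
#include <string>

int main()
{
    // The (const char*, size_t) constructor keeps all 6 characters,
    // including the embedded '\0'.
    std::string s("he\0llo", 6);
    std::cout << s.size() << '\n';  // prints 6
}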
Edit
I don't think I've ever given a more controversial answer, in the sense of attracting both a huge number of upvotes and a huge number of downvotes.
A range-based for loop applied to a C-style array iterates over each element of the array. The extent of the range is determined at compile time, not at run time. This, for instance, is ill-formed:
char * str;
for (auto c : str) {       // error: a bare pointer gives the loop no way to find the end
    do_something_with (c);
}
Some people use arrays of type char to hold arbitrary data. Yes, it is an old-style C way of thinking, and perhaps they should have used a C++-style std::array, but the construct is quite valid and quite useful. Those people would be rather upset if their range-based for loop over a char buffer[1024] stopped at element 15 just because that element happened to have the same value as the null character. A range-based for loop over a Type buffer[1024] runs all the way to the end. What makes a char array so worthy of a completely different implementation?
Note that if you want a range-based for loop over a character array to stop early, there is an easy mechanism to do that: add an if (c == '\0') break; statement to the body of your loop.
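For instance, a minimal sketch of both behaviours over a small buffer with embedded nulls:

#include <iostream>

int main()
{
    // A char array used as a plain buffer: the loop visits all 8 elements,
    // embedded '\0' characters included.
    char buffer[8] = {'a', 'b', '\0', 'c', 'd', '\0', 'e', 'f'};
    int total = 0;
    for (auto c : buffer)
        ++total;
    std::cout << "Full range: " << total << '\n';       // prints 8

    // Opting in to "stop at the terminator" explicitly:
    int until_null = 0;
    for (auto c : buffer) {
        if (c == '\0')
            break;
        ++until_null;
    }
    std::cout << "Up to '\\0': " << until_null << '\n'; // prints 2
}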
Bottom line: there is no inconsistency here. A range-based for loop over a char[] array is consistent with how range-based for loops behave for any other C-style array.
That you get 6 in the first case is an abstraction leak that couldn't be avoided in C. std::string "fixes" that. For compatibility, the behaviour of C-style string literals does not change in C++.
For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?
Assuming access through a pointer (as opposed to a char[N]), only by embedding a variable inside the string that holds the number of characters, so that scanning for the null terminator isn't required any more. Oops! That's std::string.
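For illustration, a bare-bones, hypothetical counted_str type (not a real library type, just the idea) shows what "embedding the length" amounts to:

#include <cstddef>
#include <iostream>

// A stored length replaces scanning for '\0'.
struct counted_str {
    const char* data;
    std::size_t length;
    const char* begin() const { return data; }
    const char* end()   const { return data + length; }
};

int main()
{
    counted_str s{"hello", 5};
    int i = 0;
    for (auto c : s)
        ++i;
    std::cout << i << '\n';  // prints 5
}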
The way to "resolve the inconsistency" is not to use legacy features at all.
According to N3290 6.5.4, if the range is an array, the boundary values are obtained directly from the array bounds, with no begin/end function dispatch at all.
So, how about preparing some wrapper like the following?
struct literal_t {
    char const *b, *e;
    literal_t( char const* b, char const* e ) : b( b ), e( e ) {}
    char const* begin() const { return b; }
    char const* end () const { return e; }
};

template< int N >
literal_t literal( char const (&a)[N] ) {
    return literal_t( a, a + N - 1 );  // N - 1 drops the trailing '\0'
}
Then the following code will be valid:
for (auto e : literal("hello")) ...
If your compiler provides user-defined literals, they can help to abbreviate this:
literal_t operator"" _l( char const* p, std::size_t l ) {
    return literal_t( p, p + l );  // l excludes '\0'
}
for (auto e : "hello"_l) ...
EDIT: The following has smaller overhead (though the user-defined literal form is not available with it).
template< std::size_t N >
char const (&literal( char const (&x)[N] ))[N - 1] {
    return (char const(&)[N - 1]) x;  // view the same array minus its trailing '\0'
}
for (auto e : literal("hello")) ...
If you want the length, use strlen() for the C string and .length() for the C++ string. You can't treat C strings and C++ strings identically; they have different behavior.
The inconsistency can be resolved using another tool in C++0x's toolbox: user-defined literals. Using an appropriately-defined user-defined literal:
std::string operator"" s(const char* p, std::size_t n)
{
    return std::string(p, n);  // n is the literal's length, excluding the '\0'
}
We'll be able to write:
int i = 0;
for (auto e : "hello"s)
    ++i;
std::cout << "Number of elements: " << i << '\n';
Which now outputs the expected number:
Number of elements: 5
With these new std::string literals, there is arguably no more reason to use C-style string literals, ever.
Source: https://stackoverflow.com/questions/6727412/inconsistency-between-stdstring-and-string-literals