std::string and multiple concatenations

Question

Consider the following snippet, and suppose that a, b, c and d are non-empty strings.

    std::string a, b, c, d;
    d = a + b + c;

When computing the sum of those three std::string instances, standard library implementations create a first temporary std::string object, copy the concatenated buffers of a and b into its internal buffer, then perform the same operation between that temporary string and c.

A fellow programmer suggested that, instead of this behaviour, operator+(std::string, std::string) could be defined to return a std::string_helper.

This object's sole role would be to defer the actual concatenations until the moment it is cast to a std::string. Obviously, operator+(std::string_helper, std::string) would be defined to return the same helper, which would "keep in mind" the fact that it has an additional concatenation to carry out.

Such behaviour would save the CPU cost of creating n-1 temporary objects, allocating their buffers, copying them, etc. So my question is: why doesn't it already work like that? I can't think of any drawback or limitation.
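
To make the proposal concrete, here is a minimal sketch of such a deferred concatenation. Every name in it (lazy, lazy_ref, lazy_concat) is hypothetical; the standard library offers no such helper, and the chain has to be started explicitly with lazy(a) because operator+ for two std::string operands cannot be redefined in user code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names throughout; nothing here is part of the standard library.

// Leaf of the expression: refers to one existing string, copies nothing.
struct lazy_ref {
    const std::string& s;
    std::size_t size() const { return s.size(); }
    void append_to(std::string& out) const { out += s; }
};

// Node of the expression: "keeps in mind" that rhs still has to be appended.
template <class Lhs>
struct lazy_concat {
    Lhs lhs;                 // a lazy_ref or a nested lazy_concat, held by value
    const std::string& rhs;  // must stay alive until the conversion happens

    std::size_t size() const { return lhs.size() + rhs.size(); }
    void append_to(std::string& out) const { lhs.append_to(out); out += rhs; }

    // The real work happens on conversion: reserve once, then copy each operand.
    operator std::string() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        assert(out.size() == size());
        return out;
    }
};

// Starts a deferred chain from an existing string.
inline lazy_ref lazy(const std::string& s) { return lazy_ref{s}; }

inline lazy_concat<lazy_ref> operator+(lazy_ref l, const std::string& r) {
    return lazy_concat<lazy_ref>{l, r};
}

template <class Lhs>
lazy_concat<lazy_concat<Lhs>> operator+(lazy_concat<Lhs> l, const std::string& r) {
    return lazy_concat<lazy_concat<Lhs>>{l, r};
}
```

With this in place, std::string d = lazy(a) + b + c; builds a small tree of references and copies the characters only once, when the result is converted to std::string. The operands are held by reference, so the conversion has to happen while they are still alive.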

Answer 1:

why doesn’t it already work like that?

I can only speculate about why it was originally designed like that. Perhaps the designers of the string library simply didn't think of it; perhaps they thought the extra type conversion (see below) might make the behaviour too surprising in some situations. It is one of the oldest C++ libraries, and a lot of wisdom that we take for granted simply didn't exist in past decades.

As to why it hasn't been changed to work like that: it could break existing code, by adding an extra user-defined type conversion. Implicit conversions can only involve at most one user-defined conversion. This is specified by C++11, 13.3.3.1.2/1:

A user-defined conversion sequence consists of an initial standard conversion sequence followed by a user-defined conversion followed by a second standard conversion sequence.

Consider the following:

struct thingy {
    thingy(std::string);
};

void f(thingy);

f(some_string + another_string);

This code is fine if the type of some_string + another_string is std::string. That can be implicitly converted to thingy via the conversion constructor. However, if we were to change the definition of operator+ to give another type, then it would need two conversions (string_helper to string to thingy), and so would fail to compile.
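
The one-conversion limit can be checked at compile time. The sketch below reuses thingy from above and adds a hypothetical string_helper standing in for the proposed return type; std::is_convertible tests exactly the implicit conversion that passing an argument to f would need.

```cpp
#include <string>
#include <type_traits>

struct thingy {
    thingy(std::string) {}
};

// Hypothetical stand-in for the proposed helper type.
struct string_helper {
    operator std::string() const { return std::string(); }
};

// std::string -> thingy needs one user-defined conversion: allowed.
static_assert(std::is_convertible<std::string, thingy>::value,
              "string converts implicitly to thingy");

// string_helper -> std::string also needs one: allowed.
static_assert(std::is_convertible<string_helper, std::string>::value,
              "helper converts implicitly to string");

// string_helper -> std::string -> thingy would need two: rejected.
static_assert(!std::is_convertible<string_helper, thingy>::value,
              "helper does not convert implicitly to thingy");
```

All three assertions hold, so a call such as f(string_helper()) would be rejected even though each individual conversion step exists.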

So, if the speed of string building is important, you'll need to use alternative methods like concatenation with +=. Or, according to Matthieu's answer, don't worry about it because C++11 fixes the inefficiency in a different way.

Answer 2:

The obvious answer: because the standard doesn't allow it. It affects code by introducing an additional user-defined conversion in some cases: if C is a type with a user-defined constructor taking a std::string, then it would make:

C obj = stringA + stringB;

illegal.

Answer 3:

It depends.

In C++03, it is true that there may be a slight inefficiency there (comparable to Java and C#, as they use string interning, by the way). This can be alleviated using:

d = ((std::string() += a) += b) += c;

which is not really... idiomatic. The parentheses matter: += groups right to left, so the unparenthesized form would append to b and a themselves.

In C++11, operator+ is overloaded for rvalue references. Meaning that:

d = a + b + c;

is effectively evaluated as:

d.assign(std::move(operator+(a, b).append(c)));

which is (nearly) as efficient as you can get.
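
As a rough illustration of why this is cheap, the C++11 overload taking an rvalue on the left appends in place and then moves the result out. The sketch below mirrors that with a hypothetical free function, concat_rvalue, rather than the real operator; sum3 is likewise just an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical function mirroring what operator+(std::string&&, const std::string&)
// does in C++11: append to the temporary, then move it out.
std::string concat_rvalue(std::string&& lhs, const std::string& rhs) {
    const std::size_t expected = lhs.size() + rhs.size();
    lhs.append(rhs);        // reuses (and grows if needed) the temporary's buffer
    assert(lhs.size() == expected);
    return std::move(lhs);  // hands the buffer over instead of copying it
}

std::string sum3(const std::string& a, const std::string& b, const std::string& c) {
    // a + b creates the only temporary; c is then appended to it in place.
    return concat_rvalue(a + b, c);
}
```

Only the first concatenation creates a new string; every later operand in the chain is appended to that same temporary.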

The only inefficiency left in the C++11 version is that the memory is not reserved once and for all at the beginning, so there might be reallocations and copies up to 2 times (once for each new string). Still, because appending is amortized O(1), unless c is much longer than b, at worst a single reallocation and copy should take place. And of course, we are talking about copying POD data here (so a memcpy call).
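
If that remaining reallocation matters, the total size can be reserved up front by hand. A minimal sketch, with concat3 as a hypothetical helper name:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: concatenates three strings with at most one allocation.
std::string concat3(const std::string& a, const std::string& b, const std::string& c) {
    std::string result;
    result.reserve(a.size() + b.size() + c.size());  // room for everything up front
    result += a;  // none of these appends needs to reallocate
    result += b;
    result += c;
    assert(result.size() == a.size() + b.size() + c.size());
    return result;
}
```

The call site becomes d = concat3(a, b, c);, which trades the natural a + b + c syntax for a single buffer sized correctly from the start.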

Answer 4:

Sounds to me like something like this already exists: std::stringstream.

Only you write << instead of +. Just because operator+ exists for std::string doesn't make it the most efficient option.
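
For reference, the stream-based version of the original expression might look like the sketch below (concat_with_stream is a hypothetical name). Whether it actually beats operator+ depends on the implementation, since streams carry formatting and locale overhead of their own, so it is worth measuring.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds the result in a string stream's buffer.
std::string concat_with_stream(const std::string& a, const std::string& b,
                               const std::string& c) {
    std::ostringstream out;
    out << a << b << c;  // each piece is appended to the stream's internal buffer
    assert(out.good());  // writing to an in-memory stream is not expected to fail
    return out.str();    // returns a copy of the accumulated characters
}
```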

Answer 5:

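I think that if you use +=, it will be a little faster:

d += a;
d += b;
d += c;

It should be faster, as it doesn't create temporary objects. Note that this appends to whatever d already holds, so it matches d = a + b + c only if d is empty beforehand (call d.clear() first otherwise). Or simply this:

d.append(a).append(b).append(c); //same as above: i.e using '+=' 3 times.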

Answer 6:

The main reason for not doing a string of individual + concatenations, and especially not doing that in a loop, is that it has O(n²) complexity.
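
The difference is easiest to see in a loop. In the sketch below (both function names are hypothetical), the first version rebuilds the whole accumulated string on every iteration, while the second copies each character once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Quadratic: s + p builds a new string from everything accumulated so far.
std::string join_quadratic(const std::vector<std::string>& pieces) {
    std::string s;
    for (const std::string& p : pieces) {
        s = s + p;
    }
    return s;
}

// Linear: reserve the final size, then append each piece in place.
std::string join_linear(const std::vector<std::string>& pieces) {
    std::size_t total = 0;
    for (const std::string& p : pieces) total += p.size();

    std::string s;
    s.reserve(total);
    for (const std::string& p : pieces) s += p;
    assert(s.size() == total);
    return s;
}
```

Both functions return the same string; only the amount of copying differs, which becomes noticeable as the number of pieces grows.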

A reasonable alternative with O(n) complexity is to use a simple string builder, like

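#include <cassert>      // assert
#include <sstream>      // std::basic_ostringstream
#include <string>       // std::basic_string
#include <type_traits>  // std::is_same

// CPP_STATIC_ASSERT is the answer author's own macro and is not defined in
// this excerpt; the definition below is an assumption (C++11 static_assert).
#define CPP_STATIC_ASSERT( e ) static_assert( e, #e )

template< class Char >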
class ConversionToString
{
public:
    // Visual C++ 10.0 has some DLL linking problem with other types:
    CPP_STATIC_ASSERT((
        std::is_same< Char, char >::value || std::is_same< Char, wchar_t >::value
        ));

    typedef std::basic_string< Char >           String;
    typedef std::basic_ostringstream< Char >    OutStringStream;

    // Just a default implementation, not particularly efficient.
    template< class Type >
    static String from( Type const& v )
    {
        OutStringStream stream;
        stream << v;
        return stream.str();
    }

    static String const& from( String const& s )
    {
        return s;
    }
};


template< class Char, class RawChar = Char >
class StringBuilder;


template< class Char, class RawChar >
class StringBuilder
{
private:
    typedef std::basic_string< Char >       String;
    typedef std::basic_string< RawChar >    RawString;
    RawString   s_;

    template< class Type >
    static RawString fastStringFrom( Type const& v )
    {
        return ConversionToString< RawChar >::from( v );
    }

    static RawChar const* fastStringFrom( RawChar const* s )
    {
        assert( s != 0 );
        return s;
    }

    static RawChar const* fastStringFrom( Char const* s )
    {
        assert( s != 0 );
        CPP_STATIC_ASSERT( sizeof( RawChar ) == sizeof( Char ) );
        return reinterpret_cast< RawChar const* >( s );
    }

public:
    enum ToString { toString };
    enum ToPointer { toPointer };

    String const&   str() const             { return reinterpret_cast< String const& >( s_ ); }
    operator String const& () const         { return str(); }
    String const& operator<<( ToString )    { return str(); }

    RawChar const*     ptr() const          { return s_.c_str(); }
    operator RawChar const* () const        { return ptr(); }
    RawChar const* operator<<( ToPointer )  { return ptr(); }

    template< class Type >
    StringBuilder& operator<<( Type const& v )
    {
        s_ += fastStringFrom( v );
        return *this;
    }
};

template< class Char >
class StringBuilder< Char, Char >
{
private:
    typedef std::basic_string< Char >   String;
    String  s_;

    template< class Type >
    static String fastStringFrom( Type const& v )
    {
        return ConversionToString< Char >::from( v );
    }

    static Char const* fastStringFrom( Char const* s )
    {
        assert( s != 0 );
        return s;
    }

public:
    enum ToString { toString };
    enum ToPointer { toPointer };

    String const&   str() const             { return s_; }
    operator String const& () const         { return str(); }
    String const& operator<<( ToString )    { return str(); }

    Char const*     ptr() const             { return s_.c_str(); }
    operator Char const* () const           { return ptr(); }
    Char const* operator<<( ToPointer )     { return ptr(); }

    template< class Type >
    StringBuilder& operator<<( Type const& v )
    {
        s_ += fastStringFrom( v );
        return *this;
    }
};

namespace narrow {
    typedef StringBuilder<char>     S;
}  // namespace narrow

namespace wide {
    typedef StringBuilder<wchar_t>  S;
}  // namespace wide

Then you can write efficient and clear things like …

using narrow::S;

std::string a = S() << "The answer is " << 6*7;
foo( S() << "Hi, " << username << "!" );

Source: https://stackoverflow.com/questions/9619659/stdstring-and-multiple-concatenations
