Casting c_str() only works for short strings

倾然丶 夕夏残阳落幕 submitted on 2019-12-23 08:48:00

Question


I'm using a C library in C++ and wrote a wrapper for it. At one point I need to convert a std::string to a C-style string. There is a class with a member function that returns a std::string. Casting the result of c_str() on the returned string works if the string is short, but not if it is long. Here is a simple, reduced example illustrating the issue:

#include <iostream>
#include <string>

class StringBox {
public:
  std::string getString() const { return text_; }

  StringBox(std::string text) : text_(text){};

private:
  std::string text_;
};

int main(int argc, char **argv) {
  const unsigned char *castString = NULL;
  std::string someString = "I am a loooooooooooooooooong string";  // Won't work
  // std::string someString = "hello";  // This one works

  StringBox box(someString);

  castString = (const unsigned char *)box.getString().c_str();
  std::cout << "castString: " << castString << std::endl;

  return 0;
}

Executing the file above prints this to the console:

castString:

whereas if I swap which someString line is commented out, it correctly prints

castString: hello

How is this possible?


Answer 1:


You are invoking c_str() on a temporary string object returned by the getString() member function. The pointer returned by c_str() is only valid as long as that string object exists, so by the end of the statement that assigns castString it is already a dangling pointer. Formally, using it leads to undefined behavior.

So why does this work for short strings? I suspect that you're seeing the effects of the Short String Optimization (SSO), an optimization where, for strings shorter than a certain length, the character data is stored inside the bytes of the string object itself rather than on the heap. It's possible that the returned temporary was stored on the stack, so when it was destroyed no deallocation occurred, and the bytes of the expired object, which castString still points into, still hold your old characters. A long string's characters, by contrast, live in a heap buffer that is freed when the temporary dies and may be reused or overwritten right away. This seems consistent with what you're seeing, but it still doesn't mean what you're doing is a good idea. :-)




Answer 2:


box.getString() is an anonymous temporary, and the pointer returned by c_str() is only valid for the lifetime of that object.

So in your case, the pointer from c_str() has already been invalidated by the time you get to the std::cout. Reading through that pointer is undefined behaviour.

(Interestingly, your short string possibly behaves differently because std::string stores short strings in a different way.)




Answer 3:


box.getString() produces a temporary. Calling c_str() on it gives you a pointer into that temporary's buffer. Once the temporary ceases to exist, which happens at the end of the full expression, the pointer is invalid: a dangling pointer.

Using a dangling pointer is Undefined Behavior.




Answer 4:


As you return by value, box.getString() is a temporary, and so box.getString().c_str() is valid only during the full expression; after that it is a dangling pointer.

You may fix that with

const std::string& getString() const { return text_; }



Answer 5:


First of all, your code has UB independent of the length of the string: At the end of

castString = (const unsigned char *)box.getString().c_str();

the string returned by getString is destroyed and castString is a dangling pointer to the internal buffer of the destroyed string object.

The reason your code "works" for small strings is probably the Small String Optimization: short strings are (commonly) stored in the string object itself instead of in a dynamically allocated array, and apparently that memory is still accessible and unmodified in your case.



Source: https://stackoverflow.com/questions/35993690/casting-c-str-only-works-for-short-strings
