Check whether equal string literals are stored at the same address

Submitted by 丶灬走出姿态 on 2019-12-08 16:14:22

Question


I am developing a C++ library that uses unordered containers. These require a hasher (usually a specialization of the template structure std::hash) for the types of the elements they store. In my case, those elements are classes that encapsulate string literals, similar to conststr of the example at the bottom of this page. The STL offers a specialization for constant char pointers, which, however, only hashes the pointer value, as explained here, in the 'Notes' section:

There is no specialization for C strings. std::hash<const char*> produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.

Although this is very fast (or so I think), the C++ standard does not guarantee that several equal string literals are stored at the same address, as explained in this question. If they aren't, the first condition of hashers wouldn't be met:

For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2)

I would like to selectively compute the hash using the provided specialization if the aforementioned guarantee is given, or using some other algorithm otherwise. Although falling back on asking those who include my headers or build my library to define a particular macro is feasible, an implementation-defined one would be preferable.
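A minimal sketch of that selection, under stated assumptions: the conststr class below is a hypothetical stand-in for the library's own wrapper, MYLIB_LITERALS_ARE_POOLED is a hypothetical macro that the user (not the compiler) would have to define, and FNV-1a is just one possible content-based fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for a class that encapsulates a string literal.
class conststr {
public:
    template <std::size_t N>
    constexpr conststr(const char (&s)[N]) : p_(s), n_(N - 1) {}
    constexpr const char* data() const { return p_; }
    constexpr std::size_t size() const { return n_; }
private:
    const char* p_;
    std::size_t n_;
};

// Equality is defined on the characters, not on the address.
inline bool operator==(const conststr& a, const conststr& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a.data()[i] != b.data()[i]) return false;
    return true;
}

struct conststr_hash {
    std::size_t operator()(const conststr& s) const noexcept {
#ifdef MYLIB_LITERALS_ARE_POOLED  // hypothetical, user-supplied promise
        // Fast path: hash the address only.
        return std::hash<const char*>()(s.data());
#else
        // Fallback: 64-bit FNV-1a over the characters.
        std::uint64_t h = 14695981039346656037ULL;
        for (std::size_t i = 0; i < s.size(); ++i) {
            h ^= static_cast<unsigned char>(s.data()[i]);
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
#endif
    }
};

// Two equal strings held in different buffers must land on one key.
inline bool equal_keys_collapse() {
    static const char a[] = "abc";
    static const char b[] = "abc";
    assert(&a[0] != &b[0]);  // distinct objects, distinct addresses
    std::unordered_set<conststr, conststr_hash> set;
    set.insert(conststr(a));
    set.insert(conststr(b));
    return set.size() == 1;
}
```

With the fallback path, equal_keys_collapse() holds regardless of where the characters live. With the fast path it holds only if the promise behind the macro is actually true for every key.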

Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?

An example:

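// __GXX_SAME_STRING_LITERALS_SAME_ADDRESS__ is a hypothetical macro name.
#ifdef __GXX_SAME_STRING_LITERALS_SAME_ADDRESS__
// Pointers, not arrays: an array would copy the literal into a distinct object.
const char* str1 = "abc";
const char* str2 = "abc";
assert( str1 == str2 );
#endif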

Answer 1:


Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?

  • gcc has the -fmerge-constants option (this is not a guarantee):

Attempt to merge identical constants (string constants and floating-point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

  • Visual Studio has String Pooling (the /GF option, "Eliminate Duplicate Strings"):

String pooling allows what were intended as multiple pointers to multiple buffers to be multiple pointers to a single buffer. In the following code, s and t are initialized with the same string. String pooling causes them to point to the same memory:

char *s = "This is a character buffer";
char *t = "This is a character buffer";

Note: although MSDN uses char* to point to string literals, const char* should be used.

  • clang apparently also accepts the -fmerge-constants option, but I can't find much about it beyond the --help output, which only describes a negative form, so I'm not sure whether it is really the equivalent of gcc's:

Disallow merging of constants


Anyway, how string literals are stored is implementation dependent (many do store them in the read-only portion of the program).
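Since none of these options is visible through a macro, the only way to see what a particular build does is to compare addresses at run time. The sketch below does that; the function names are made up for illustration, and the result of the address comparison is unspecified by the standard, so it can differ between compilers, optimization levels and translation units.

```cpp
#include <cassert>
#include <cstring>

// Each function returns the address of its own copy of the same literal.
inline const char* first_literal()  { return "pooling probe"; }
inline const char* second_literal() { return "pooling probe"; }

// True if this particular build stores both literals at one address.
// Either answer is conforming; it only describes the current binary.
inline bool literals_share_address() {
    // The contents are always equal, whatever the addresses are.
    assert(std::strcmp(first_literal(), second_literal()) == 0);
    return first_literal() == second_literal();
}
```

A probe like this tells you what happened in one build. It cannot be used to select a hasher at compile time, and it says nothing about literals in other translation units.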

Rather than building your library on implementation-dependent hacks, I can only suggest using std::string instead of C-style strings: std::hash<std::string> and operator== both work on the contents, so they will behave exactly as you expect.

You can construct your std::string in place in your containers with the emplace() methods:

    std::unordered_set<std::string> my_set;
    my_set.emplace("Hello");
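To illustrate the difference, here is a small sketch that inserts the same text from two separate buffers into a set keyed by content and a set keyed by pointer. The function names and buffers are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Keys are std::string: hashing and equality look at the characters.
inline std::size_t distinct_by_content() {
    const char a[] = "Hello";
    const char b[] = "Hello";
    std::unordered_set<std::string> set;
    set.emplace(a);
    set.emplace(b);  // same contents, so no new element is added
    return set.size();
}

// Keys are const char*: hashing and equality look at the address only.
inline std::size_t distinct_by_pointer() {
    const char a[] = "Hello";
    const char b[] = "Hello";
    // Two separate arrays are separate objects with separate addresses.
    assert(static_cast<const char*>(a) != static_cast<const char*>(b));
    std::unordered_set<const char*> set;
    set.insert(a);
    set.insert(b);  // different address, so this is a second element
    return set.size();
}
```

The cost of the std::string version is a copy of the characters (and possibly an allocation) per element, which is the price of not depending on where the characters are stored.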



Answer 2:


Although C++ does not seem to allow for any way that works with string literals, there is an ugly but somewhat workable way around the problem if you don't mind rewriting your string literals as character sequences.

template <typename T, T...values>
struct static_array {
  static constexpr T array[sizeof...(values)] { values... };
};

template <typename T, T...values>
constexpr T static_array<T, values...>::array[];

template <char...values>
using str = static_array<char, values..., '\0'>;

int main() {
  return str<'a','b','c'>::array != str<'a','b','c'>::array;
}

This is required to return zero. The compiler has to ensure that even if multiple translation units instantiate str<'a','b','c'>, those definitions get merged, and you only end up with a single array.

You would need to make sure you don't mix this with string literals, though. Any string literal is guaranteed not to compare equal to any of the template instantiations' arrays.



Source: https://stackoverflow.com/questions/25576363/check-whether-equal-string-literals-are-stored-at-the-same-address
