Implementing a “string pool” that is guaranteed not to move

落爺英雄遲暮 提交于 2019-11-29 06:41:17

I've used this approach in the past:

using Atom = const char*;

Atom make_atom(string const& value)
{
    static set<string> interned;
    return interned.insert(value).first->c_str();
}

Obviously, if you want/need to clear the set, you'd make it available in some wider scope.

For even more efficiency move/emplace the strings into the set.

Update I've added this approach for completeness. See it Live on Coliru

#include <string>
#include <set>
using namespace std;

using Atom = const char*;

template <typename... Args>
typename enable_if<
    is_constructible<string, Args...>::value, Atom
>::type emplace_atom(Args&&... args)
{
    static set<string> interned;
    return interned.emplace(forward<Args>(args)...).first->c_str();
}

#include <iostream>

int main() {
    cout << emplace_atom("Hello World\n");
    cout << emplace_atom(80, '=');
}

Yes, you're going to have to write a list of buffers. No, don't do all the hard work yourself.

The underlying datastructure should be a std::vector<std::string>. Using a (forward) list doesn't buy you a whole lot. When the vector is resized, the strings are moved efficiently. std::forward_list<std::string>. Even if the list is resized, the strings themselves remain in place. Iterating the list is only needed for a .clear so the list performance is not critical.

The wrapper class should abstract away the addition of new strings. A new string should be added when the capacity of the last string isn't enough to add the new string. When you add a new string, reserve all the memory a chunk will need - this ensures the capacity will be large enough to prevent reallocations later on.

This setup may waste some space when a large new allocation forces the use of a new chunk, leaving part of an older chunk unused. You could of course remember the size remaining in the last N blocks, for a small value of N such that those chunks might still be in cache. But it's quite possible that in your app N=5 would already be too big.

Shoe

Recapping, your requirements are:

  • Be able to push elements
  • Be able to obtain an iterator to the beginning of the sequence
  • Iterators should not get invalidated if the sequence grows
  • Be able to clear the sequence
  • Don't reserve maximum capacity

It seems that std::list<char> fits perfectly into this list of requirements. Of course you might need a wrapper around the class to make it behave exactly like std::string, but that really depends on how you manipulate the data.

And here's how well it fits the requirements:

  • To push elements, you can use the push_back and emplace_back member functions.

  • std::begin(container) or the member function begin will retrieve the iterator to the first element of the sequence.

  • Addition, removal and moving the elements within the list or across several lists does not invalidate the iterators. An iterator is invalidated only when the corresponding element is deleted.

  • To clear the sequence you can use the member function clear.

  • Most of the time it is implemented as a doubly-linked list, therefore no capacity is reserved.

Since std::list seems memory inefficient (even though the standard doesn't specify the size of it nor its implementation), it's correct to add that you can also use std::deque<char> with almost the same interface as above. The only difference being that std::deque might reserve unused memory.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!