In a C++ question about optimization and code style, several answers referred to \"SSO\" in the context of optimizing copies of std::string. What does SSO mean
As already explained by the other answers, SSO means Small / Short String Optimization. The motivation behind this optimization is the undeniable evidence that applications in general handle much more shorter strings than longer strings.
As explained by David Stone in his answer above, the std::string class uses an internal buffer to store contents up to a given length, and this eliminates the need to dynamically allocate memory. This makes the code more efficient and faster.
This other related answer clearly shows that the size of the internal buffer depends on the std::string implementation, which varies from platform to platform (see benchmark results below).
Here is a small program that benchmarks the copy operation of lots of strings with the same length. It starts printing the time to copy 10 million strings with length = 1. Then it repeats with strings of length = 2. It keeps going until the length is 50.
#include
#include
#include
#include
static const char CHARS[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static const int ARRAY_SIZE = sizeof(CHARS) - 1;
static const int BENCHMARK_SIZE = 10000000;
static const int MAX_STRING_LENGTH = 50;
using time_point = std::chrono::high_resolution_clock::time_point;
void benchmark(std::vector& list) {
std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
// force a copy of each string in the loop iteration
for (const auto s : list) {
std::cout << s;
}
std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
const auto duration = std::chrono::duration_cast(t2 - t1).count();
std::cerr << list[0].length() << ',' << duration << '\n';
}
void addRandomString(std::vector& list, const int length) {
std::string s(length, 0);
for (int i = 0; i < length; ++i) {
s[i] = CHARS[rand() % ARRAY_SIZE];
}
list.push_back(s);
}
int main() {
std::cerr << "length,time\n";
for (int length = 1; length <= MAX_STRING_LENGTH; length++) {
std::vector list;
for (int i = 0; i < BENCHMARK_SIZE; i++) {
addRandomString(list, length);
}
benchmark(list);
}
return 0;
}
If you want to run this program, you should do it like ./a.out > /dev/null so that the time to print the strings isn't counted.
The numbers that matter are printed to stderr, so they will show up in the console.
I have created charts with the output from my MacBook and Ubuntu machines. Note that there is a huge jump in the time to copy the strings when the length reaches a given point. That's the moment when strings don't fit in the internal buffer anymore and memory allocation has to be used.
Note also that on the linux machine, the jump happens when the length of the string reaches 16. On the macbook, the jump happens when the length reaches 23. This confirms that SSO depends on the platform implementation.
Ubuntu
Macbook Pro