C++ String Interview Question

℡╲_俬逩灬. 提交于 2019-12-03 11:17:53

This implementation should be fast:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}

It passes all three unit tests.

When using GNU libstdc++, the line that declares & initializes ret is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls to begin and end on s are also fast because they will resolve to the const versions of begin and end, begin() const and end() const, so the internal representation of s is not "leaked". With libstdc++, std::string::const_iterator is const char*, which is a pointer type and random access iterator. Thus, when std::string::append<const char*>(const char*, const char*) calls std::distance to obtain the length of the input range, it is a pointer difference operation. Also, std::string::append<const char*>(const char*, const char*) results in something like a memmove. Finally, the reserve operation ensures that enough memory is available for the return value.

EDIT: For the curious, here is the initialization of ret in the assembly output of MinGW g++ 4.5.0:

    movl    $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)

It's simply copying the pointer to the global "empty representation".

EDIT2: Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:

Variant 1 (the "c_str variant"):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
   ret.append(s_cStr, s_cStr + n);
   ret.append(s_cStr_end - n, s_cStr_end);
   return ret;
}

Variant 2 (the "data string" variant):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_data = s.data(), *s_data_end = s_data + s_size;
   ret.append(s_data, s_data + n);
   ret.append(s_data_end - n, s_data_end);
   return ret;
}

Variant 3:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret(s);
   std::string::size_type d = s_size - n;
   return ret.replace(n, d, s, d, n);
}

Variant 4 (my original code):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   ret.append(s.begin(), s.begin() + n);
   ret.append(s.end() - n, s.end());
   return ret;
}

The results for g++ 4.5.0 are:

  • Variant 4 is the fastest
  • Variant 3 is second (5% slower than variant 4)
  • Variant 1 is third (2% slower than variant 3)
  • Variant 2 is fourth (0.2% slower than variant 1)

The results for VC++ 16.00.30319.01 are:

  • Variant 1 is the fastest
  • Variant 2 is second (3% slower than variant 1)
  • Variant 4 is third (4% slower than variant 2)
  • Variant 3 is fourth (17% slower than variant 4)

Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.

One thing interesting from the VC++ results is that using c_str rather than data is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.

EDIT3:

Actually, I just thought about another variant:

Variant 5:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   std::string::const_iterator s_begin = s.begin(), s_end = s.end();
   ret.append(s_begin, s_begin + n);
   ret.append(s_end - n, s_end);
   return ret;
}

It's just like variant 4 except that the begin and end iterators for s are saved.

When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:

  • Variant 1 is the fastest
  • Variant 5 is second (1.6% slower than variant 1)
  • Variant 2 is third (1.4% slower than variant 5)
  • Variant 4 is third (4% slower than variant 2)
  • Variant 3 is fourth (17% slower than variant 4)

If you don't need to maintain the contents of the original string, then you can copy the last n characters into positions [n+1, 2n] of the original string and truncate it at 2n. You will have to be careful to first expand the string and also be careful not to overwrite any characters before writing to them if the string is shorter than 2n.

This will halve the number of operations to construct the string, as well as remove the need to create a new string. So its theoretically between 2 and 4 times faster. But of course you have just destroyed the original string, which you'd have to ask the interviewer if it is acceptable.

How about removing the middle N-2n characters, where N is the length of the source string?

// compiled with cl /Ox first_last_n.cpp /W4 /EHsc

inline void
first_last_n2(string::size_type n, const std::string &s, string &out)  // method 2
{
  // check against degenerate input
  assert(n > 0);
  assert(n <= s.size());

  out.reserve(2*n);
  out.assign(s, 0, n);
  out.append(s, s.size()-n, n);
}

Times:

method 1:  // original method
2.281
method 2:  // my method
0.687
method 3:  // your code.
0.782

Note: Timing specifically tests "long" strings. I.e. those where short string optimization is not used. (My strings were 100 length).

My only thought is that if this function is only called with C null-terminated strings, you might be requiring the extra construction of a std::string for parameter 's'.

Possibly the 'more' efficient method would be to allow either a std::string or const char *s passed in.

Memcpy is a cheat?

#include <cstring>
#include <iostream>
#include <string>

std::string first_last_n(int n, const std::string& s)
{
  if (s.size() < n)
      return "";

    char str[n*2];
    memcpy(str, s.data(), n);
    memcpy(str+n, s.data() + s.size()-n, n);

    return (const char *)str;
}

int main()
{
    std::cout << first_last_n(2, "123454321") << std::endl;
}

EDIT So I removed the other one. This is not a cheat.

If you must pass the tests then you're going to have to write inefficient code, because you must return a copy of a string. This implies you must use dynamic allocation, possibly multiple times because of the copy.

So change the tests and change the signature.

template<class Out>
Out first_last_n(const std::string::size_type& n, const std::string& s, Out r)
{
    r = copy_n(s.begin(), n, r);
    std::string::const_iterator pos(s.end());
    std::advance(pos, -n);
    return copy_n(pos, n, r);
}

Then call it like so:

std::string s("Hello world!");
char r[5];
r[4] = 0;
first_last_n(2, s, r);

This allows you to use dynamic programming, and it eliminates the need of dynamic allocation in the function.

I like my algorithms minimalistic, and I purposely eliminated the check for n being smaller or equal to the size of the string. I replace the check with a pre-condition for the function. Preconditions are faster than checks: they have zero overhead.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!