C++ Fastest way to find first space in right-padded null-terminated char array of fixed size 9

Question

The average length is 4 characters for the strings. I was thinking a binary search might be the fastest starting at position 4. Also I think an inlined templatized function might perform well. This is done in a very tight loop so performance is critical.

The data looks like:

"1234    "
"ABC     "
"A1235   "
"A1235kgo"

Answer 1:

char* found = std::find(arr, arr+9, ' ');

Note that 'no match' is signaled with the end iterator:

bool match = (arr+9) != found;

Note that:

binary search doesn't apply unless your characters are in some known order.
std::find is inlined, templatized, and will perform to the max if you turn on optimization (e.g. -O3 -march=native for g++).
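
As a minimal sketch of the std::find approach, the helper below wraps the call and turns the iterator into an index. The name first_space_index and the -1 convention for "no space" are assumptions made for illustration; they do not come from the question.

```cpp
#include <algorithm>

// Hypothetical helper (name and return convention are illustrative):
// returns the index of the first space in a 9-byte buffer
// (8 characters plus the '\0' terminator), or -1 if there is none.
inline int first_space_index(const char (&arr)[9]) {
    const char* end = arr + 8;                     // stop before the terminator
    const char* found = std::find(arr, end, ' ');  // plain linear search
    return found == end ? -1 : static_cast<int>(found - arr);
}
```

Taking the array by reference keeps the size in the type, so the compiler knows the trip count is at most 8 and is free to unroll the loop.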

Edit: since you have shown more code, I now realize you actually want to detect the (sub)string length. You could use

std::string::find_first_of
std::string::find_last_of
std::string::find
std::string::rfind etc.

Of course, that assumes you'd want to convert the char[] to std::string for the purpose. In practice, that might be a perfectly valid idea, because of SSO (Small String Optimization) found in nearly all implementations of the C++ standard library. (see Items 13-16 in Herb Sutter's More Exceptional C++, or Scott Meyers' discussion of commercial std::string implementations in Effective STL).
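
A rough sketch of the std::string route might look like the following. The helper name trimmed_length is hypothetical, and whether the 8-character copy avoids a heap allocation depends on the standard library's SSO buffer size, so treat that as an assumption to measure rather than a guarantee.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the 8 payload characters into a std::string
// and returns the length of the text that precedes the space padding.
inline std::size_t trimmed_length(const char (&arr)[9]) {
    const std::string s(arr, 8);                     // 8 chars usually fit in an SSO buffer
    const std::string::size_type pos = s.find(' ');  // first padding space, if any
    return pos == std::string::npos ? s.size() : pos;
}
```

For an unpadded string such as "A1235kgo" this returns 8, i.e. the full length.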

Answer 2:

You can indeed use binary search to find the first space character (in this case using std::lower_bound(...)):

const char *data= ...;// 8 character string to search

const char *end= std::lower_bound(data, data + 8, ' ', [](char lhs, char rhs)
{
    bool lhs_is_space= lhs==' ';
    bool rhs_is_space= rhs==' ';

    return lhs_is_space < rhs_is_space;
});

Which is effectively using binary search to find the first space character. The basic idea is to pretend non-space characters are false and space characters are true, and to further assume that all non-space characters come before space characters. If this is true, then the sequence is sorted according to this classification and we can simply find the start (lower bound, that is) of the run of space characters.
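
The same idea can be written with std::partition_point, which takes the true/false classification directly instead of a two-argument comparison. This is a swapped-in alternative to the lower_bound call, not the answer's own code, and the function name first_space_sorted is an assumption. It relies on the same precondition: no space may appear before a non-space character.

```cpp
#include <algorithm>

// Sketch: binary search for the boundary between non-space characters
// and the trailing run of spaces. Returns data + 8 if there is no space.
// Precondition (assumed): all spaces in data[0..8) are at the end.
inline const char* first_space_sorted(const char* data) {
    return std::partition_point(data, data + 8,
                                [](char c) { return c != ' '; });
}
```

Subtracting data from the result gives the index of the first space, or 8 when the string is unpadded.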

Answer 3:

Since the spaces are all at the end, you can use an unrolled binary search. However, a regular linear search is miraculously close in speed, and won't make future developers hate you.

inline int find_space(char (&data)[9]) {
    if (data[3] == ' ') {
        if (data[1] == ' ') {
            if (data[0] == ' ')
                return 0;
            return 1;
        } else if (data[2] == ' ')
            return 2;
        return 3; 
    }
    if (data[5] == ' ') {
        if (data[4] == ' ')
            return 4;
        return 5;
    } else if (data[7] == ' ') {
        if (data[6] == ' ')
            return 6;
        return 7;
    } else if (data[8] == ' ')
        return 8;
    return -1;
}

Answer 4:

Allow the compiler and optimizer to do their job.

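template <typename T_CHAR, int N>
inline T_CHAR* find_first_of(T_CHAR (&a)[N], T_CHAR t)
{
    // The array is taken by reference so that N can be deduced;
    // a "T_CHAR a[N]" parameter would decay to a plain pointer.
    for (int ii = 0; ii < N; ++ii)
    {
        if (a[ii] == t) { return a+ii; }
    }
    return NULL;
}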

Or allow the Standard Template Library authors to do all the heavy lifting for you.

// requires #include <algorithm>
template <typename T_CHAR, int N>
inline T_CHAR* find_first_of(T_CHAR (&a)[N], T_CHAR t)
{
    // The array is taken by reference so that N can be deduced.
    T_CHAR* ii = std::find(a, a+N, t);
    if (ii == a+N) return NULL;
    return ii;
}

Answer 5:

Just my 2 cents. I assume all strings have length 8 and that the possible characters are 'A'-'Z', 'a'-'z', '0'-'9' and space. I tried:

//simple
const char *found = std::find(x.data, x.data + 9, ' ');
//binary search
const char *end = std::lower_bound(x.data, x.data + 8, ' ', [](char lhs, char rhs) {

and my optimized version (it depends on GCC; see below). I tested on 64-bit Linux with -O3 -march=native -std=c++0x. Results for 50000000 randomly generated strings:

simple took 0.480000,
optimized took 0.120000,
binary search took 0.600000.

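#include <cstdint>
#include <cstring>

union FixedLenStr {
     unsigned char chars[8];
     uint32_t words[2];
     uint64_t  big_word;
};

static int space_finder(const char *str)
{
        FixedLenStr tmp;

        memcpy(tmp.chars, str, 8);

        // Keep the high nibble of each byte: 2 for space, 3..7 for '0'-'9', 'A'-'Z', 'a'-'z'.
        tmp.big_word &= 0xF0F0F0F0F0F0F0F0ull;
        tmp.big_word >>= 4;
        // (7 - nibble) * 26 is 130 (bit 7 set) for a space and at most 104 otherwise.
        tmp.big_word = (0x0707070707070707ull - tmp.big_word) * 26;
        tmp.big_word &= 0x8080808080808080ull;

        // The lowest set bit marks the first space (little-endian byte order); -1 if none.
        return (__builtin_ffsll(tmp.big_word) >> 3) - 1;
}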
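
If the character set cannot be restricted to letters, digits and space, a related word-at-a-time trick is the classic "zero byte" test applied after XOR-ing with a word full of spaces. The sketch below is a different technique from the one in this answer, not a rewrite of it. The name first_space_swar is hypothetical, and the code assumes a little-endian target and a GCC-compatible compiler for __builtin_ctzll.

```cpp
#include <cstdint>
#include <cstring>

// Sketch only: assumes little-endian byte order and GCC/Clang builtins.
// Returns the index of the first space in str[0..8), or -1 if none.
inline int first_space_swar(const char* str) {
    std::uint64_t w;
    std::memcpy(&w, str, 8);                             // load 8 bytes without aliasing issues

    const std::uint64_t x = w ^ 0x2020202020202020ull;   // space bytes become 0x00
    // Zero-byte test: the high bit is set for every zero byte of x.
    // Bytes above the first zero byte can be flagged spuriously,
    // so only the lowest set bit is trusted.
    const std::uint64_t t =
        (x - 0x0101010101010101ull) & ~x & 0x8080808080808080ull;

    return t == 0 ? -1 : (__builtin_ctzll(t) >> 3);
}
```

Because only the lowest flagged byte is used, the result is exact for any byte values, at the cost of one extra XOR compared with the nibble-based version.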

Source: https://stackoverflow.com/questions/8043390/c-fastest-way-to-find-first-space-in-right-padded-null-terminated-char-array-o

Tags

c++

string

templates

find