Why is splitting a string slower in C++ than Python?

感情败类 2020-12-07 08:14

I'm trying to convert some code from Python to C++ in an effort to gain a little bit of speed and sharpen my rusty C++ skills. Yesterday I was shocked when a naive impleme…

8 answers
  •  孤街浪徒
    2020-12-07 08:56

    I'm not providing any better solutions (at least performance-wise), but here is some additional data that could be interesting.

    Using strtok_r (the reentrant variant of strtok):

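    #include <cstdlib>
    #include <cstring>
    #include <string>
    #include <vector>
    using namespace std;

    // Split str on any character in delimiters, appending tokens to tokens.
    // strtok_r modifies its input, so it is run on a malloc'd copy of str.
    void splitc1(vector<string> &tokens, const string &str,
            const string &delimiters = " ") {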
        char *saveptr;
        char *cpy, *token;
    
        cpy = (char*)malloc(str.size() + 1);
        strcpy(cpy, str.c_str());
    
        for(token = strtok_r(cpy, delimiters.c_str(), &saveptr);
            token != NULL;
            token = strtok_r(NULL, delimiters.c_str(), &saveptr)) {
            tokens.push_back(string(token));
        }
    
        free(cpy);
    }
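
    As an aside (not part of the timings in this answer), a splitc1-style function can avoid the malloc/strcpy copy altogether by scanning the std::string with find_first_not_of and find_first_of. This is a minimal sketch under that assumption, not benchmarked here; the name split_find is hypothetical. It keeps strtok_r's semantics: runs of delimiters are merged and no empty tokens are produced.

```cpp
#include <string>
#include <vector>

// Hypothetical copy-free alternative to splitc1 (not benchmarked).
// Same semantics as strtok_r: consecutive delimiters are merged and
// leading/trailing delimiters produce no empty tokens.
void split_find(std::vector<std::string> &tokens, const std::string &str,
                const std::string &delimiters = " ") {
    // Skip leading delimiters to find where the first token starts.
    std::string::size_type start = str.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the string.
        std::string::size_type end = str.find_first_of(delimiters, start);
        std::string::size_type len =
            (end == std::string::npos) ? std::string::npos : end - start;
        tokens.push_back(str.substr(start, len));
        // Jump over the run of delimiters after the token; npos stays npos.
        start = str.find_first_not_of(delimiters, end);
    }
}
```

    For example, splitting "  a bb  c " on " " yields the three tokens "a", "bb" and "c". Each token still costs one std::string construction, so this only removes the up-front copy of the whole line.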
    

    Additionally using C-style character strings for the parameters, and fgets for input:

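    // Same as splitc1, but takes C strings, so no std::string temporaries
    // are needed for the input or the delimiters (str is still copied).
    void splitc2(vector<string> &tokens, const char *str,
            const char *delimiters) {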
        char *saveptr;
        char *cpy, *token;
    
        cpy = (char*)malloc(strlen(str) + 1);
        strcpy(cpy, str);
    
        for(token = strtok_r(cpy, delimiters, &saveptr);
            token != NULL;
            token = strtok_r(NULL, delimiters, &saveptr)) {
            tokens.push_back(string(token));
        }
    
        free(cpy);
    }
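
    The fgets side of this is not shown above. A minimal sketch of what such an input loop could look like, assuming lines are read from a FILE* and tokenized in place with strtok_r; the function name count_lines_fgets and the 4096-byte buffer are hypothetical choices, not taken from the benchmarked programs. One detail worth noting is that fgets keeps the trailing newline, so '\n' has to be among the delimiters.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical fgets-based driver (not one of the benchmarked programs).
// Reads lines with fgets, tokenizes each in place with POSIX strtok_r,
// returns the number of lines and stores the total token count.
// Assumes no line exceeds the buffer: fgets would return a longer line
// in pieces, and each piece would be counted as a separate line.
long count_lines_fgets(std::FILE *in, long *token_count) {
    char buf[4096];
    // fgets keeps the trailing '\n', so it must be a delimiter too;
    // otherwise the last token of every line would end with a newline.
    const char *delimiters = " \n";
    long lines = 0;
    long tokens = 0;
    while (std::fgets(buf, sizeof buf, in) != nullptr) {
        ++lines;
        char *saveptr;
        for (char *tok = strtok_r(buf, delimiters, &saveptr); tok != nullptr;
             tok = strtok_r(nullptr, delimiters, &saveptr)) {
            ++tokens;
        }
    }
    *token_count = tokens;
    return lines;
}
```

    Called as count_lines_fgets(stdin, &n), it counts lines and tokens from standard input. Because fgets fills a buffer the caller already owns, the buffer can be handed to strtok_r directly, which is the situation splitc3 below exploits.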
    

    And, for cases where destroying the input string is acceptable:

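    // Same as splitc2, but tokenizes str in place: no copy is made,
    // and str is modified by strtok_r.
    void splitc3(vector<string> &tokens, char *str,
            const char *delimiters) {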
        char *saveptr;
        char *token;
    
        for(token = strtok_r(str, delimiters, &saveptr);
            token != NULL;
            token = strtok_r(NULL, delimiters, &saveptr)) {
            tokens.push_back(string(token));
        }
    }
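
    To make "destroying" concrete: strtok_r overwrites the delimiter that ends each token with '\0', so every token is already a valid C string inside the original buffer. If the tokens only need to live as long as that buffer, one further step (not measured in the timings below) is to keep pointers instead of building a std::string per token. A minimal sketch; split_inplace is a hypothetical name.

```cpp
#include <cstring>
#include <vector>

// Hypothetical zero-copy variant of splitc3 (not benchmarked).
// strtok_r writes '\0' over the delimiter that ends each token, which is
// what "destroys" the input; each returned pointer is therefore a
// NUL-terminated token inside `str`. This skips copying every token into
// a std::string (which may allocate), but the pointers dangle as soon as
// `str` is freed or overwritten.
std::vector<char *> split_inplace(char *str, const char *delimiters) {
    std::vector<char *> tokens;
    char *saveptr;
    for (char *tok = strtok_r(str, delimiters, &saveptr); tok != nullptr;
         tok = strtok_r(nullptr, delimiters, &saveptr)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

    After split_inplace on a buffer holding "a b c", the buffer reads "a\0b\0c": strlen of it is 1, and the three returned pointers refer to "a", "b" and "c".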
    

    The timings for these are as follows (including my results for the other variants from the question and the accepted answer):

    split1.cpp:  C++   : Saw 20000000 lines in 31 seconds.  Crunch speed: 645161
    split2.cpp:  C++   : Saw 20000000 lines in 45 seconds.  Crunch speed: 444444
    split.py:    Python: Saw 20000000 lines in 33 seconds.  Crunch Speed: 606060
    split5.py:   Python: Saw 20000000 lines in 35 seconds.  Crunch Speed: 571428
    split6.cpp:  C++   : Saw 20000000 lines in 18 seconds.  Crunch speed: 1111111
    
    splitc1.cpp: C++   : Saw 20000000 lines in 27 seconds.  Crunch speed: 740740
    splitc2.cpp: C++   : Saw 20000000 lines in 22 seconds.  Crunch speed: 909090
    splitc3.cpp: C++   : Saw 20000000 lines in 20 seconds.  Crunch speed: 1000000
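
    The "Crunch speed" column appears to be the line count divided by the whole elapsed seconds, truncated to an integer: every row above matches this (for example, 20000000 / 31 = 645161). A sketch of that arithmetic, assuming this interpretation; crunch_speed is a hypothetical name.

```cpp
// Hypothetical helper reproducing the "Crunch speed" column, assuming it
// is lines processed divided by whole elapsed seconds (integer division).
long long crunch_speed(long long lines, long long seconds) {
    return seconds > 0 ? lines / seconds : 0;
}
```

    Because the times are whole seconds, a one-second difference around 20 s already shifts the result by roughly 5% (20000000 / 19 = 1052631 versus 20000000 / 20 = 1000000), so small gaps between the fastest variants should be read with that resolution in mind.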
    

    As we can see, the solution from the accepted answer (split6.cpp) is still the fastest.

    For anyone who wants to do further tests, I also put up a GitHub repo with all the programs from the question, the accepted answer, and this answer, plus a Makefile and a script to generate test data: https://github.com/tobbez/string-splitting.
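
    The programs above time reading from stdin and splitting together. To isolate the splitting cost, one option is to time a split function over lines already held in memory. This is a minimal sketch of such a helper under that assumption, using std::chrono::steady_clock; time_split is a hypothetical name and is not part of the repo.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical micro-benchmark helper (not part of the repo).
// Times `split` over in-memory lines, so I/O is excluded, unlike the
// stdin-reading programs. Returns elapsed seconds and stores the total
// token count, so the result is observable and less likely to be
// optimized away.
template <typename SplitFn>
double time_split(const std::vector<std::string> &lines, SplitFn split,
                  std::size_t *token_count) {
    std::vector<std::string> tokens;
    std::size_t total = 0;
    auto start = std::chrono::steady_clock::now();
    for (const std::string &line : lines) {
        tokens.clear();  // reuse the vector's capacity between lines
        split(tokens, line);
        total += tokens.size();
    }
    auto stop = std::chrono::steady_clock::now();
    *token_count = total;
    return std::chrono::duration<double>(stop - start).count();
}
```

    For example, time_split(lines, [](std::vector<std::string> &t, const std::string &s) { splitc1(t, s); }, &n) would time splitc1. The lambda is needed because splitc1's default delimiter argument does not apply when it is called through a function pointer.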
