I need a mix of strtok and strtok_single

你的背包 2020-12-06 15:49

I have the following string that I am trying to parse for variables.

char data[]="to=myself@gmail.com&cc=youself@gmail.com&title=&content=how ar


        
2 answers
  • 2020-12-06 16:21

    You haven't exactly told us what you mean by "this works fine", though it seems sufficient to say that you want to parse an application/x-www-form-urlencoded string. Why didn't you say so in the first place?

    Consider that the first field, key, may be terminated by the first of either '=' or '&'. It would be appropriate to search for a token that ends in either of those characters, to extract key.

    The second field, value, however, isn't terminated by an '=' character, so it's inappropriate to be searching for that character to extract value. You'd want to search for '&' only.

    Sure, you could use strtok to parse this; however, I'm sure there are many more suitable tools. strcspn, for example, won't make any changes to data, which means you won't need to make a copy of data as you are...

    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
        char data[]="to=myself@gmail.com&cc=youself@gmail.com&title=&content=how are you?&signature=best regards.";
    
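        char *key = data;
        do {
            /* A key ends at the first '=' or '&', whichever comes first. */
            int key_length = (int)strcspn(key, "&=");
    
            /* Step over the '=' if there is one; the value ends at the next '&'. */
            char *value = key + key_length + (key[key_length] == '=');
            int value_length = (int)strcspn(value, "&");
    
            /* %.*s prints exactly the given number of chars, so data stays unmodified. */
            printf("Key:   %.*s\n"
                   "Value: %.*s\n\n",
                   key_length,   key,
                   value_length, value);
    
            /* Step over the '&' separator, if any, to reach the next key. */
            key = value + value_length + (value[value_length] == '&');
        } while (*key);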
        return 0;
    }
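
    The same strcspn scan can be turned into a lookup by field name, which is closer to what the question seems to be after. This is only a sketch: find_value is a hypothetical helper name, not something from the question or from the standard library, and it assumes the input needs no percent-decoding.

```c
#include <stddef.h>
#include <string.h>

/*
 * find_value (hypothetical helper): look up `name` in a form-encoded string
 * without modifying it. On success, returns a pointer to the start of the
 * value inside `data` and stores its length in *value_length; the value is
 * NOT null-terminated. Returns NULL if no field has that name.
 */
static const char *find_value(const char *data, const char *name,
                              size_t *value_length)
{
    size_t name_length = strlen(name);
    const char *key = data;

    while (*key != '\0') {
        /* The key ends at the first '=' or '&' (or at the end of the string). */
        size_t key_length = strcspn(key, "&=");

        /* Skip the '=' if present; the value then runs up to the next '&'. */
        const char *value = key + key_length + (key[key_length] == '=');
        size_t length = strcspn(value, "&");

        /* Compare lengths first so that "to" does not match "token". */
        if (key_length == name_length && strncmp(key, name, key_length) == 0) {
            *value_length = length;
            return value;
        }

        /* Advance past the '&' separator, if there is one. */
        key = value + length + (value[length] == '&');
    }
    return NULL;
}
```

    Since the returned value is not null-terminated, it can be printed the same way as in the loop above, for example with printf("%.*s\n", (int)length, value). An empty field such as title= yields a non-NULL pointer with a length of 0, which lets the caller tell "empty" apart from "missing".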
    
  • 2020-12-06 16:35

    There are two bugs lurking here. One is in strtok_single(). If you call it repeatedly, it never returns the last segment (the text after the = that follows signature), unlike strtok().

    When that's fixed, there is still a problem with the code in parsePostData(); it returns a pointer to an automatic variable. The copy of the string must be handled differently; the simplest way (which is consistent with using strtok() rather than strtok_r() or strtok_s()) is to make the tCpy variable static.

    Test program emt.c

    This is a composite program that shows the problems and also a set of fixes. It applies different 'splitter' functions — functions with the same signature as strtok() — to the data. It demonstrates the bug in strtok_single() and that strtok_fixed() fixes that bug. It demonstrates that the code in parsePostData() works correctly when it is fixed and strtok_fixed() is used.

    #include <stdio.h>
    #include <string.h>
    
    /* Function pointer for strtok, strtok_single, strtok_fixed */
    typedef char *(*Splitter)(char *str, const char *delims);
    
    /* strtok_single - as quoted in SO 30294129 (from SO 8705844) */
    static char *strtok_single(char *str, char const *delims)
    {
        static char  *src = NULL;
        char  *p,  *ret = 0;
    
        if (str != NULL)
            src = str;
    
        if (src == NULL)
            return NULL;
    
        if ((p = strpbrk(src, delims)) != NULL)
        {
            *p  = 0;
            ret = src;
            src = ++p;
        }
    
        return ret;
    }
    
    /* strtok_fixed - fixed variation of strtok_single */
    static char *strtok_fixed(char *str, char const *delims)
    {
        static char  *src = NULL;
        char  *p,  *ret = 0;
    
        if (str != NULL)
            src = str;
    
        if (src == NULL || *src == '\0')    // Fix 1
            return NULL;
    
        ret = src;                          // Fix 2
        if ((p = strpbrk(src, delims)) != NULL)
        {
            *p  = 0;
            //ret = src;                    // Unnecessary
            src = ++p;
        }
        else
            src += strlen(src);
    
        return ret;
    }
    
    /* Raw test of splitter functions */
    static void parsePostData1(const char *s, const char *t, Splitter splitter)
    {
        static char tCpy[512];
        if (strlen(t) >= sizeof(tCpy))      // Input would overflow the copy
            return;
        strcpy(tCpy, t);
        char *pch = splitter(tCpy, "=&");
        while (pch != NULL)
        {
            printf("  [%s]\n", pch);
            if (strcmp(pch, s) == 0)
                printf("matches %s\n", s);
            pch = splitter(NULL, "=&");
        }
    }
    
    /* Fixed version of parsePostData() from SO 30294129 */
    static char *parsePostData2(const char *s, const char *t, Splitter splitter)
    {
        static char tCpy[512];
        if (strlen(t) >= sizeof(tCpy))      // Input would overflow the copy
            return NULL;
        strcpy(tCpy, t);
        char *pch = splitter(tCpy, "=&");
        while (pch != NULL)
        {
            if (strcmp(pch, s) == 0)
            {
                pch = splitter(NULL, "&");
                return pch;
            }
            else
            {
                pch = splitter(NULL, "=&");
            }
        }
        return NULL;
    }
    
    /* Composite test program */
    int main(void)
    {
        char data[] = "to=myself@gmail.com&cc=youself@gmail.com&title=&content=how are you?&signature=best regards.";
        char *tags[] = { "to", "cc", "title", "content", "signature" };
        enum { NUM_TAGS = sizeof(tags) / sizeof(tags[0]) };
    
        printf("\nCompare variants on strtok()\n");
        {
            int i = NUM_TAGS - 1;
            printf("strtok():\n");
            parsePostData1(tags[i], data, strtok);
            printf("strtok_single():\n");
            parsePostData1(tags[i], data, strtok_single);
            printf("strtok_fixed():\n");
            parsePostData1(tags[i], data, strtok_fixed);
        }
    
        printf("\nCompare variants on strtok()\n");
        for (int i = 0; i < NUM_TAGS; i++)
        {
            char *value1 = parsePostData2(tags[i], data, strtok);
            printf("strtok: [%s] = [%s]\n", tags[i], value1);
            char *value2 = parsePostData2(tags[i], data, strtok_single);
            printf("single: [%s] = [%s]\n", tags[i], value2);
            char *value3 = parsePostData2(tags[i], data, strtok_fixed);
            printf("fixed:  [%s] = [%s]\n", tags[i], value3);
        }
    
        return 0;
    }
    

    Example output from emt

    Compare variants on strtok()
    strtok():
      [to]
      [myself@gmail.com]
      [cc]
      [youself@gmail.com]
      [title]
      [content]
      [how are you?]
      [signature]
    matches signature
      [best regards.]
    strtok_single():
      [to]
      [myself@gmail.com]
      [cc]
      [youself@gmail.com]
      [title]
      []
      [content]
      [how are you?]
      [signature]
    matches signature
    strtok_fixed():
      [to]
      [myself@gmail.com]
      [cc]
      [youself@gmail.com]
      [title]
      []
      [content]
      [how are you?]
      [signature]
    matches signature
      [best regards.]
    

    And:

    Compare variants on strtok()
    ✓ strtok: [to] = [myself@gmail.com]
    ✓ single: [to] = [myself@gmail.com]
    ✓ fixed:  [to] = [myself@gmail.com]
    ✓ strtok: [cc] = [youself@gmail.com]
    ✓ single: [cc] = [youself@gmail.com]
    ✓ fixed:  [cc] = [youself@gmail.com]
    ✕ strtok: [title] = [content=how are you?]
    ✓ single: [title] = []
    ✓ fixed:  [title] = []
    ✓ strtok: [content] = [how are you?]
    ✓ single: [content] = [how are you?]
    ✓ fixed:  [content] = [how are you?]
    ✓ strtok: [signature] = [best regards.]
    ✕ single: [signature] = [(null)]
    ✓ fixed:  [signature] = [best regards.]
    

    The correct (✓ = U+2713) and incorrect (✕ = U+2715) marks were added manually when posting the answer.

    Observe how only the lines tagged 'fixed' contain exactly what is wanted each time around.
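
    Both strtok_fixed() and the static tCpy buffer keep hidden state, so each returned value is only valid until the next call. If that becomes a problem, one option is a re-entrant variant in the spirit of strtok_r(). The following is a minimal sketch under that assumption; strtok_fixed_r and find_field are hypothetical names, and the caller supplies the buffer that receives the copy.

```c
#include <stddef.h>
#include <string.h>

/*
 * strtok_fixed_r (hypothetical name): same splitting rules as strtok_fixed(),
 * but the scan position is kept in *save, which the caller owns, rather than
 * in a static variable.
 */
static char *strtok_fixed_r(char *str, const char *delims, char **save)
{
    char *src = (str != NULL) ? str : *save;

    if (src == NULL || *src == '\0')
        return NULL;

    char *p = strpbrk(src, delims);
    if (p != NULL) {
        *p = '\0';                  /* Terminate the token in place */
        *save = p + 1;              /* Resume just after the delimiter */
    } else {
        *save = src + strlen(src);  /* Last token: park on the final '\0' */
    }
    return src;
}

/*
 * find_field (hypothetical): copy `data` into the caller's buffer and return
 * a pointer to the value for `name` inside that buffer, or NULL if the name
 * is not found or the buffer is too small.
 */
static char *find_field(const char *name, const char *data,
                        char *buf, size_t buf_size)
{
    char *save = NULL;

    if (strlen(data) >= buf_size)
        return NULL;
    strcpy(buf, data);

    for (char *key = strtok_fixed_r(buf, "=&", &save);
         key != NULL;
         key = strtok_fixed_r(NULL, "=&", &save))
    {
        if (strcmp(key, name) == 0)
            return strtok_fixed_r(NULL, "&", &save);
    }
    return NULL;
}
```

    With one buffer per lookup, several values can be held at the same time. The sketch inherits two quirks from parsePostData(): it compares every token against the name, so a value that happens to equal the name would also match, and an empty field at the very end of the string comes back as NULL rather than as an empty string.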
