method for expand a-z to abc…xyz form

前端 未结 4 774
孤街浪徒
孤街浪徒 2020-12-11 10:19

Hi:) what i\'m trying to do is write a simple program to expand from shortest entry

for example

a-z or 0-9 or a-b-c or a-z0-9

4条回答
  •  抹茶落季
    2020-12-11 10:41

    Here is a C version (in about 38 effective lines) that satisfies the same test as my earlier C++ version.

    The full test program including your test cases, mine and some torture test can be seen live on http://ideone.com/sXM7b#info_3915048

    Rationale

    I'm pretty sure I'm overstating the requirements, but

    • this should be an excellent example of how to do parsing in a robust fashion
      • use states in an explicit fashion
      • validate input (!)
        • this version doesn't assume a-c-b can't happen
        • It also doesn't choke or even fail on simple input like 'Hello World' (or (char*) 0)
    • it shows how you can avoid printf("%c", c) each char without using extraneous functions.
    • I put in some comments as to explain what happens why, but overall you'll find that the code is much more legible anyways, by

      • staying away from too many short-named variables
      • avoiding complicated conditionals with un-transparent indexers
      • avoiding the whole string length business: We only need max lookahead of 2 characters, and *it=='-' or predicate(*it) will just return false if it is the null character. Shortcut evaluation prevents us from accessing past-the-end input characters
    • ONE caveat: I haven't implemented a proper check for output buffer overrun (the capacity is hardcoded at 2048 chars). I'll leave it as the proverbial exercise for the reader

    Last but not least, the reason I did this:

    • It will allow me to compare raw performance of the C++ version and this C version, now that they perform equivalent functions. Right now, I fully expect the C version to outperform the C++ by some factor (let's guess: 4x?) but, again, let's just see what suprises the GNU compilers have in store for us. More later Update turns out I wasn't far off: github (code + results)

    Pure C Implementation

    Without further ado, the implementation, including the testcase:

    #include 
    #include 
    #include 
    
    int alpha_range(char c) { return (c>='a') && (c<='z'); }
    int digit_range(char c) { return (c>='0') && (c<='9'); }
    
    char* expand(const char* s)
    {
        char buf[2048];
    
        const char* in  = s;
              char* out = buf;
    
        // parser state
        int (*predicate)(char) = 0; // either: NULL (free state), alpha_range (in alphabetic range), digit_range (in digit range)
        char lower=0,upper=0;       // tracks lower and upper bound of character ranges in the range parsing states
    
        // init
        *out = 0;
    
        while (*in)
        {
            if (!predicate)
            {
                // free parsing state
                if (alpha_range(*in) && (in[1] == '-') && alpha_range(in[2]))
                {
                    lower = upper = *in++;
                    predicate = &alpha_range;
                }
                else if (digit_range(*in) && (in[1] == '-') && digit_range(in[2]))
                {
                    lower = upper = *in++;
                    predicate = &digit_range;
                } 
                else *out++ = *in;
            } else
            { 
                // in a range
                if (*in < lower) lower = *in;
                if (*in > upper) upper = *in;
    
                if (in[1] == '-' && predicate(in[2])) 
                    in++; // more coming
                else
                {
                    // end of range mode, dump expansion
                    char c;
                    for (c=lower; c<=upper; *out++ = c++);
                    predicate = 0;
                }
            }
            in++;
        }
    
        *out = 0; // null-terminate buf
        return strdup(buf);
    }
    
    void dotest(const char* const input)
    {
        char* ex = expand(input);
        printf("input : '%s'\noutput: '%s'\n\n", input, ex);
    
        if (ex)
            free(ex);
    }
    
    int main (int argc, char *argv[])
    {
        dotest("a-z or 0-9 or a-b-c or a-z0-9"); // from the original post
        dotest("This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6"); // from my C++ answer
        dotest("-x-s a-9 9- a-k-9 9-a-c-7-3"); // assorted torture tests
    
        return 0;
    }
    

    Test output:

    input : 'a-z or 0-9 or a-b-c or a-z0-9'
    output: 'abcdefghijklmnopqrstuvwxyz or 0123456789 or abc or abcdefghijklmnopqrstuvwxyz0123456789'
    
    input : 'This is some e-z test in 5-7 steps; this works: a-b-c. This works too: b-k-c-e. Likewise 8-4-6'
    output: 'This is some efghijklmnopqrstuvwxyz test in 567 steps; this works: abc. This works too: bcdefghijk. Likewise 45678'
    
    input : '-x-s a-9 9- a-k-9 9-a-c-7-3'
    output: '-stuvwx a-9 9- abcdefghijk-9 9-abc-34567'
    

提交回复
热议问题