When not to use RegexOptions.Compiled

隐瞒了意图╮ 2020-12-13 06:10

I understand the advantage of using RegexOptions.Compiled: it improves the app's execution time by having the regular expression in compiled form instead of interpre

4 Answers
  •  温柔的废话
    2020-12-13 06:35

    According to a BCL blog post, compiling increases the startup time by an order of magnitude but decreases subsequent run times by about 30%. Using these numbers, compilation should be considered for a pattern that you expect to be evaluated more than about 30 times. (Of course, as with any performance optimization, both alternatives should be measured for acceptability.)
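
    To illustrate where the one-time cost goes, here is a minimal sketch, assuming the same pattern is reused across many calls. Holding a compiled `Regex` in a static field means the compilation is paid once and every later call reuses it. The class name `NameFilter` and the sample input are hypothetical, chosen only for this example.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class NameFilter
    {
        // Built once, when the type is first used, so the one-time
        // compilation cost is spread over every later call.
        private static readonly Regex NonNameChars =
            new Regex("[^a-zA-Z&-]+", RegexOptions.Compiled);

        // Removes every character that is not an ASCII letter, '&' or '-'.
        public static string GetName(string objString)
        {
            return NonNameChars.Replace(objString, "");
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // Hypothetical sample input.
            Console.WriteLine(NameFilter.GetName("AT&T, Inc. (x-1)")); // prints "AT&TIncx-"
        }
    }
    ```

    For a pattern that runs only a handful of times, the same field without `RegexOptions.Compiled` avoids the extra startup work.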

    If performance is critical for a simple expression called repeatedly, you may want to avoid using regular expressions altogether. I tried running some variants about 5 million times each:

    Note: edited from previous version to correct regular expression.

        static string GetName1(string objString)
        {
            return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
        }
    
        static string GetName2(string objString)
        {
            return Regex.Replace(objString, "[^a-zA-Z&-]+", "", RegexOptions.Compiled);
        }
    
        static string GetName3(string objString)
        {
            var sb = new StringBuilder(objString.Length);
            foreach (char c in objString)
                if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '-' || c == '&')
                    sb.Append(c);
            return sb.ToString();
        }
    
    
        static string GetName4(string objString)
        {
            char[] c = objString.ToCharArray();
            int pos = 0;
            int writ = 0;
            while (pos < c.Length)
            {
                char curr = c[pos];
                if ((curr >= 'A' && curr <= 'Z') || (curr >= 'a' && curr <= 'z') || curr == '-' || curr == '&')
                {
                    c[writ++] = c[pos];
                }
                pos++;
            }
            return new string(c, 0, writ);
        }
    
    
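        // Requires compiling with the /unsafe option.
        unsafe static string GetName5(string objString)
        {
            // Stack buffer: suitable only for short inputs such as these 30-character strings.
            char* buf = stackalloc char[objString.Length];
            int writ = 0;
            fixed (char* sp = objString)
            {
                char* pos = sp;
                // Relies on the string's trailing '\0'; an embedded '\0' would end the scan early.
                while (*pos != '\0')
                {
                    char curr = *pos;
                    if ((curr >= 'A' && curr <= 'Z') ||
                        (curr >= 'a' && curr <= 'z') ||
                         curr == '-' || curr == '&')
                        buf[writ++] = curr;
                    pos++;
                }
            }
            return new string(buf, 0, writ);
        }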
    

    Running each method independently over 5 million random ASCII strings of 30 characters each consistently gave these numbers:

       Method 1: 32.3  seconds (interpreted regex)
       Method 2: 24.4  seconds (compiled regex)
       Method 3:  1.82 seconds (StringBuilder concatenation)
       Method 4:  1.64 seconds (char[] manipulation)
       Method 5:  1.54 seconds (unsafe char* manipulation)
    

    That is, compilation provided about a 25% performance benefit over a very large number of evaluations of this pattern, with the first execution being about 3 times slower. The methods that operated on the underlying character arrays (methods 4 and 5) were roughly 15 times faster than the compiled regular expression, going by the timings above.

    While method 4 or method 5 may provide a performance benefit over regular expressions, the other methods may offer different advantages (maintainability, readability, flexibility, etc.). For this pattern, the test suggests that compiling the regex gives a modest performance benefit over interpreting it across a large number of evaluations.
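
    The timing harness itself is not shown in this answer. One way such a comparison might be set up is sketched below, assuming `Stopwatch` for timing and a seeded `Random` for the inputs. The names `RegexTiming`, `MakeInputs` and `Time`, the seed, and the reduced input count are all hypothetical. The warm-up call means this measures steady-state cost only, not the first-execution penalty.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    public static class RegexTiming
    {
        // Builds `count` random printable-ASCII strings of `length` characters.
        // A fixed seed keeps separate runs comparable.
        public static string[] MakeInputs(int count, int length, int seed)
        {
            var rng = new Random(seed);
            var inputs = new string[count];
            var buffer = new char[length];
            for (int i = 0; i < count; i++)
            {
                for (int j = 0; j < length; j++)
                    buffer[j] = (char)rng.Next(32, 127); // ' ' through '~'
                inputs[i] = new string(buffer);
            }
            return inputs;
        }

        // Runs `method` over every input and returns the elapsed wall-clock time.
        public static TimeSpan Time(Func<string, string> method, string[] inputs)
        {
            method(inputs[0]); // warm-up: keeps JIT and regex construction out of the timing
            var sw = Stopwatch.StartNew();
            foreach (string s in inputs)
                method(s);
            sw.Stop();
            return sw.Elapsed;
        }

        public static void Main()
        {
            // 100,000 inputs keeps the sketch quick; the test above used 5 million.
            string[] inputs = MakeInputs(100000, 30, 12345);

            TimeSpan interpreted = Time(
                s => Regex.Replace(s, "[^a-zA-Z&-]+", ""), inputs);
            TimeSpan compiled = Time(
                s => Regex.Replace(s, "[^a-zA-Z&-]+", "", RegexOptions.Compiled), inputs);

            Console.WriteLine("Interpreted: {0:F0} ms", interpreted.TotalMilliseconds);
            Console.WriteLine("Compiled:    {0:F0} ms", compiled.TotalMilliseconds);
        }
    }
    ```

    Absolute numbers will vary with the runtime version and hardware, so only the ratio between the two lines is worth comparing.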
