I made a comment yesterday on an answer where someone had used `[0123456789]` in a regular expression rather than `[0-9]` or `\d`. I said it was probably more efficient to use a range or the digit specifier than a character set.
I decided to test that out today and found, to my surprise, that (in the C# regex engine, at least) `\d` appears to be less efficient than either of the other two, which don't seem to differ much from each other. Here is my test output over 10000 random strings of 1000 random characters, 5077 of which actually contain a digit:
```
Regular expression \d           took 00:00:00.2141226 result: 5077/10000
Regular expression [0-9]        took 00:00:00.1357972 result: 5077/10000  63.42 % of first
Regular expression [0123456789] took 00:00:00.1388997 result: 5077/10000  64.87 % of first
```
It's a surprise to me for two reasons:

- I would have thought the range would be implemented much more efficiently than the set.
- I can't understand why `\d` is worse than `[0-9]`. Is there more to `\d` than simply being shorthand for `[0-9]`? (A quick way to probe that is sketched below.)
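One way to probe the second point is to feed each pattern a non-ASCII digit. This is a minimal sketch; `\u0663` (Arabic-Indic digit three) is just an example input, and the commented results reflect my understanding that .NET's `\d` matches any Unicode decimal digit by default:

```csharp
using System;
using System.Text.RegularExpressions;

class DigitProbe
{
    static void Main()
    {
        // U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside the ASCII range.
        string arabicThree = "\u0663";

        // If \d were pure shorthand for [0-9], both lines would print the same value.
        Console.WriteLine(Regex.IsMatch(arabicThree, @"\d"));   // True  (Unicode decimal digits)
        Console.WriteLine(Regex.IsMatch(arabicThree, "[0-9]")); // False (ASCII digits only)
    }
}
```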
Here is the test code:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace SO_RegexPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            var rand = new Random(1234);

            var strings = new List<string>();
            // 10K random strings of 1000 lowercase letters each
            for (var i = 0; i < 10000; i++)
            {
                var sb = new StringBuilder();
                for (var c = 0; c < 1000; c++)
                {
                    sb.Append((char)('a' + rand.Next(26)));
                }
                // in roughly half of the strings, overwrite one character with a digit
                if (rand.Next(2) == 0)
                {
                    sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
                }
                strings.Add(sb.ToString());
            }

            var baseTime = MeasureRegex(strings, @"\d");
            Console.WriteLine();

            var testTime = MeasureRegex(strings, "[0-9]");
            Console.WriteLine(" {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);

            testTime = MeasureRegex(strings, "[0123456789]");
            Console.WriteLine(" {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
        }

        private static TimeSpan MeasureRegex(List<string> strings, string regex)
        {
            var sw = new Stopwatch();
            int successes = 0;
            var rex = new Regex(regex);

            sw.Start();
            foreach (var str in strings)
            {
                if (rex.Match(str).Success)
                {
                    successes++;
                }
            }
            sw.Stop();

            Console.Write("Regex {0,-12} took {1} result: {2}/{3}", regex, sw.Elapsed, successes, strings.Count);

            return sw.Elapsed;
        }
    }
}
```
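A variant that might isolate the cause: `RegexOptions.ECMAScript` restricts `\d` to `[0-9]`, so timing `\d` under that option should show whether Unicode digit handling accounts for the gap. A minimal standalone sketch (the input string mirrors the shape of the strings in the main test; I haven't claimed these timings, it's just the measurement idea):

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

class EcmaScriptTiming
{
    static void Main()
    {
        // One random 1000-char lowercase string with a digit dropped in.
        var rand = new Random(1234);
        var sb = new StringBuilder();
        for (var c = 0; c < 1000; c++)
        {
            sb.Append((char)('a' + rand.Next(26)));
        }
        sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
        var input = sb.ToString();

        // Default semantics: \d matches any Unicode decimal digit.
        Time("default", new Regex(@"\d"), input);
        // ECMAScript semantics restrict \d to [0-9].
        Time("ECMAScript", new Regex(@"\d", RegexOptions.ECMAScript), input);
    }

    static void Time(string label, Regex rex, string input)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < 10000; i++)
        {
            rex.Match(input);
        }
        sw.Stop();
        Console.WriteLine("{0,-10} took {1}", label, sw.Elapsed);
    }
}
```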