Regex performances between languages or libraries

天大地大妈咪最大 submitted on 2019-12-10 16:39:47

Question


I couldn't find anything on this subject, so I wonder whether anyone has compared the speed of regex matching across different languages. I would like to know which language evaluates regular expressions fastest, because my current project has to evaluate an enormous number of regular expressions continuously. The choice of the language will be mainly based on this performance.

My guess is that C/C++ will naturally be faster, but I would like to avoid it if possible, and I'm not sure I'm right. For example, a C# library may call native code through P/Invoke, in which case the speed difference may be negligible. But I don't know which library to choose, or whether I need to write a wrapper around a C++ library (and if so, which one).


Answer 1:


What kind of regexes? Will they use features like lookaheads, lookbehinds, backreferences, reluctant quantifiers, atomic groups, possessive quantifiers, etc., etc.?

Other responders have linked to the regex-dna benchmark, but it only uses the most basic features shared by all regex flavors, like the Kleene star (*) and alternation (|). So, while the GNU C/C++ implementations seem to be the clear winners, they won't do you any good if you need any of the features I listed above.
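To make the distinction concrete, here is a minimal Python sketch. The first two patterns use only alternation and the Kleene star, in the style of what regex-dna exercises. The last two use a backreference and a lookahead, which a benchmark limited to basic features says nothing about. The sample strings and variable names are made up for illustration, and Python's `re` module stands in only as one example of a flavor that supports these features.

```python
import re

# Basic features only: alternation and the Kleene star.
alternation = r"agggtaaa|tttaccct"
star = r"ag*t"

# Extended features that a basic-features benchmark does not measure.
backreference = r"\b(\w+) \1\b"   # a repeated word, e.g. "the the"
lookahead = r"\w+(?=:)"           # a word only when a colon follows it

dna = "ccagggtaaattttaccctgg"     # made-up sample sequence
alt_hits = re.findall(alternation, dna)
star_hits = re.findall(star, dna)

repeated = re.search(backreference, "this is the the end").group(1)
keys = re.findall(lookahead, "host: example port: 80")

print(alt_hits, star_hits, repeated, keys)
```

An engine that is fast on the first two patterns may be slow on, or simply reject, the last two, so it is worth benchmarking with the features you actually need.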

Another consideration is Unicode support. If you're dealing with actual text (and not data represented as text, like in the regex-dna benchmark), you should use a regex flavor with good Unicode support.
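As a small illustration of what Unicode support means in practice, the sketch below (Python, with a made-up sample string) compares `\w` in Unicode-aware mode with the same pattern restricted to ASCII. Other flavors expose this choice differently or not at all, so treat it as an example of the behavior to check for, not as a description of any other engine.

```python
import re

# Made-up sample with non-ASCII letters ("naïve café"), written with escapes
# so the code points are unambiguous.
text = "na\u00efve caf\u00e9"

# In Python 3, \w on str patterns is Unicode-aware by default.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```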

I suggest you look into C#. The .NET regex flavor does not have a reputation for being slow (which is the only sensible thing you can say about regex speeds, IMO), and for performance-critical applications it provides the option of compiling the regex directly to byte code (RegexOptions.Compiled) for a substantial performance boost.




Answer 2:


There is a regex benchmark here: http://shootout.alioth.debian.org/u64q/benchmark.php?test=regexdna&lang=all&box=1

But the kinds of regex you are going to use could matter a lot more than your choice of engine. Some engines do better than others on certain kinds of pattern, and some kinds of regex are slow no matter what the engine (e.g., certain regexes can require exponential time).
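The exponential case is easy to reproduce. The sketch below uses Python's backtracking `re` module and a classic nested-quantifier pattern. The pattern, the input, and the helper name `time_match` are illustrative assumptions, and the timings will vary by machine. Non-backtracking engines handle this particular pattern in linear time, so it demonstrates the pattern-dependence of performance rather than a property of every engine.

```python
import re
import time

def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for a single re.match call."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

# Nested quantifiers: on failure, a backtracking engine tries exponentially
# many ways of splitting the run of 'a's between the inner and outer '+'.
nested = r"(a+)+$"
# Matches the same strings, but without the nesting.
flat = r"a+$"

for n in (10, 14, 18):
    text = "a" * n + "b"          # the trailing 'b' forces the match to fail
    _, slow = time_match(nested, text)
    _, fast = time_match(flat, text)
    print(f"n={n}: nested {slow:.6f}s, flat {fast:.6f}s")
```

Each extra `a` roughly doubles the time for the nested pattern, which is why the input sizes here are kept small.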




Answer 3:


I suggest evaluating a complex regular expression in RegexBuddy.
Try it with the languages you want to test. It shows the speed in ms. Believe me, it's a great tool.
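If a dedicated tool is not at hand, a rough timing of your own patterns can also be scripted. The sketch below uses Python's standard `timeit` module. The helper `best_ms`, the patterns, and the sample text are all placeholders, and this is not a description of how RegexBuddy measures anything. Numbers from one language's engine say nothing about another's, so the same measurement would have to be repeated in each candidate language.

```python
import re
import timeit

def best_ms(pattern, text, repeat=5, number=1000):
    """Best-of-`repeat` average time, in milliseconds, for one search of `text`."""
    compiled = re.compile(pattern)
    runs = timeit.repeat(lambda: compiled.search(text), repeat=repeat, number=number)
    return min(runs) / number * 1000.0

sample = "id=12345; user=alice; " * 50   # placeholder input
patterns = {
    "digits": r"\d+",
    "key_value": r"(\w+)=(\w+)",
    "lookahead": r"\w+(?==alice)",
}

timings = {name: best_ms(p, sample) for name, p in patterns.items()}
for name, ms in timings.items():
    print(f"{name}: {ms:.5f} ms per search")
```

Taking the minimum over several repeats reduces noise from other processes. Replace the placeholders with the patterns and input sizes your project will actually see.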




Answer 4:


"The choice of the language will be mainly based on this performance."

Then your choice may come down to the choice of regex engine.

Will your program run on single-core or multi-core machines, and on x86 or x64?



Source: https://stackoverflow.com/questions/5127089/regex-performances-between-languages-or-libraries
