Lua pattern matching vs. regular expressions

我怕爱的太早我们不能终老 submitted on 2020-01-09 06:04:14

Question


I'm currently learning Lua. Regarding pattern matching in Lua, I found the following sentence in the Lua documentation on lua.org:

Nevertheless, pattern matching in Lua is a powerful tool and includes some features that are difficult to match with standard POSIX implementations.

As I'm familiar with POSIX regular expressions, I would like to know whether there are any common examples where Lua pattern matching is "better" than regular expressions, or did I misinterpret the sentence? And if there are such examples: why is either pattern matching or regular expressions better suited to them?


Answer 1:


Are there any common examples where Lua pattern matching is "better" than regular expressions?

It is not so much that particular examples stand out as that Lua patterns have a higher signal-to-noise ratio than POSIX regular expressions. What is often preferable is the overall design.

Here are some factors that contribute to the good design:

  • Very lightweight syntax for matching common character types including uppercase letters (%u), decimal digits (%d), space characters (%s) and so on. Any character type can be complemented by using the corresponding capital letter, so pattern %S matches any nonspace character.

  • Quoting is extremely simple and regular. The quoting character is %, so it is always distinct from the string-quoting character \, which makes Lua patterns much easier to read than POSIX regular expressions (when quoting is necessary). It is always safe to quote symbols, and it is never necessary to quote letters, so you can just go by that rule of thumb instead of memorizing what symbols are special metacharacters.

  • Lua offers "captures" and can return multiple captures as the result of a match call. This interface is much, much better than capturing substrings through side effects or having some hidden state that has to be interrogated to find captures. Capture syntax is simple: just use parentheses.

  • Lua has a "shortest match" - modifier to go along with the "longest match" * operator. So for example s:find '%s(%S-)%.' finds the shortest sequence of nonspace characters that is preceded by space and followed by a dot.

  • The expressive power of Lua patterns is comparable to POSIX "basic" regular expressions, without the alternation operator |. What you are giving up is "extended" regular expressions with |. If you need that much expressive power I recommend going all the way to LPEG which gives you essentially the power of context-free grammars at quite reasonable cost.
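The points above about character classes, captures and the shortest-match modifier can be tried out with a short script. This is only a rough sketch: the sample strings and variable names are made up for illustration, and it should behave the same on Lua 5.1 and later.

```lua
-- Character classes and their complements (sample string is made up).
local s = "Order 66 shipped"
local upper_word  = s:match("%u%l+")  -- an uppercase letter, then lowercase letters
local digits      = s:match("%d+")    -- a run of decimal digits
local first_token = s:match("%S+")    -- a run of non-space characters
print(upper_word, digits, first_token)

-- Captures come back as ordinary multiple return values,
-- with no hidden state to interrogate afterwards.
local key, value = ("timeout = 30"):match("(%w+)%s*=%s*(%d+)")
print(key, value)

-- Longest match (*) versus shortest match (-).
local text = "<b>one</b> and <b>two</b>"
local greedy = text:match("<b>(.*)</b>")  -- runs up to the last </b>
local lazy   = text:match("<b>(.-)</b>")  -- stops at the first </b>
print(greedy)
print(lazy)

-- The pattern from the answer: shortest run of non-space characters
-- that is preceded by a space and followed by a dot.
local word = ("see foo.bar."):match("%s(%S-)%.")
print(word)
```

Note that string.match returns the captures when the pattern has any, and the whole match otherwise, which is why the first three calls return the matched text itself.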




Answer 2:


http://lua-users.org/wiki/LibrariesAndBindings contains a listing of functionality including regex libraries if you wish to continue using them.

To answer the question (and note that I'm by no means a Lua guru), the language has a strong tradition of being used in embedded applications, where a full regex engine would unduly increase the size of the code on the platform. Such an engine is sometimes much larger than the entire Lua library itself.

[Edit] I just found that the online version of Programming in Lua (an excellent resource for learning the language) describes this as following from one of the principles of the language: see the comments below. [/Edit]

I find personally that the default pattern matching Lua provides satisfies most of my regex-y needs. Your mileage may vary.




Answer 3:


Ok, just a slight noob note for this discussion; I particularly got confused by this page:

SciTE Regular Expressions

since that one says \s matches whitespace, as I know from other regular expression syntaxes... And so I'm trying it in a shell:

$ lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> c="   d"
> print(c:match(" "))

> print(c:match("."))

> print(c:match("\s"))
nil
> print("_".. c:match("[ ]") .."_")
_ _
> print("_".. c:match("[ ]*") .."_")
_   _
> print("_".. c:match("[\s]*") .."_")
__

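Hmmm... seems \s doesn't get recognized here (in Lua 5.1 an unrecognized string escape such as \s is simply read as the letter s, so these patterns were looking for the letter s) - so that page probably refers to the regular expressions in SciTE's Find/Replace, not to Lua's regex syntax (which SciTE also uses).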

Then I reread the lua-users wiki Patterns Tutorial, and started to understand the comment in @NormanRamsey's answer about the escape character being %, not \. So, trying this:

> print("_".. c:match("[%s]*") .."_")
_   _

... does indeed work.

So, although I originally thought that Lua's "patterns" were a different set of commands/engine from Lua's "regular expressions", I guess a better way to say it is: Lua's "patterns" are the Lua-specific "regular expression" syntax/engine (in other words, there aren't two of them :) )
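For what it's worth, the rule from @NormanRamsey's answer that it is always safe to quote symbols with % can be turned into a tiny helper. This is just a sketch: escape_pattern is a made-up name, not part of Lua's standard library, and the sample strings are invented.

```lua
-- Hypothetical helper (not a standard library function): put a % in front
-- of every punctuation character so the text is matched literally.
local function escape_pattern(text)
  -- In the replacement string, %% is a literal % and %0 is the whole match.
  -- The extra parentheses drop gsub's second result (the substitution count).
  return (text:gsub("%p", "%%%0"))
end

local escaped = escape_pattern("3.14 (approx)")
print(escaped)

-- Unescaped, the dot matches any character, so "3x14" matches too.
local loose_start = ("3x14"):find("3.14")
-- Escaped, only a real dot matches, so this search fails.
local strict_start = ("3x14"):find(escape_pattern("3.14"))
print(loose_start, strict_start)

-- The escaped pattern still finds the literal text.
local first, last = ("pi is 3.14 (approx)"):find(escaped)
print(first, last)
```

Because the quoting character is % rather than \, none of these patterns need doubled backslashes inside the Lua string literals.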

Cheers!



Source: https://stackoverflow.com/questions/2693334/lua-pattern-matching-vs-regular-expressions
