Question
I'm writing a DSL using Scala's parser combinators. I have recently changed my base class from StandardTokenParsers to JavaTokenParsers to take advantage of the regex features I think I need for one last piece of the puzzle. (see Parsing a delimited multiline string using scala StandardTokenParser)
What I am trying to do is to extract a block of text delimited by some characters ({{ and }} in this example). This block of text can span multiple lines. What I have so far is:
def docBlockRE = regex("""(?s)(?!}}).*""".r)
def docBlock: Parser[DocString] =
"{{" ~> docBlockRE <~ "}}" ^^ { case str => new DocString(str) }}
where DocString is a case class in my DSL. However, this doesn't work. It fails if I feed it the following:
{{
abc
}}
{{
abc
}}
I'm not sure why this fails. If I put a debug wrapper around the parser (http://jim-mcbeath.blogspot.com/2011/07/debugging-scala-parser-combinators.html) I get the following:
docBlock.apply for token
at position 10.2 offset 165 returns [19.1] failure: `}}' expected but end of source found
If I try a single delimited block with multiple lines:
{{
abc
def
}}
then it also fails to parse with:
docBlock.apply for token
at position 10.2 offset 165 returns [16.1] failure: `}}' expected but end of source found
If I remove the DOTALL directive (?s) then I can parse multiple single-line blocks (which doesn't really help me much).
Is there any way of combining multi-line regex with negative lookahead?
One other issue I have with this approach is that, no matter what I do, the closing delimiter must be on a separate line from the text. Otherwise I get the same kind of error message I see above. It is almost like the negative lookahead isn't really working as I expect it to.
Answer 1:
In context:
scala> val rr = """(?s).*?(?=}})""".r
rr: scala.util.matching.Regex = (?s).*?(?=}})
scala> object X extends JavaTokenParsers {val r: Parser[String] = rr; val b: Parser[String] = "{{" ~>r<~"}}" ^^ { case s => s } }
defined object X
scala> X.parseAll(X.b, """{{ abc
| def
| }}""")
res15: X.ParseResult[String] =
[3.3] parsed: abc
def
More, to show the difference between lazy and greedy matching:
scala> val rr = """(?s)(.*?)(?=}})""".r.unanchored
rr: scala.util.matching.UnanchoredRegex = (?s)(.*?)(?=}})
scala> def f(s: String) = s match { case rr(x) => x case _ => "(none)" }
f: (s: String)String
scala> f("something }} }}")
res3: String = "something "
scala> val rr = """(?s)(.*)(?=}})""".r.unanchored
rr: scala.util.matching.UnanchoredRegex = (?s)(.*)(?=}})
scala> def f(s: String) = s match { case rr(x) => x case _ => "(none)" }
f: (s: String)String
scala> f("something }} }}")
res4: String = "something }} "
The lookahead just means "make sure this follows me, but don't consume it."
Negative lookahead just means "make sure this doesn't follow me."
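To make the point about negative lookahead concrete, here is a minimal sketch using only scala.util.matching.Regex. It suggests why the question's pattern runs past the closing delimiter: (?!}}) is tested once, at the position where the match starts, and the greedy .* then consumes everything up to the end of the input, so nothing is left for "}}". The second pattern, which repeats the lookahead before every character, is not taken from either answer; it is shown only as one common way to combine DOTALL with negative lookahead. The names LookaheadDemo, body, checkedOnce and checkedEachStep are hypothetical.

```scala
object LookaheadDemo {
  // What remains of the input once the opening "{{" has been consumed.
  val body: String = "{{\nabc\ndef\n}}".stripPrefix("{{")

  // The question's pattern: the lookahead is checked only at the start,
  // then the greedy .* (with DOTALL) runs to the end of the input.
  val checkedOnce = """(?s)(?!}}).*""".r

  // Assumed alternative: re-check the lookahead before each character,
  // so matching stops right before the first "}}".
  val checkedEachStep = """(?s)(?:(?!}}).)*""".r

  val swallowed: Option[String] = checkedOnce.findPrefixOf(body)
  val stopped: Option[String] = checkedEachStep.findPrefixOf(body)

  def main(args: Array[String]): Unit = {
    println(swallowed.map(_.endsWith("}}"))) // Some(true): delimiter was consumed
    println(stopped.map(_.endsWith("}}")))   // Some(false): delimiter is left over
  }
}
```

This would also be consistent with the behaviour described in the question: without (?s) the dot stops at a line break, so a closing delimiter on its own line is left for the "}}" parser, while one on the same line as the text is swallowed.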
Answer 2:
To match {{the entire bracket}}, use this regex:
(?s)\{\{.*?\}\}
See the matches in the demo.
To match {{inside the brackets}}, use this:
(?s)(?<=\{\{).*?(?=\}\})
See the matches in the demo.
Explanation
- (?s) activates DOTALL mode, allowing the dot to match across lines
- The star quantifier in .*? is made "lazy" by the ? so that the dot only matches as much as necessary. Without the ?, the .* will grab the longest match, first matching the whole string then backtracking only as far as needed to allow the next token to match.
- (?<=\{\{) is a lookbehind that asserts that what precedes is {{
- (?=\}\}) is a lookahead that asserts that what follows is }}
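As a rough illustration of the points above, the following sketch runs both patterns from this answer over an input containing two blocks, plus a greedy variant for contrast. It uses only scala.util.matching.Regex; the object name DelimiterDemo, the sample input and the value names are hypothetical.

```scala
object DelimiterDemo {
  // Two delimited blocks, each spanning several lines.
  val input: String = "{{\nabc\n}}\n{{\ndef\n}}"

  // Whole block, delimiters included (lazy quantifier).
  val whole = """(?s)\{\{.*?\}\}""".r
  // Contents only: the delimiters are asserted but not consumed.
  val inside = """(?s)(?<=\{\{).*?(?=\}\})""".r
  // Greedy variant: .* runs to the last "}}" in the input.
  val greedy = """(?s)\{\{.*\}\}""".r

  val wholeBlocks: List[String] = whole.findAllIn(input).toList
  val contents: List[String] = inside.findAllIn(input).toList
  val greedyBlocks: List[String] = greedy.findAllIn(input).toList

  def main(args: Array[String]): Unit = {
    println(wholeBlocks.size)  // 2: one match per block
    println(contents.size)     // 2: the text between each pair of delimiters
    println(greedyBlocks.size) // 1: both blocks merged into a single match
  }
}
```

The contents keep their surrounding line breaks, so a caller would probably want to trim them before building something like the question's DocString.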
Reference
- Lookahead and Lookbehind Zero-Length Assertions
- Mastering Lookahead and Lookbehind
Source: https://stackoverflow.com/questions/24771341/scala-regex-multiline-match-with-negative-lookahead