Scala parser combinators for language embedded in html or text (like php)

我与影子孤独终老i submitted on 2019-12-10 11:07:38

Question


I have been playing around with Scala parser combinators for some time now, and have learned some ways to make them behave nicely and do most of the things I want using the built-in functions.

But how do you write a parser for an embedded language (like PHP or Ruby's ERB)? It requires that whitespace not be ignored outside the embedded code.

I managed to write a simple parser that matches all text up to a given regex match, but I am looking for a better, prettier way of doing this. There is probably some already defined function that does what is needed.

The test language accepts text like:

now: [[ millis; ]]
and now: [[; millis; ]]

and is implemented by the following code:

package test

// Since Scala 2.11, parser combinators live in the separate scala-parser-combinators module.
import scala.util.parsing.combinator.RegexParsers
import scala.util.matching.Regex

sealed abstract class Statement
case class Print(s: String) extends Statement
case class Millis() extends Statement

object SimpleLang extends RegexParsers {

  // Consumes all input up to (but not including) the first match of r.
  // Unlike the implicit regex conversion, it does not skip leading whitespace.
  def until(r: Regex): Parser[String] = new Parser[String] {
    def apply(in: Input): ParseResult[String] = {
      val source = in.source
      val offset = in.offset
      r.findFirstMatchIn(source.subSequence(offset, source.length)) match {
        case Some(matched) =>
          Success(source.subSequence(offset, offset + matched.start).toString, in.drop(matched.start))
        case None =>
          Failure("string matching regex `" + r + "' expected but `" + in.first + "' found", in)
      }
    }
  }

  def until(s: String): Parser[String] = until(java.util.regex.Pattern.quote(s).r)

  def interpret(stats: List[Statement]): Unit = stats match {
    case Print(s) :: rest => {
      print(s)
      interpret(rest)
    }
    case Millis() :: rest => {
      print(System.currentTimeMillis)
      interpret(rest)
    }
    case Nil => ()
  }

  def apply(input: String) : List[Statement] = parseAll(beginning, input) match {
    case Success(tree,_) => tree
    case e: NoSuccess => throw new RuntimeException("Syntax error: " + e)
  }

  /** GRAMMAR **/

  def beginning = (
    "[[" ~> stats |
    until("[[") ~ "[[" ~ stats ^^ { 
      case s ~ _ ~ ss => Print(s) :: ss
    }
  )

  def stats = rep1sep(stat, ";")

  def stat = (
    "millis" ^^^ { Millis() } |
    "]]" ~> ( (until("[[") <~ "[[") | until("\\z".r)) ^^ {
      case s => Print(s)
    }
  )

  def main(args: Array[String]): Unit = {
    val tree = SimpleLang("now: [[ millis; ]]\nand now: [[; millis; ]]")
    println(tree)
    interpret(tree)
  }

}

Answer 1:


Scala's RegexParsers trait provides implicit conversions from String and Regex to Parser[String], and both skip any leading whitespace before trying to match. You can use

override val skipWhitespace = false

to turn this behavior off, or override the whiteSpace member (also a Regex, \s+ by default) to supply your own pattern.
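As a rough sketch of the second option, assuming the scala-parser-combinators module is on the classpath: the object below overrides whiteSpace so that only spaces and tabs are skipped. Newlines therefore become significant, while blanks between words are still ignored. LineLang, word, lines and parseLines are hypothetical names made up for this example.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example: whitespace skipping stays on, but only spaces and tabs
// are skipped, so "\n" can be used as a separator token.
object LineLang extends RegexParsers {
  override protected val whiteSpace = """[ \t]+""".r

  // A word is any run of non-whitespace characters.
  val word: Parser[String] = """\S+""".r

  // One or more words per line; lines are separated by a literal newline.
  val lines: Parser[List[List[String]]] = repsep(rep1(word), "\n")

  def parseLines(input: String): Either[String, List[List[String]]] =
    parseAll(lines, input) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

With this grammar, parseLines("a b\nc  d") should yield two lines of two words each, while an empty line in the middle makes the parse fail, because the newline is no longer skipped.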

Both options apply globally: turning off whitespace skipping means that ALL literal and regex productions will see the whitespace.
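A minimal sketch of the global switch, applied to a template language modeled on the question's: with skipWhitespace set to false, text outside [[ ... ]] keeps its whitespace, but the grammar now has to place an explicit ws parser wherever blanks are allowed inside code. The Node/Text/Code types and the rule that a code block holds identifiers, each optionally followed by ";", are assumptions for this example (it does not accept the question's leading-";" form). It requires the scala-parser-combinators module.

```scala
import scala.util.parsing.combinator.RegexParsers

object NoSkipLang extends RegexParsers {
  // Hypothetical AST for this sketch.
  sealed trait Node
  final case class Text(s: String) extends Node
  final case class Code(stmts: List[String]) extends Node

  // Globally disable whitespace skipping for all literal and regex parsers.
  override val skipWhitespace = false

  // Whitespace must now be matched explicitly wherever it is allowed.
  val ws: Parser[String] = """\s*""".r

  // One or more characters up to (not including) the next "[[" or end of input.
  val text: Parser[Node] = """(?s)(?:(?!\[\[).)+""".r ^^ { s => Text(s) }

  val ident: Parser[String] = """[A-Za-z_]\w*""".r

  // A statement is an identifier, optionally followed by ";".
  val stmt: Parser[String] = ident <~ ws <~ opt(";" ~ ws)

  val code: Parser[Node] = "[[" ~> ws ~> rep(stmt) <~ "]]" ^^ { stmts => Code(stmts) }

  val template: Parser[List[Node]] = rep(text | code)

  def parseTemplate(input: String): Either[String, List[Node]] =
    parseAll(template, input) match {
      case Success(nodes, _)  => Right(nodes)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

The cost of the global switch is visible in stmt and code: every position that may contain blanks needs its own ws.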

Another option is to avoid the implicit regex conversion in just the few places where you need whitespace. I've done this in a parser for CSS that ignores comments in most places but, just before a rule, needs to read them to extract some javadoc-style metadata.
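A sketch of that selective approach, assuming default whitespace skipping stays on for everything inside [[ ... ]]: the hypothetical rawRegex helper matches a regex at the current offset without skipping leading whitespace first, so only the text chunks see whitespace. It is essentially the question's until helper rewritten as a prefix match with a lookahead. The AST types and grammar rules are illustrative assumptions, and the scala-parser-combinators module is required.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

object SelectiveLang extends RegexParsers {
  // Hypothetical AST for this sketch.
  sealed trait Node
  final case class Text(s: String) extends Node
  final case class Code(stmts: List[String]) extends Node

  // Hypothetical helper: like the implicit Regex conversion, but it does NOT
  // skip leading whitespace before matching r at the current offset.
  def rawRegex(r: Regex): Parser[String] = new Parser[String] {
    def apply(in: Input): ParseResult[String] = {
      val source = in.source
      val offset = in.offset
      r.findPrefixMatchOf(source.subSequence(offset, source.length)) match {
        case Some(m) =>
          Success(source.subSequence(offset, offset + m.end).toString, in.drop(m.end))
        case None =>
          Failure(s"text matching `$r' expected", in)
      }
    }
  }

  // Raw text keeps its whitespace and stops before the next "[[".
  val text: Parser[Node] = rawRegex("""(?s)(?:(?!\[\[).)+""".r) ^^ { s => Text(s) }

  // Everything below uses the default whitespace skipping.
  val ident: Parser[String] = """[A-Za-z_]\w*""".r
  val code: Parser[Node] = "[[" ~> rep(ident <~ opt(";")) <~ "]]" ^^ { stmts => Code(stmts) }

  // text is tried first so whitespace in front of "[[" ends up in a Text node.
  val template: Parser[List[Node]] = rep(text | code)

  def parseTemplate(input: String): Either[String, List[Node]] =
    parseAll(template, input) match {
      case Success(nodes, _)  => Right(nodes)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

The order inside template matters: if code were tried first, the "[[" literal would silently skip whitespace directly in front of it, for example between two adjacent code blocks.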




Answer 2:


Have you considered using a lexer before the parser?
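One way to read that suggestion, sketched here as an assumption rather than the answerer's design: a hand-rolled pre-pass splits the input into raw-text tokens and code tokens, so whitespace in the text never reaches a parser at all. The [[ and ]] delimiters come from the question's test language; TemplateLexer, TextTok and CodeTok are hypothetical names. This sketch uses only the standard library.

```scala
object TemplateLexer {
  sealed trait Token
  final case class TextTok(text: String) extends Token   // raw text, whitespace preserved
  final case class CodeTok(source: String) extends Token // contents between [[ and ]]

  private val Open = "[["
  private val Close = "]]"

  /** Splits a template into text and code tokens, or reports an unclosed "[[". */
  def lex(input: String): Either[String, List[Token]] = {
    @annotation.tailrec
    def loop(pos: Int, acc: List[Token]): Either[String, List[Token]] =
      if (pos >= input.length) Right(acc.reverse)
      else input.indexOf(Open, pos) match {
        case -1 => Right((TextTok(input.substring(pos)) :: acc).reverse)
        case open =>
          // Emit the text before "[[" only if it is non-empty.
          val withText = if (open > pos) TextTok(input.substring(pos, open)) :: acc else acc
          input.indexOf(Close, open + Open.length) match {
            case -1 => Left(s"unclosed $Open at offset $open")
            case close =>
              val code = CodeTok(input.substring(open + Open.length, close))
              loop(close + Close.length, code :: withText)
          }
      }
    loop(0, Nil)
  }
}
```

Each CodeTok body could then be handed to an ordinary RegexParsers grammar with the default whitespace skipping, since whitespace inside the code is insignificant. For a fully token-based design, the library also has scala.util.parsing.combinator.lexical and scala.util.parsing.combinator.syntactical packages, which this sketch does not use.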



Source: https://stackoverflow.com/questions/3347552/scala-parser-combinators-for-language-embedded-in-html-or-text-like-php
