Idiomatic Scala way of deserializing delimited strings into case classes

China☆狼群 提交于 2019-12-04 14:44:38

As there is a Shapeless tag, a possible solution using it, but I'm not an expert and I guess one can do better :

First, about the lack of validation, you should simply have read return Try, or scalaz.Validation or just option if you do not care about an error message.

Then about boilerplate, you may try to use HList. This way you don't need to go for all the arities.

import scala.util._
import shapeless._

trait Reader[+A] { self =>
  def read(s: String) : Try[A]
  def map[B](f: A => B): Reader[B] = new Reader[B] {
    def read(s: String) = self.read(s).map(f)
  }
}    

object Reader {
  // convenience
  def apply[A: Reader] : Reader[A] = implicitly[Reader[A]]
  def read[A: Reader](s: String): Try[A] = implicitly[Reader[A]].read(s)

  // base types
  implicit object StringReader extends Reader[String] {
    def read(s: String) = Success(s)
  }
  implicit object IntReader extends Reader[Int] {
    def read(s: String) = Try {s.toInt}
  }

  // HLists, parts separated by ":"
  implicit object HNilReader extends Reader[HNil] {
    def read(s: String) = 
      if (s.isEmpty()) Success(HNil) 
      else Failure(new Exception("Expect empty"))
  }
  implicit def HListReader[A : Reader, H <: HList : Reader] : Reader[A :: H] 
  = new Reader[A :: H] {
    def read(s: String) = {
      val (before, colonAndBeyond) = s.span(_ != ':')
      val after = if (colonAndBeyond.isEmpty()) "" else colonAndBeyond.tail
      for {
        a <- Reader.read[A](before)
        b <- Reader.read[H](after)
      } yield a :: b
    }
  }

}

Given that, you have a reasonably short reader for Foo :

case class Foo(a: Int, s: String) 

object Foo {
  implicit val FooReader : Reader[Foo] = 
    Reader[Int :: String :: HNil].map(Generic[Foo].from _)
}

It works :

println(Reader.read[Foo]("12:text"))
Success(Foo(12,text))

Without scalaz and shapeless, I think the ideomatic Scala way to parse some input are Scala parser combinators. In your example, I would try something like this:

import org.joda.time.DateTime
import scala.util.parsing.combinator.JavaTokenParsers

val input =
  """Event:005003:information:2013 12 06 12 37 55:n3.swmml20861:1:Full client swmml20861 registered [entry=280 PID=20864 queue=0x4ca9001b]
    |RSET:m3node:AUTRS:1-1-24:A:0:LOADSHARE:INHIBITED:0
    |M3UA_IP_LINK:m3node:AUT001LKSET1:AUT001LK1:r
    |OPC:m3node:1-10-2(P):A7:NAT0""".stripMargin

trait LineContent
case class Event(number : Int, typ : String, when : DateTime, stuff : List[String]) extends LineContent
case class Reset(node : String, stuff : List[String]) extends LineContent
case class Other(typ : String, stuff : List[String]) extends LineContent

object LineContentParser extends JavaTokenParsers {
  override val whiteSpace=""":""".r

  val space="""\s+""".r
  val lineEnd = """"\n""".r  //"""\s*(\r?\n\r?)+""".r
  val field = """[^:]*""".r

  def stuff : Parser[List[String]] = rep(field)
  def integer : Parser[Int] = log(wholeNumber ^^ {_.toInt})("integer")

  def date : Parser[DateTime] = log((repsep(integer, space)  filter (_.length == 6))  ^^ (l =>
      new DateTime(l(0), l(1), l(2), l(3), l(4), l(5), 0)
    ))("date")

  def event : Parser[Event] = "Event" ~> integer ~ field ~ date ~ stuff ^^ {
    case number~typ~when~stuff => Event(number, typ, when, stuff)}

  def reset : Parser[Reset] = "RSET" ~> field ~ stuff ^^ { case node~stuff =>
    Reset(node, stuff)
  }

  def other : Parser[Other] = ("M3UA_IP_LINK" | "OPC") ~ stuff ^^ { case typ~stuff =>
    Other(typ, stuff)
  }

  def line : Parser[LineContent] = event | reset | other
  def lines = repsep(line, lineEnd)

  def parseLines(s : String) = parseAll(lines, s)
}

LineContentParser.parseLines(input)

The patterns in the parser combinators are self explanatory. I always convert each successfully parsed chunk as early as possible to an partial result. Then the partial results will be combined to the final result.

A hint for debugging: You can always add the log parser. It will print before and after when a rule is applied. Together with the given name (e.g. "date") it will also print the current position of the input source, where the rule is applied and when applicable the parsed partial result.

An example output looks like this:

trying integer at scala.util.parsing.input.CharSequenceReader@108589b
integer --> [1.13] parsed: 5003
trying date at scala.util.parsing.input.CharSequenceReader@cec2e3
trying integer at scala.util.parsing.input.CharSequenceReader@cec2e3
integer --> [1.30] parsed: 2013
trying integer at scala.util.parsing.input.CharSequenceReader@14da3
integer --> [1.33] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader@1902929
integer --> [1.36] parsed: 6
trying integer at scala.util.parsing.input.CharSequenceReader@17e4dce
integer --> [1.39] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader@1747fd8
integer --> [1.42] parsed: 37
trying integer at scala.util.parsing.input.CharSequenceReader@1757f47
integer --> [1.45] parsed: 55
date --> [1.45] parsed: 2013-12-06T12:37:55.000+01:00

I think this is an easy and maintainable way to parse input into well typed Scala objects. It is all in the core Scala API, hence I would call it "idiomatic". When typing the example code in an Idea Scala worksheet, completion and type information worked very well. So this way seems to well supported by the IDEs.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!