Scala Split Seq or List by Delimiter

Anonymous (unverified), submitted on 2019-12-03 01:33:01

Question:

Let's say I have a sequence of ints like this:

val mySeq = Seq(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)

I want to split this by let's say 0 as a delimiter to look like this:

val mySplitSeq = Seq(Seq(0, 1, 2, 1), Seq(0, -1), Seq(0, 1, 2, 3, 2))

What is the most elegant way to do this in Scala?

Answer 1:

This works alright

mySeq.foldLeft(Vector.empty[Vector[Int]]) {
  case (acc, i) if acc.isEmpty => Vector(Vector(i))
  case (acc, 0) => acc :+ Vector(0)
  case (acc, i) => acc.init :+ (acc.last :+ i)
}

where 0 (or whatever) is your delimiter.
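If the delimiter is held in a variable rather than written as the literal 0, it cannot simply be replaced by a plain lowercase name in the pattern: a lowercase identifier in a pattern binds a fresh variable and matches every element. A guard compares against the existing value instead. A minimal sketch of the same fold with the delimiter passed as a parameter; the object and method names are hypothetical:

```scala
object FoldSplit {
  // Hypothetical helper: the foldLeft approach above, generic in the element type.
  def splitOn[A](xs: Seq[A], delimiter: A): Vector[Vector[A]] =
    xs.foldLeft(Vector.empty[Vector[A]]) {
      // The first element always opens a group, delimiter or not.
      case (acc, x) if acc.isEmpty => Vector(Vector(x))
      // A guard compares against the parameter (a backquoted `delimiter` pattern would too);
      // a bare lowercase name here would bind a new variable and match everything.
      case (acc, x) if x == delimiter => acc :+ Vector(x)
      // Any other element is appended to the current (last) group.
      case (acc, x) => acc.init :+ (acc.last :+ x)
    }
}
```

For example, FoldSplit.splitOn(mySeq, 0) groups the question's input the same way as the snippet above, and the same call works for strings or any other element type.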



Answer 2:

Efficient O(n) solution

Tail-recursive solution that never appends anything to lists:

def splitBy[A](sep: A, seq: List[A]): List[List[A]] = {
  @annotation.tailrec
  def rec(xs: List[A], revAcc: List[List[A]]): List[List[A]] = xs match {
    case Nil => revAcc.reverse
    case h :: t =>
      if (h == sep) {
        val (pref, suff) = xs.tail.span(_ != sep)
        rec(suff, (h :: pref) :: revAcc)
      } else {
        val (pref, suff) = xs.span(_ != sep)
        rec(suff, pref :: revAcc)
      }
  }
  rec(seq, Nil)
}

val mySeq = List(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
println(splitBy(0, mySeq))

produces:

List(List(0, 1, 2, 1), List(0, -1), List(0, 1, 2, 3, 2)) 

It also handles the case where the input does not start with the separator.
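Each recursive step relies on List.span, which splits a list in a single pass into the longest prefix whose elements satisfy the predicate and the remainder. A small illustration of the two branches (the wrapping object is only there to make the snippet self-contained):

```scala
object SpanDemo {
  val xs = List(1, 2, 1, 0, -1, 0, 3)
  // Head is not the separator: take the longest prefix without 0.
  val (pref, suff) = xs.span(_ != 0) // pref = List(1, 2, 1), suff = List(0, -1, 0, 3)
  // Head is the separator: span over the tail, then re-attach the head.
  val (pref2, suff2) = suff.tail.span(_ != 0) // pref2 = List(-1), suff2 = List(0, 3)
  val group = suff.head :: pref2 // List(0, -1)
}
```

Every element is visited once by some span call, and groups are only prepended to revAcc, which is why the whole recursion stays linear.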


For fun: Another O(n) solution that works for small integers

This is more of a warning than a solution. Trying to reuse String's split does not result in anything sane:

val mySeq = Seq(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
val z = mySeq.min
val res = (mySeq
  .map(x => (x - z).toChar)
  .mkString
  .split((-z).toChar)
  .map(s => 0 :: s.toList.map(_.toInt + z) ).toList.tail)

It will fail if the integers span a range larger than 65535, and it looks pretty insane. Nevertheless, I find it amusing that it works at all:

res: List[List[Int]] = List(List(0, 1, 2, 1), List(0, -1), List(0, 1, 2, 3, 2)) 
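Looking at the intermediate values for the example input shows why the trick needs a leading delimiter and a limited value range. Shifting by z = mySeq.min turns every value into a non-negative Char code, and split then yields an empty first piece, which .tail discards. The fitsInChar check is an added illustration of the 65535 limit, not part of the answer:

```scala
object CharTrickSteps {
  val mySeq = Seq(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
  val z = mySeq.min // -1
  // Shift every value by -z so the smallest one becomes code point 0.
  val codes: List[Int] = mySeq.map(x => (x - z).toChar.toInt).toList // 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 3
  val encoded: String = mySeq.map(x => (x - z).toChar).mkString
  // The delimiter 0 is encoded as (0 - z), i.e. (-z).toChar.
  val pieces: List[String] = encoded.split((-z).toChar).toList
  // The input starts with the delimiter, so pieces.head is "" (the answer drops it with .tail).
  // The shift is only lossless if max - min fits into a 16-bit Char (at most 65535).
  val fitsInChar: Boolean = mySeq.max.toLong - z <= Char.MaxValue.toLong
}
```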


Answer 3:

You can use foldLeft:

val delimiter = 0

val res = mySeq.foldLeft(Seq[Seq[Int]]()) {
  case (acc, `delimiter`) => acc :+ Seq(delimiter)
  case (acc, v) => acc.init :+ (acc.last :+ v)
}

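NOTE: This assumes the input always starts with the delimiter. Otherwise the first element falls through to the second case while acc is still empty, and acc.init throws an exception.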
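One way to lift that assumption, sketched here with a hypothetical name: seed the fold with one empty group so acc.init and acc.last are always defined, then drop the seed if it is still empty at the end. That happens exactly when the input is empty or starts with the delimiter. The sketch also swaps the default Seq for Vector, whose :+, init and last are effectively constant-time, while on a List they are linear:

```scala
object SeededFold {
  // Hypothetical helper: the backquoted-delimiter fold, seeded so the input
  // no longer has to start with the delimiter.
  def splitOn[A](xs: Seq[A], delimiter: A): Vector[Vector[A]] = {
    // Start with one empty group, so acc.init and acc.last are always defined.
    val groups = xs.foldLeft(Vector(Vector.empty[A])) {
      case (acc, `delimiter`) => acc :+ Vector(delimiter) // open a new group
      case (acc, x)           => acc.init :+ (acc.last :+ x) // extend the current group
    }
    // The seed stays empty only if xs is empty or starts with the delimiter.
    if (groups.head.isEmpty) groups.tail else groups
  }
}
```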



Answer 4:

One more variant using indices and reverse slicing

scala> val s = Seq(0,1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
s: scala.collection.mutable.Seq[Int] = ArrayBuffer(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)

scala> s.indices.filter( s(_)==0).+:(if(s(0)!=0) -1 else -2).filter(_>= -1 ).reverse.map( {var p=0; x=>{ val y=s.slice(x,s.size-p);p=s.size-x;y}}).reverse
res173: scala.collection.immutable.IndexedSeq[scala.collection.mutable.Seq[Int]] = Vector(ArrayBuffer(0, 1, 2, 1), ArrayBuffer(0, -1), ArrayBuffer(0, 1, 2, 3, 2))

It also works if the sequence does not start with the delimiter (thanks to jrook):

scala>  val s = Seq(1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
s: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 1, 0, -1, 0, 1, 2, 3, 2)

scala> s.indices.filter( s(_)==0).+:(if(s(0)!=0) -1 else -2).filter(_>= -1 ).reverse.map( {var p=0; x=>{ val y=s.slice(x,s.size-p);p=s.size-x;y}}).reverse
res174: scala.collection.immutable.IndexedSeq[scala.collection.mutable.Seq[Int]] = Vector(ArrayBuffer(1, 2, 1), ArrayBuffer(0, -1), ArrayBuffer(0, 1, 2, 3, 2))

UPDATE1:

A more compact version that avoids the two reverse calls used above:

scala> val s = Seq(0,1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
s: scala.collection.mutable.Seq[Int] = ArrayBuffer(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)

scala> s.indices.filter( s(_)==0).+:(if(s(0)!=0) -1 else -2).filter(_>= -1 ).:+(s.size).sliding(2,1).map( x=>s.slice(x(0),x(1)) ).toList
res189: List[scala.collection.mutable.Seq[Int]] = List(ArrayBuffer(0, 1, 2, 1), ArrayBuffer(0, -1), ArrayBuffer(0, 1, 2, 3, 2))

scala> val s = Seq(1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
s: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 1, 0, -1, 0, 1, 2, 3, 2)

scala> s.indices.filter( s(_)==0).+:(if(s(0)!=0) -1 else -2).filter(_>= -1 ).:+(s.size).sliding(2,1).map( x=>s.slice(x(0),x(1)) ).toList
res190: List[scala.collection.mutable.Seq[Int]] = List(ArrayBuffer(1, 2, 1), ArrayBuffer(0, -1), ArrayBuffer(0, 1, 2, 3, 2))
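The first one-liner in this answer walks the delimiter indices from the end and slices from each index up to the previous cut point, which it tracks in a mutable var p captured by the closure. The same boundaries can be paired without mutable state by zipping the list of group starts with itself shifted by one. A sketch under that assumption; splitAtDelims is a hypothetical name, and the input is taken as an IndexedSeq so that s(i) is fast:

```scala
object ZipSlices {
  // Hypothetical helper: index-based splitting without a mutable var.
  def splitAtDelims[A](s: IndexedSeq[A], delim: A): List[IndexedSeq[A]] =
    if (s.isEmpty) Nil
    else {
      val delimIdx = s.indices.filter(i => s(i) == delim)
      // Every group starts at index 0 or at a delimiter; distinct removes a duplicate 0.
      val starts = (0 +: delimIdx).distinct
      // Each group ends where the next one starts, the last one at s.size.
      val ends = starts.tail :+ s.size
      starts.zip(ends).map { case (from, until) => s.slice(from, until) }.toList
    }
}
```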


Answer 5:

Here is a solution that I believe is both short and runs in O(n):

import scala.collection.mutable.ArrayBuffer

def seqSplitter[T](s: ArrayBuffer[T], delimiter: T) =
  (0 +: s.indices.filter(s(_) == delimiter) :+ s.size) // find split locations
    .sliding(2)
    .map(idx => s.slice(idx.head, idx.last)) // extract the slice
    .dropWhile(_.isEmpty) // drop the empty first slice if s starts with the delimiter
    .toList

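The idea is to take all the indices where the delimiter occurs, slide over consecutive pairs of them, and slice the sequence at those locations. Because 0 is always prepended as the first split location, a sequence that starts with the delimiter produces an empty first slice (from the pair (0, 0)); dropWhile removes it, so the function works whether or not the first element is a delimiter.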

Here I am putting all the data in an ArrayBuffer to ensure slicing will take O(size_of_slice).

val mySeq = ArrayBuffer(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
seqSplitter(mySeq, 0).toList

Gives:

List(ArrayBuffer(0, 1, 2, 1), ArrayBuffer(0, -1), ArrayBuffer(0, 1, 2, 3, 2)) 
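Tracing the three stages of seqSplitter on the example input makes the role of dropWhile visible: index 0 is both the prepended start and a delimiter position, so the first pair of boundaries is (0, 0) and yields an empty slice. The object name below is only for packaging the trace:

```scala
import scala.collection.mutable.ArrayBuffer

object SplitterTrace {
  val s = ArrayBuffer(0, 1, 2, 1, 0, -1, 0, 1, 2, 3, 2)
  // Split locations with 0 and s.size added as outer bounds: 0, 0, 4, 6, 11.
  val bounds: List[Int] = (0 +: s.indices.filter(s(_) == 0) :+ s.size).toList
  // Consecutive pairs: (0, 0), (0, 4), (4, 6), (6, 11).
  val pairs: List[List[Int]] = bounds.sliding(2).toList
  // The (0, 0) pair yields an empty slice, which dropWhile removes.
  val slices: List[ArrayBuffer[Int]] =
    pairs.map(p => s.slice(p.head, p.last)).dropWhile(_.isEmpty)
}
```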

A more detailed complexity analysis

The operations are:

  • Filter the delimiter indices (O(n))
  • Loop over the list of indices obtained in the previous step (O(num_of_delimiters)); for each pair of indices corresponding to a slice:
    • Copy the slice from the array and put it into the final collection (O(size_of_slice))

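The last two steps sum up to O(n): there are at most n delimiter indices to loop over, and the slices are disjoint and together cover the array exactly once, so copying them costs O(n) in total.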


