Scala: How to split words using multiple delimiters

Submitted by 霸气de小男生 on 2020-08-07 05:41:45

Question


Suppose I have a text file like this:

Apple#mango&banana@grapes

The data needs to be split on multiple delimiters before performing the word count.

How can I do that?


Answer 1:


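Use the split method. It takes a regular expression, so the character class [#&@] matches any one of the three delimiters: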

scala> "Apple#mango&banana@grapes".split("[#&@]")
res0: Array[String] = Array(Apple, mango, banana, grapes)
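
As a rough sketch of a few variations on the same idea (the object name SplitDemo and the second sample string are made up for illustration): Scala also offers a split overload that takes the delimiter characters as an Array[Char], which avoids regex escaping, and adding + to the character class collapses runs of adjacent delimiters so no empty strings appear in the result.

```scala
object SplitDemo {
  val line = "Apple#mango&banana@grapes"

  // Regex character class: any single one of #, & or @ acts as a delimiter.
  val byRegex: Array[String] = line.split("[#&@]")

  // StringOps overload that takes the delimiter characters directly (no regex).
  val byChars: Array[String] = line.split(Array('#', '&', '@'))

  // Without +, two adjacent delimiters leave an empty string between them.
  val withEmpty: Array[String] = "Apple#&mango@@banana".split("[#&@]")

  // With +, a run of delimiters is treated as one separator.
  val collapsed: Array[String] = "Apple#&mango@@banana".split("[#&@]+")

  def main(args: Array[String]): Unit = {
    println(byRegex.mkString(", "))
    println(byChars.mkString(", "))
    println(withEmpty.mkString("|"))
    println(collapsed.mkString(", "))
  }
}
```

If the input can contain repeated delimiters, the [#&@]+ form is usually the safer choice before counting words.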



Answer 2:


If you just want to count words, you don't need to split. Something like this will do:

  val numWords = """\b\w""".r.findAllIn(string).length

This regex matches the start of each word: \b is a zero-length word boundary and \w is any "word" character (letter, digit, or underscore). findAllIn returns every match in the string, so the number of matches is the number of words.
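
To make the one-liner above concrete, here is a minimal sketch that wraps it in a function and runs it on the sample line from the question. The names CountWords and countWords are hypothetical, chosen only for this example.

```scala
object CountWords {
  // \b\w matches the first character of every word, so there is one match per word.
  private val wordStart = """\b\w""".r

  def countWords(text: String): Int = wordStart.findAllIn(text).length

  def main(args: Array[String]): Unit = {
    println(countWords("Apple#mango&banana@grapes"))
  }
}
```

Because \w includes digits and underscores, a token such as foo_bar counts as a single word under this definition.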

If you want to count each word separately, across multiple lines, then split is probably the better option:

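    // `source` is assumed to be a scala.io.Source, e.g. Source.fromFile(...)
    source
      .getLines()
      .flatMap(_.split("\\W+"))   // split each line on runs of non-word characters
      .filterNot(_.isEmpty)       // a leading delimiter yields an empty first token
      .toSeq                      // Iterator has no groupBy, so materialize first
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }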
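
For a version that can be run as is, the pipeline can be wrapped in a function and fed an in-memory Source. This is only a sketch under a few assumptions: the names WordCount and wordCounts and the two-line sample text are invented here, and the iterator is converted with toSeq before grouping because Iterator itself does not provide groupBy.

```scala
import scala.io.Source

object WordCount {
  // Counts occurrences of each word; matching is case-sensitive.
  def wordCounts(source: Source): Map[String, Int] =
    source
      .getLines()
      .flatMap(_.split("\\W+"))   // split on runs of non-word characters
      .filterNot(_.isEmpty)       // drop empty tokens from leading delimiters
      .toSeq                      // materialize so that groupBy is available
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input; a real program might use Source.fromFile instead.
    val sample = Source.fromString("Apple#mango&banana@grapes\nmango@Apple#apple")
    wordCounts(sample).foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

Note that "Apple" and "apple" are counted separately here; mapping each token through toLowerCase before grouping would merge them if that is the desired behavior.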


Source: https://stackoverflow.com/questions/45758378/scala-how-to-split-words-using-multiple-delimeters
