Difference between spark Vectors and scala immutable Vector?

Posted by 蹲街弑〆低调 on 2019-12-18 07:14:38

Question


I am writing a project for Spark 1.4 in Scala and am currently torn between converting my initial input data into spark.mllib.linalg.Vectors or into scala.immutable.Vector, which I later want to work with in my algorithm. Could someone briefly explain the difference between the two and in which situations one would be more useful than the other?

Thank you.


Answer 1:


spark.mllib.linalg.Vector is designed for linear algebra applications. mllib provides two implementations: DenseVector and SparseVector. While you have access to useful methods like norm or sqdist, the API is otherwise rather limited.

Like all data structures from org.apache.spark.mllib.linalg, it can store only 64-bit floating point numbers (scala.Double).
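As a rough illustration of the points above, here is a minimal sketch assuming spark-mllib (1.3 or later) is on the classpath. The object name MllibVectorSketch and the sample values are made up for the example; norm and sqdist are called through the Vectors factory object.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical wrapper object, only here to keep the sketch self-contained.
object MllibVectorSketch {
  // Dense vector: every entry is stored explicitly.
  val dense: Vector = Vectors.dense(3.0, 0.0, 4.0)

  // Sparse vector with the same contents: (size, indices, values),
  // so only the non-zero entries are stored.
  val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(3.0, 4.0))

  // p-norm of a vector; p = 2.0 gives the Euclidean norm.
  val l2Norm: Double = Vectors.norm(dense, 2.0)

  // Squared Euclidean distance; dense and sparse vectors can be mixed.
  val sqDist: Double = Vectors.sqdist(dense, sparse)

  // Elements are always Double, whatever the representation.
  val asArray: Array[Double] = sparse.toArray
}
```

Both representations share the same Vector trait, so code that only reads values does not need to care which one it receives.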

If you plan to use mllib then spark.mllib.linalg.Vector is pretty much your only option. All the remaining data structures from mllib, both local and distributed, are built on top of org.apache.spark.mllib.linalg.Vector.
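If the input starts out in a Scala collection, moving it into mllib is a short conversion. The following is a sketch under the assumption that the data is a Vector[Double]; the names ConversionSketch, raw, and the label value 1.0 are invented for the example.

```scala
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical wrapper object; the mllib trait is renamed on import so that
// plain `Vector` keeps referring to scala.collection.immutable.Vector.
object ConversionSketch {
  // Input data held in a general-purpose Scala collection.
  val raw: Vector[Double] = Vector(1.0, 0.0, 2.5)

  // mllib factories take arrays, so convert through toArray.
  val features: MLVector = Vectors.dense(raw.toArray)

  // Other mllib structures, such as LabeledPoint, are built on the mllib vector.
  val point: LabeledPoint = LabeledPoint(1.0, features)

  // Going back to a Scala collection is equally direct.
  val back: Vector[Double] = features.toArray.toVector
}
```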

Otherwise, scala.immutable.Vector is probably a much better choice. It is a general-purpose, dense data structure.

It can store objects of any type, so you can have Vector[String] for example.

Since it is Traversable, you have access to all the expected methods like map, flatMap, reduce, fold, filter, etc.
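For comparison, a small sketch of the collection API on scala.immutable.Vector, using only the standard library. The object name ScalaVectorSketch and the sample strings are illustrative.

```scala
// Hypothetical wrapper object; only the Scala standard library is needed.
object ScalaVectorSketch {
  // Elements can be of any type, not just Double.
  val words: Vector[String] = Vector("spark", "scala", "breeze")

  // map: transform each element.
  val lengths: Vector[Int] = words.map(_.length)

  // filter: keep elements matching a predicate.
  val longWords: Vector[String] = words.filter(_.length > 5)

  // flatMap: each element expands into several.
  val withUpper: Vector[String] = words.flatMap(w => Vector(w, w.toUpperCase))

  // reduce and fold: combine elements into a single value.
  val totalLength: Int = lengths.reduce(_ + _)
  val foldedLength: Int = lengths.fold(0)(_ + _)

  // The vector is immutable: appending returns a new vector.
  val extended: Vector[String] = words :+ "mllib"
}
```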

Edit: If you need algebraic operations and don't use any of the data structures from org.apache.spark.mllib.linalg.distributed, you may prefer breeze.linalg.Vector over spark.mllib.linalg.Vector. It supports a larger set of algebraic methods, including the dot product, and provides a typical collection API.
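A minimal Breeze sketch, assuming the breeze library is on the classpath. The object name BreezeVectorSketch and the values are made up for the example.

```scala
import breeze.linalg.{DenseVector, norm}

// Hypothetical wrapper object for the sketch.
object BreezeVectorSketch {
  val a: DenseVector[Double] = DenseVector(3.0, 4.0)
  val b: DenseVector[Double] = DenseVector(1.0, 2.0)

  // Dot product: 3*1 + 4*2.
  val dot: Double = a dot b

  // Element-wise addition.
  val sum: DenseVector[Double] = a + b

  // Euclidean norm of a.
  val l2: Double = norm(a)

  // Collection-style operations are available too.
  val doubled: DenseVector[Double] = a.map(_ * 2.0)
}
```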



Source: https://stackoverflow.com/questions/31255756/difference-between-spark-vectors-and-scala-immutable-vector
