Difference between org.apache.spark.ml.classification and org.apache.spark.mllib.classification

失恋的感觉 2020-12-10 12:15

I'm writing a Spark application and would like to use the algorithms in MLlib. In the API docs I found two different classes for the same algorithm.

For example, there

2 answers
  •  陌清茗 (original poster)
    2020-12-10 12:41

    The Spark MLlib guide says:

    spark.mllib contains the original API built on top of RDDs.

    spark.ml provides higher-level API built on top of DataFrames for constructing ML pipelines.

    and

    Using spark.ml is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting spark.mllib along with the development of spark.ml. Users should be comfortable using spark.mllib features and expect more features coming. Developers should contribute new algorithms to spark.ml if they fit the ML pipeline concept well, e.g., feature extractors and transformers.

    I think the doc explains it very well.
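
To make the distinction concrete, here is a minimal sketch that fits logistic regression once through each package. It assumes Spark 2.x or later (where `org.apache.spark.ml.linalg` exists) and a local master. The object name `MlVsMllib`, the method names, and the toy data are made up for illustration and are not part of either API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.ml.classification.{LogisticRegression => MlLogisticRegression}
import org.apache.spark.ml.linalg.{Vectors => MlVectors}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.{Vectors => MllibVectors}
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical helper object; only the Spark classes are real API.
object MlVsMllib {

  // spark.ml: the estimator is fit on a DataFrame with "label" and
  // "features" columns, and the fitted model transforms a DataFrame.
  def mlPredictions(spark: SparkSession): DataFrame = {
    val df = spark.createDataFrame(Seq(
      (1.0, MlVectors.dense(0.0, 1.1, 0.1)),
      (0.0, MlVectors.dense(2.0, 1.0, -1.0)),
      (0.0, MlVectors.dense(2.0, 1.3, 1.0)),
      (1.0, MlVectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    val model = new MlLogisticRegression().setMaxIter(10).fit(df)
    model.transform(df) // appends prediction columns to the input
  }

  // spark.mllib: the algorithm is run on an RDD[LabeledPoint], and the
  // fitted model predicts on individual vectors (or on an RDD of them).
  def mllibPrediction(spark: SparkSession): Double = {
    val rdd = spark.sparkContext.parallelize(Seq(
      LabeledPoint(1.0, MllibVectors.dense(0.0, 1.1, 0.1)),
      LabeledPoint(0.0, MllibVectors.dense(2.0, 1.0, -1.0)),
      LabeledPoint(0.0, MllibVectors.dense(2.0, 1.3, 1.0)),
      LabeledPoint(1.0, MllibVectors.dense(0.0, 1.2, -0.5))
    ))

    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(rdd)
    model.predict(MllibVectors.dense(0.0, 1.1, 0.1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ml-vs-mllib")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    mlPredictions(spark).show()
    println(mllibPrediction(spark))
    spark.stop()
  }
}
```

Note that the two packages use different vector types (`org.apache.spark.ml.linalg` versus `org.apache.spark.mllib.linalg`), so data prepared for one API generally needs converting before it can be passed to the other.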
