Difference between org.apache.spark.ml.classification and org.apache.spark.mllib.classification

失恋的感觉 2020-12-10 12:15

I'm writing a Spark application and would like to use algorithms in MLlib. In the API doc I found two different classes for the same algorithm.

For example, there is a LogisticRegression in org.apache.spark.ml.classification and another one in org.apache.spark.mllib.classification. What is the difference between them, and which one should I use?

2 Answers
  • 2020-12-10 12:41

    The Spark MLlib guide says:

    spark.mllib contains the original API built on top of RDDs.

    spark.ml provides higher-level API built on top of DataFrames for constructing ML pipelines.

    and

    Using spark.ml is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting spark.mllib along with the development of spark.ml. Users should be comfortable using spark.mllib features and expect more features coming. Developers should contribute new algorithms to spark.ml if they fit the ML pipeline concept well, e.g., feature extractors and transformers.

    I think the doc explains it very well.
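    To make the contrast concrete, here is a minimal Scala sketch (assuming Spark 2.x, where both APIs coexist; the tiny inline dataset, column names, and object name are made up for illustration). The spark.ml API takes a DataFrame and composes stages into a Pipeline, while spark.mllib takes an RDD of LabeledPoint and returns a plain model object:

        import org.apache.spark.ml.Pipeline
        import org.apache.spark.ml.classification.LogisticRegression
        import org.apache.spark.ml.feature.VectorAssembler
        import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
        import org.apache.spark.mllib.linalg.Vectors
        import org.apache.spark.mllib.regression.LabeledPoint
        import org.apache.spark.sql.SparkSession

        object MlVsMllibExample {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder().appName("ml-vs-mllib").master("local[*]").getOrCreate()
            import spark.implicits._

            // spark.ml: DataFrame in, DataFrame out; stages compose into a Pipeline
            val df = Seq((1.0, 0.0, 1.1), (0.0, 2.0, 1.0), (1.0, 0.5, -1.0), (0.0, 2.1, 1.2))
              .toDF("label", "f1", "f2")
            val assembler = new VectorAssembler().setInputCols(Array("f1", "f2")).setOutputCol("features")
            val lr = new LogisticRegression().setMaxIter(10)
            val pipelineModel = new Pipeline().setStages(Array(assembler, lr)).fit(df)
            pipelineModel.transform(df).select("label", "prediction").show()

            // spark.mllib: RDD[LabeledPoint] in, a standalone model object out
            val rdd = spark.sparkContext.parallelize(Seq(
              LabeledPoint(1.0, Vectors.dense(0.0, 1.1)),
              LabeledPoint(0.0, Vectors.dense(2.0, 1.0)),
              LabeledPoint(1.0, Vectors.dense(0.5, -1.0)),
              LabeledPoint(0.0, Vectors.dense(2.1, 1.2))))
            val oldModel = new LogisticRegressionWithLBFGS().setNumClasses(2).run(rdd)
            println(oldModel.predict(Vectors.dense(0.0, 1.1)))

            spark.stop()
          }
        }

    The pipeline version is what the docs recommend going forward: feature transformers (like VectorAssembler here) and estimators plug into the same DataFrame-based workflow, which is what the RDD-based API cannot express.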

  • 2020-12-10 12:51

    There is a JIRA ticket for this change.

    And from the design doc:

    MLlib now covers a basic selection of machine learning algorithms, e.g., logistic regression, decision trees, alternating least squares, and k-means. The current set of APIs contains several design flaws that prevent us moving forward to address practical machine learning pipelines and make MLlib itself a scalable project.

    The new set of APIs will live under org.apache.spark.ml, and o.a.s.mllib will be deprecated once we migrate all features to o.a.s.ml.
