Difference between org.apache.spark.ml.classification and org.apache.spark.mllib.classification

失恋的感觉 2020-12-10 12:15

I'm writing a Spark application and would like to use algorithms in MLlib. In the API doc I found two different classes for the same algorithm.

For example, there is a LogisticRegression in org.apache.spark.ml.classification and another one in org.apache.spark.mllib.classification. What is the difference between them, and which one should I use?

2 Answers
  • 2020-12-10 12:41

    The Spark MLlib guide says:

    spark.mllib contains the original API built on top of RDDs.

    spark.ml provides higher-level API built on top of DataFrames for constructing ML pipelines.

    and

    Using spark.ml is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting spark.mllib along with the development of spark.ml. Users should be comfortable using spark.mllib features and expect more features coming. Developers should contribute new algorithms to spark.ml if they fit the ML pipeline concept well, e.g., feature extractors and transformers.

    I think the doc explains it very well.
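    To make the contrast concrete, here is a minimal Scala sketch (assuming Spark 2.x, where both APIs coexist; the tiny inline dataset, column names, and object name are made up for illustration). The spark.ml API takes a DataFrame and composes stages into a Pipeline, while spark.mllib takes an RDD of LabeledPoint and returns a plain model object:

        import org.apache.spark.ml.Pipeline
        import org.apache.spark.ml.classification.LogisticRegression
        import org.apache.spark.ml.feature.VectorAssembler
        import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
        import org.apache.spark.mllib.linalg.Vectors
        import org.apache.spark.mllib.regression.LabeledPoint
        import org.apache.spark.sql.SparkSession

        object MlVsMllibExample {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder().appName("ml-vs-mllib").master("local[*]").getOrCreate()
            import spark.implicits._

            // spark.ml: DataFrame in, DataFrame out; stages compose into a Pipeline
            val df = Seq((1.0, 0.0, 1.1), (0.0, 2.0, 1.0), (1.0, 0.5, -1.0), (0.0, 2.1, 1.2))
              .toDF("label", "f1", "f2")
            val assembler = new VectorAssembler().setInputCols(Array("f1", "f2")).setOutputCol("features")
            val lr = new LogisticRegression().setMaxIter(10)
            val pipelineModel = new Pipeline().setStages(Array(assembler, lr)).fit(df)
            pipelineModel.transform(df).select("label", "prediction").show()

            // spark.mllib: RDD[LabeledPoint] in, a standalone model object out
            val rdd = spark.sparkContext.parallelize(Seq(
              LabeledPoint(1.0, Vectors.dense(0.0, 1.1)),
              LabeledPoint(0.0, Vectors.dense(2.0, 1.0)),
              LabeledPoint(1.0, Vectors.dense(0.5, -1.0)),
              LabeledPoint(0.0, Vectors.dense(2.1, 1.2))))
            val oldModel = new LogisticRegressionWithLBFGS().setNumClasses(2).run(rdd)
            println(oldModel.predict(Vectors.dense(0.0, 1.1)))

            spark.stop()
          }
        }

    The pipeline version is what the docs recommend going forward: feature transformers (like VectorAssembler here) and estimators plug into the same DataFrame-based workflow, which is what the RDD-based API cannot express.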

  • 2020-12-10 12:51

    There is a JIRA ticket for this change.

    And from the design doc:

    MLlib now covers a basic selection of machine learning algorithms, e.g., logistic regression, decision trees, alternating least squares, and k-means. The current set of APIs contains several design flaws that prevent us moving forward to address practical machine learning pipelines and make MLlib itself a scalable project.

    The new set of APIs will live under org.apache.spark.ml, and o.a.s.mllib will be deprecated once we migrate all features to o.a.s.ml.
