Spark train test split

难免孤独 2021-01-01 20:51

I am curious whether there is something similar to sklearn's StratifiedShuffleSplit (http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html) for apache-spa

4 answers
  •  陌清茗 (original poster)
    2021-01-01 21:16

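    Spark supports stratified sampling through df.stat.sampleBy, as outlined in https://s3.amazonaws.com/sparksummit-share/ml-ams-1.0.1/6-sampling/scala/6-sampling_student.html. The call takes the stratum column, a map from each stratum value to its sampling fraction, and a random seed: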

    df.stat.sampleBy("label", Map(0 -> .10, 1 -> .20, 2 -> .3), 0)
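
sampleBy on its own only returns the sampled rows, so to get a train/test pair the remaining rows still have to be recovered. Below is a minimal sketch of one way to do that, assuming integer labels in a column named "label". The helper stratifiedSplit, the "row_id" column, and the toy data are hypothetical names made up for illustration, not part of Spark. Note also that sampleBy samples each row independently, so the per-class sizes are approximate rather than exact.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id

object StratifiedSplitSketch {

  // Hypothetical helper (not a Spark API): draws a per-class sample for training
  // and treats every row that was not drawn as the test set.
  def stratifiedSplit(
      df: DataFrame,
      labelCol: String,
      trainFractions: Map[Int, Double],
      seed: Long): (DataFrame, DataFrame) = {
    // Tag each row with an id so sampled rows can be excluded afterwards.
    // Caching is a best-effort way to keep the ids stable across both uses.
    val withId = df.withColumn("row_id", monotonically_increasing_id()).cache()

    // Stratified sample without replacement: roughly fraction * classSize rows per class.
    val train = withId.stat.sampleBy(labelCol, trainFractions, seed).cache()

    // Anti-join keeps only the rows whose row_id does not appear in the training sample.
    val test = withId.join(train.select("row_id"), Seq("row_id"), "left_anti")

    (train, test)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stratified-split-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data (an assumption for the example): three classes, labels 0, 1, 2.
    val df = (0 until 1000).map(i => (i.toDouble, i % 3)).toDF("feature", "label")

    // Keep about 80% of every class for training.
    val fractions = Map(0 -> 0.8, 1 -> 0.8, 2 -> 0.8)
    val (train, test) = stratifiedSplit(df, "label", fractions, seed = 42L)

    // Inspect the class balance of each side.
    train.groupBy("label").count().show()
    test.groupBy("label").count().show()

    spark.stop()
  }
}
```

Using the same fraction for every class keeps the class proportions of the original data in both splits, which is the closest analogue to a single StratifiedShuffleSplit fold; df.randomSplit would be simpler but does not stratify by label.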
    
