Spark train test split

难免孤独 2021-01-01 20:51

I am curious if there is something similar to sklearn's http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html for apache-spa

4 Answers
  •  独厮守ぢ
    2021-01-01 21:20

    Perhaps this method wasn't available when the OP posted this question, but I'm leaving this here for future reference:

    # splitting dataset into train and test set
    (train, test) = df.randomSplit([0.7, 0.3], seed=42)
    
