I am curious if there is something similar to sklearn's http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html for apache-spa
Perhaps this method wasn't available when the OP posted this question, but I'm leaving this here for future reference:
# splitting dataset into train and test set
(train, test) = df.randomSplit([0.7, 0.3], seed=42)
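One caveat: `randomSplit` samples rows uniformly at random, so it does not preserve class proportions the way `StratifiedShuffleSplit` does. If you need stratification, a rough sketch is to sample each class at the same rate with `DataFrame.sampleBy` and treat the remaining rows as the test set. The function name `stratified_split` and the column name `"label"` are my own placeholders, not part of Spark. `sampleBy` samples each row independently, so the resulting fractions are approximate rather than exact, and `exceptAll` needs Spark 2.4 or later.

```python
def stratified_split(df, label_col, train_fraction=0.7, seed=42):
    """Approximate stratified train/test split of a Spark DataFrame.

    Hypothetical helper: samples every class in `label_col` at the same
    rate, so class proportions are roughly preserved in both splits.
    """
    # Collect the distinct class labels (assumes there are few enough
    # classes to bring back to the driver).
    labels = [row[label_col] for row in df.select(label_col).distinct().collect()]

    # Use the same sampling fraction for every class (stratum).
    fractions = {label: train_fraction for label in labels}

    # Per-stratum sampling without replacement; sizes are approximate.
    train = df.sampleBy(label_col, fractions=fractions, seed=seed)

    # Everything not sampled into train becomes test. exceptAll keeps
    # duplicate rows, unlike subtract, which de-duplicates.
    test = df.exceptAll(train)
    return train, test
```

Usage would look like `train, test = stratified_split(df, "label", 0.7, seed=42)`. It is probably worth calling `df.cache()` first, since the DataFrame is evaluated more than once and both splits should see the same rows.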