I am using the randomSplit function to get a small portion of a DataFrame for development purposes, and I end up just taking the first DataFrame that is returned by this fu
randomSplit
limit is very simple. For example, to take the first 50 rows:
val df_subset = data.limit(50)
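
For comparison, here is a rough sketch of limit next to sample and randomSplit, written as top-level statements for spark-shell or a Scala script. It assumes a local SparkSession, and the `data` DataFrame here is a hypothetical stand-in built with spark.range; the 0.05 fraction and the seed 42 are illustrative values only. Note that limit returns exactly 50 rows but not a random selection, while sample and randomSplit return a random subset whose size is only approximately the requested fraction.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for development; in spark-shell this reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("subset-example")
  .getOrCreate()

// Hypothetical stand-in for the real DataFrame: 1000 rows with a single "id" column.
val data = spark.range(0, 1000).toDF("id")

// Exactly 50 rows, chosen by whatever order Spark happens to read them in.
val firstRows = data.limit(50)

// Roughly 5% of the rows, chosen at random; the exact count varies.
val sampled = data.sample(withReplacement = false, fraction = 0.05, seed = 42L)

// randomSplit returns one DataFrame per weight; keep the small one and ignore the rest.
val Array(small, rest) = data.randomSplit(Array(0.05, 0.95), seed = 42L)

println(s"limit: ${firstRows.count()} rows")
println(s"sample: ${sampled.count()} rows")
println(s"randomSplit: ${small.count()} rows (remaining ${rest.count()})")
```

If all you need is a fixed number of rows to develop against, limit is the shortest option; sample avoids building the unused second DataFrame that randomSplit produces.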