How to select an exact number of random rows from a DataFrame
Question:
How can I select an exact number of random rows from a DataFrame efficiently? The data contains an index column that can be used. If I have to use the maximum size, which is more efficient: count() or max() on the index column?

Answer 1:
A possible approach is to calculate the number of rows using .count(), then use sample() from Python's random library to generate a random sequence of arbitrary length from this range. Lastly, use the resulting list of numbers, vals, to subset your index column.

import
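The answer's own code is cut off above, so what follows is only a minimal sketch of the approach it describes, not the original author's implementation. It assumes the index column holds contiguous integers from 0 to n-1, so that the row count defines the valid range. The helper name pick_index_values, the seed parameter, and the column name "idx" are hypothetical. The DataFrame step appears only as a comment and assumes a PySpark DataFrame, which the question does not state explicitly.

```python
import random


def pick_index_values(n_rows, k, seed=None):
    """Return k distinct index values drawn uniformly from range(n_rows).

    Assumes the index column contains every integer in 0..n_rows-1.
    """
    if k > n_rows:
        raise ValueError("cannot sample more rows than the DataFrame contains")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return rng.sample(range(n_rows), k)  # sampling without replacement


# Hypothetical usage, assuming a PySpark DataFrame `df` with index column "idx":
#   vals = pick_index_values(df.count(), 100)
#   subset = df.filter(df["idx"].isin(vals))

vals = pick_index_values(1000, 5, seed=42)
print(len(vals), len(set(vals)))  # both are 5: exactly k distinct values
```

Because random.sample draws without replacement, the result contains exactly k distinct index values, which is what yields an exact row count after filtering. If the index column has gaps, the row count no longer matches the range of valid index values, so some sampled values may match no row.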