Counting words after grouping records
Question

Note: Although the provided answer works, it can get rather slow on larger data sets. Take a look at this for a faster solution.

I have a data frame that consists of labelled documents, such as this one:

    df_ = spark.createDataFrame([
        ('1', 'hello how are are you today'),
        ('1', 'hello how are you'),
        ('2', 'hello are you here'),
        ('2', 'how is it'),
        ('3', 'hello how are you'),
        ('3', 'hello how are you'),
        ('4', 'hello how is it you today')
    ], schema=['label', 'text'])

What I want is to