My goal is to build a multicalss classifier.
I have built a pipeline for feature extraction and it includes as a first step a StringIndexer transformer to map each c
In my case, I was running spark ALS on a large data set and the data was not available at all partitions so I had to cache() the data appropriately and it worked like a charm