I'm trying to use sklearn with pyspark, but I'm having some performance issues. Let's say I have a dataset that has already gone through a pipeline where features were vectoriz