spark.ml StringIndexer throws 'Unseen label' on fit()
Question
I'm preparing a toy spark.ml example. The setup is Spark version 1.6.0, running on top of Oracle JDK version 1.8.0_65, with pyspark in an ipython notebook. First, this has hardly anything to do with "Spark, ML, StringIndexer: handling unseen labels": the exception is thrown while fitting a pipeline to a dataset, not while transforming it. Suppressing the exception might not be a solution here either since, I'm afraid, the dataset gets messed up pretty badly in this case. My dataset is about 800 MB uncompressed, so it might be hard