Spark ML Pipeline Causes java.lang.Exception: failed to compile … Code … grows beyond 64 KB
Using Spark 2.0, I am trying to run a simple VectorAssembler in a PySpark ML pipeline, like so:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler

    # Combine the two count columns into a single "features" vector column
    feature_assembler = VectorAssembler(inputCols=['category_count', 'name_count'],
                                        outputCol="features")
    pipeline = Pipeline(stages=[feature_assembler])
    model = pipeline.fit(df_train)
    model_output = model.transform(df_train)

When I try to look at the output using

    model_output.select("features").show(1)

I get the error

    Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-95-7a3e3d4f281c> in <module>()
          2
          3
    ----> 4 model_output.select("features").show(1)

    /usr/local/spark20/python/pyspark/sql