Use an external library in a PySpark job in a Spark cluster from google-dataproc

我在风中等你 2020-12-09 06:36

I have a Spark cluster I created via Google Dataproc. I want to be able to use the CSV library from Databricks (see https://github.com/databricks/spark-csv). So I f

2 Answers
  •  臣服心动
    2020-12-09 07:13

    In addition to @Dennis's answer:

    Note that if you need to load multiple external packages, you need to specify a custom escape character like so:

    --properties ^#^spark.jars.packages=org.elasticsearch:elasticsearch-spark_2.10:2.3.2,com.databricks:spark-avro_2.10:2.0.1
    

    Note the ^#^ right before the package list. See gcloud topic escaping for more details.
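    For example, a complete job submission might look like the following. This is a minimal sketch: the job file my_job.py, the cluster name my-cluster, and the region are placeholders you would replace with your own values.

    # Submit a PySpark job, pulling in two external packages via spark.jars.packages
    gcloud dataproc jobs submit pyspark my_job.py \
        --cluster=my-cluster \
        --region=us-central1 \
        --properties ^#^spark.jars.packages=org.elasticsearch:elasticsearch-spark_2.10:2.3.2,com.databricks:spark-avro_2.10:2.0.1

    The ^#^ prefix tells gcloud to split the --properties list on # instead of the default comma, so the comma-separated package list inside the spark.jars.packages value is passed through to Spark intact.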
