Get CSV to Spark dataframe

忘了有多久 2020-12-05 14:45

I'm using Python on Spark and would like to load a CSV file into a DataFrame.

Strangely, the Spark SQL documentation does not explain how to use CSV as a source.

9 answers
  •  生来不讨喜
    2020-12-05 14:50

    I ran into a similar problem. The solution is to add an environment variable named PYSPARK_SUBMIT_ARGS and set its value to "--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell". This works with Spark's interactive Python shell.

    Make sure the spark-csv artifact matches the version of Scala installed: use spark-csv_2.11 with Scala 2.11, and spark-csv_2.10 with Scala 2.10 or 2.10.5.
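As a rough sketch of the steps above, the snippet below builds the PYSPARK_SUBMIT_ARGS value from a Scala version and sets it before Spark starts. The helper names `spark_csv_submit_args` and `load_csv` and the file name `cars.csv` are made up for illustration. `load_csv` assumes a Spark 1.x `sqlContext` (as provided by the pyspark shell) with the spark-csv package on the classpath; it is defined here but not called.

```python
import os


def spark_csv_submit_args(scala_version, package_version="1.4.0"):
    """Build a PYSPARK_SUBMIT_ARGS value that pulls in spark-csv.

    Hypothetical helper: only the major.minor part of the Scala version
    appears in the artifact name, so "2.10.5" maps to spark-csv_2.10.
    """
    scala_binary = ".".join(scala_version.split(".")[:2])
    return "--packages com.databricks:spark-csv_{}:{} pyspark-shell".format(
        scala_binary, package_version
    )


# Must be set before the Spark context / pyspark shell is started.
os.environ["PYSPARK_SUBMIT_ARGS"] = spark_csv_submit_args("2.10.5")


def load_csv(sqlContext, path):
    """Read a CSV file into a DataFrame through the spark-csv data source.

    Assumes sqlContext is a Spark 1.x SQLContext and that the file has a
    header row; adjust the options if that does not hold for your data.
    """
    return (
        sqlContext.read.format("com.databricks.spark.csv")
        .options(header="true", inferSchema="true")
        .load(path)
    )


# Inside the pyspark shell, usage might look like:
#   df = load_csv(sqlContext, "cars.csv")  # "cars.csv" is a placeholder path
```

Deriving the artifact suffix from the Scala version keeps the two in sync, which is the mismatch the answer warns about.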

    Hope it works.
