How to export data from Spark SQL to CSV


This command works with HiveQL:

insert overwrite directory '/data/home.csv' select * from testtable;

But with Spark SQL I'm getting an error.

7 Answers
  • 2020-12-04 16:01

    You can use the statement below to write the contents of a DataFrame in CSV format: df.write.csv("/data/home/csv")

    If you need to write the whole DataFrame into a single CSV file, then use df.coalesce(1).write.csv("/data/home/sample.csv")

    For Spark 1.x, you can use spark-csv to write the results to CSV files (see the dependency note after the snippets below).

    The Scala snippet below would help:

    import org.apache.spark.sql.hive.HiveContext
    // sc - existing spark context
    val sqlContext = new HiveContext(sc)
    val df = sqlContext.sql("SELECT * FROM testtable")
    df.write.format("com.databricks.spark.csv").save("/data/home/csv")
    

    To write the contents into a single file

    import org.apache.spark.sql.hive.HiveContext
    // sc - existing spark context
    val sqlContext = new HiveContext(sc)
    val df = sqlContext.sql("SELECT * FROM testtable")
    df.coalesce(1).write.format("com.databricks.spark.csv").save("/data/home/sample.csv")
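
    For Spark 1.x the spark-csv connector is an external package, so it has to be on the classpath before the snippets above will run. A minimal sketch of pulling it in (the 1.5.0 artifact version is only an example):

    // sbt build definition
    libraryDependencies += "com.databricks" %% "spark-csv" % "1.5.0"

    // or hand the package to spark-shell / spark-submit on the command line:
    //   --packages com.databricks:spark-csv_2.11:1.5.0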
    
  • 2020-12-04 16:02

    Since Spark 2.x, spark-csv is integrated as a native data source. Therefore, the necessary statement simplifies to (Windows):

    df.write
      .option("header", "true")
      .csv("file:///C:/out.csv")
    

    or on UNIX:

    df.write
      .option("header", "true")
      .csv("/var/out.csv")
    

    Note: as the comments say, this creates a directory with that name containing the partition files, not a single standard CSV file. This, however, is most likely what you want, since otherwise you are either going to crash your driver (out of RAM) or you are working in a non-distributed environment anyway.
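
    If you later need the data back in Spark, the whole output directory can be read as a single dataset; a minimal sketch, assuming the UNIX path from above:

    val readBack = spark.read
      .option("header", "true")
      .csv("/var/out.csv")   // picks up every part file inside the directory
    readBack.show(5)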

  • 2020-12-04 16:03

    The error message suggests this is not a supported feature in the query language. But you can save a DataFrame in any format as usual through the RDD interface (df.rdd.saveAsTextFile). Or you can check out https://github.com/databricks/spark-csv.
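
    A minimal sketch of the RDD route, assuming a DataFrame named df with simple, comma-free column values (no quoting or escaping is done):

    df.rdd
      .map(row => row.mkString(","))        // turn each Row into one CSV line
      .saveAsTextFile("/data/home/csv_rdd") // hypothetical output directory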

  • 2020-12-04 16:05

    To read a CSV into a DataFrame:

    val p=spark.read.format("csv").options(Map("header"->"true","delimiter"->"^")).load("filename.csv")
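
    Since the question is about exporting, the write-side counterpart would be a sketch along these lines (the output path is hypothetical; "sep" sets the delimiter for the CSV writer in Spark 2.x):

    p.write
      .option("header", "true")
      .option("sep", "^")
      .csv("filename_out.csv")   // hypothetical output directory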
    
  • 2020-12-04 16:09

    The answer above with spark-csv is correct, but there is an issue: the library creates several files based on the DataFrame's partitioning, and this is not what we usually need. So you can combine all partitions into one:

    df.coalesce(1).
        write.
        format("com.databricks.spark.csv").
        option("header", "true").
        save("myfile.csv")
    

    and rename the library's output (named "part-00000") to the desired filename.

    This blog post provides more details: https://fullstackml.com/2015/12/21/how-to-export-data-frame-from-apache-spark/
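
    If the single part file really must end up under a fixed name, one sketch (assuming an HDFS-compatible filesystem, the paths from the snippet above, and a target name chosen purely as an example) renames it with the Hadoop FileSystem API:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // after the coalesce(1) write above, "myfile.csv" is a directory holding one part file
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val partFile = fs.globStatus(new Path("myfile.csv/part-*"))(0).getPath
    fs.rename(partFile, new Path("myfile_renamed.csv"))   // hypothetical target name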

  • 2020-12-04 16:09

    With the help of spark-csv, we can write to a CSV file.

    val dfsql = sqlContext.sql("select * from tablename")
    dfsql.write.format("com.databricks.spark.csv").option("header","true").save("output.csv")
    