How to skip lines while reading a CSV file as a DataFrame using PySpark?

Asked 2020-12-11 16:35 by 执念已碎 · 5 answers · 917 views

I have a CSV file that is structured this way:

Header
Blank Row
\"Col1\",\"Col2\"
\"1,200\",\"1,456\"
\"2,000\",\"3,450\"

I have two problems: the first two lines (the header text and the blank row) need to be skipped, and the values themselves contain commas inside the quotes, so they must not be split into separate columns.

5 Answers
  •  悲哀的现实
     2020-12-11 16:58

    Why don't you just try the DataFrameReader API from pyspark.sql? It is pretty easy. For this problem, I guess this single line would be good enough.

    df = spark.read.csv("myFile.csv") # By default, quote char is " and separator is ','
    

    With this API you can also play with a few other parameters, such as the header line and ignoring leading and trailing whitespace. Here is the link: DataFrameReader API
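
    If the two extra lines at the top of the file still get in the way, one workaround (just a sketch, assuming the file is named myFile.csv and that exactly the first two lines should be dropped) is to read the raw lines as an RDD, filter out the first two, and hand the result to the CSV reader, which also accepts an RDD of strings:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("skip-csv-lines").getOrCreate()

    # Read the raw text lines, pair each line with its index, and drop the
    # first two lines (the title line and the blank row).
    lines = spark.sparkContext.textFile("myFile.csv")
    cleaned = (lines.zipWithIndex()
                    .filter(lambda pair: pair[1] > 1)
                    .map(lambda pair: pair[0]))

    # The CSV reader also accepts an RDD of strings; quoted values such as
    # "1,200" stay intact because the default quote char is ".
    df = spark.read.csv(cleaned, header=True)
    df.show()

    Because zipWithIndex preserves the original line order, only the leading lines are removed, and the remaining rows are parsed with the normal quote and separator handling.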
