How to remove the header and footer from a DataFrame?

小蘑菇 2021-01-24 07:23

I am reading a text (not CSV) file that has a header, content, and a footer, using:

spark.read.format("text").option("delimiter","|")...load(file)

4 answers

北荒 (original poster)

2021-01-24 08:24

Sample data:

col1|col2|col3
100|hello|asdf
300|hi|abc
200|bye|xyz
800|ciao|qwerty
This is the footer line

Processing logic:

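from pyspark.sql import SparkSession

# reuse the active session (e.g. in the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# load the text file as an RDD of lines
txt = sc.textFile("path_to_above_sample_data_text_file.txt")

# remove the header: take the first line and drop every line equal to it
header = txt.first()
txt = txt.filter(lambda line: line != header)

# remove the footer: split on "|" and keep only rows that contain the delimiter
txt = txt.map(lambda line: line.split("|")) \
    .filter(lambda fields: len(fields) > 1)

# convert to a DataFrame, using the header fields as column names
df = txt.toDF(header.split("|"))
df.show()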

Output is:

+----+-----+------+
|col1| col2|  col3|
+----+-----+------+
| 100|hello|  asdf|
| 300|   hi|   abc|
| 200|  bye|   xyz|
| 800| ciao|qwerty|
+----+-----+------+
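
To check the filtering logic without a Spark session, the same steps can be traced in plain Python on the sample lines. This is only a sketch: `strip_header_footer` is a hypothetical helper name, not part of Spark, and it assumes (as the code above does) that the footer line does not contain the `|` delimiter.

```python
# Sample lines copied from the data shown above
lines = [
    "col1|col2|col3",
    "100|hello|asdf",
    "300|hi|abc",
    "200|bye|xyz",
    "800|ciao|qwerty",
    "This is the footer line",
]


def strip_header_footer(lines, sep="|"):
    """Hypothetical helper mirroring the RDD steps: returns (column_names, rows)."""
    header = lines[0]
    # Same as txt.filter(lambda line: line != header)
    body = [line for line in lines if line != header]
    # Same as map(split) followed by filter(len > 1):
    # a line without the delimiter splits into a single field and is dropped
    rows = [fields for fields in (line.split(sep) for line in body) if len(fields) > 1]
    return header.split(sep), rows


columns, rows = strip_header_footer(lines)
print(columns)
for row in rows:
    print(row)
```

Two limits of this approach are worth keeping in mind: a footer that happens to contain `|` would survive the length check, and any data line identical to the header would be dropped along with it.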

Hope this helps!
