I am reading a text (not CSV) file that has a header, content, and a footer, using
spark.read.format("text").option("delimiter","|")...load(file)
Sample data:
col1|col2|col3
100|hello|asdf
300|hi|abc
200|bye|xyz
800|ciao|qwerty
This is the footer line
Processing logic:
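from pyspark.sql import SparkSession
# a SparkSession must exist before calling toDF() on an RDD
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
# load the text file as an RDD of lines
txt = sc.textFile("path_to_above_sample_data_text_file.txt")
# remove the header (this drops every line identical to the first line)
header = txt.first()
txt = txt.filter(lambda line: line != header)
# split on "|" and remove the footer, which has no "|" and so yields a single field
txt = txt.map(lambda line: line.split("|")) \
         .filter(lambda fields: len(fields) > 1)
# convert to a DataFrame, using the header fields as column names
df = txt.toDF(header.split("|"))
df.show()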
Output is:
+----+-----+------+
|col1| col2|  col3|
+----+-----+------+
| 100|hello|  asdf|
| 300|   hi|   abc|
| 200|  bye|   xyz|
| 800| ciao|qwerty|
+----+-----+------+
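To see why the footer filter works, the same header and footer logic can be traced in plain Python, without Spark. This is only a rough sketch for checking the idea on the sample data: the names parse_delimited and sample are made up for illustration and are not part of any Spark API.

```python
# Plain-Python trace of the header/footer logic (no Spark needed).
# `parse_delimited` and `sample` are hypothetical names used only for illustration.

def parse_delimited(lines, sep="|"):
    """Return (column_names, rows) after dropping the header and the footer."""
    header = lines[0]                                  # plays the role of txt.first()
    body = [line for line in lines if line != header]  # drop the header line
    fields = [line.split(sep) for line in body]        # split every remaining line
    # The footer contains no separator, so it splits into a single field.
    rows = [f for f in fields if len(f) > 1]
    return header.split(sep), rows

sample = [
    "col1|col2|col3",
    "100|hello|asdf",
    "300|hi|abc",
    "200|bye|xyz",
    "800|ciao|qwerty",
    "This is the footer line",
]

columns, rows = parse_delimited(sample)
print(columns)
print(rows)
```

Note that a footer which happens to contain a "|" would slip through the len > 1 test. Comparing each row's field count against the number of header columns is a somewhat stricter check, if that is a concern for your files.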
Hope this helps!