I have a CSV file that is structured this way:
Header
Blank Row
\"Col1\",\"Col2\"
\"1,200\",\"1,456\"
\"2,000\",\"3,450\"
I have two proble
Try to use csv.reader with 'quotechar' parameter.It will split the line correctly. After that you can add filters as you like.
import csv
from pyspark.sql.types import StringType
df = sc.textFile("test2.csv")\
.mapPartitions(lambda line: csv.reader(line,delimiter=',', quotechar='"')).filter(lambda line: len(line)>=2 and line[0]!= 'Col1')\
.toDF(['Col1','Col2'])