The data in my first RDD looks like this:
1253
545553
12344896
1 2 1
1 43 2
1 46 1
1 53 2
Now the first 3 integers are some counters that I need to bro
In my case, I have a CSV file like the one below:
----- HEADER START -----
We love to generate headers
#who needs comment char?
----- HEADER END -----
colName1,colName2,...,colNameN
val__1.1,val__1.2,...,val__1.N
It took me a day to figure this out:
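import spark.implicits._ // provides the Encoder[String] that createDataset needs
val rdd = spark.read.textFile(pathToFile).rdd
  .zipWithIndex() // pair each line with its 0-based index: (line, index)
  .filter { case (_, index) => index >= numberOfLinesToSkip } // drop the first numberOfLinesToSkip lines
  .map { case (line, _) => line } // get rid of the index
val ds = spark.createDataset(rdd) // convert the RDD to a Dataset[String]
val df = spark.read.option("inferSchema", "true").option("header", "true").csv(ds) // parse the CSV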
Sorry, the code is in Scala, but it can easily be converted to Python.
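One detail that is easy to get wrong is the index comparison. zipWithIndex numbers elements from 0, so keeping indices greater than or equal to n drops exactly the first n lines and leaves the column-name line in place for the header option. Here is a minimal sketch of that logic using plain Scala collections instead of an RDD, so it runs without a cluster. The object name `SkipLeadingLines` and the helper `skipLeading` are made up for this illustration, and the sample rows are a trimmed version of the file above.

```scala
object SkipLeadingLines {
  // Hypothetical helper: drop the first numberOfLinesToSkip lines.
  // zipWithIndex is 0-based, so ">=" keeps the line at position numberOfLinesToSkip.
  def skipLeading(lines: Seq[String], numberOfLinesToSkip: Int): Seq[String] =
    lines.zipWithIndex
      .filter { case (_, index) => index >= numberOfLinesToSkip }
      .map { case (line, _) => line }

  def main(args: Array[String]): Unit = {
    // Trimmed sample: four junk header lines, then column names, then data.
    val lines = Seq(
      "----- HEADER START -----",
      "We love to generate headers",
      "#who needs comment char?",
      "----- HEADER END -----",
      "colName1,colName2",
      "val__1.1,val__1.2"
    )
    // Prints the column-name line followed by the data line.
    skipLeading(lines, 4).foreach(println)
  }
}
```

With a strict `>` in the filter, the same call would also discard the column-name line, and the CSV reader would then treat the first data row as the header.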