I have a CSV in which one field is a datetime in a specific format. I cannot import it directly into my DataFrame because it needs to be a timestamp. So I import it as a string and
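I would use spark-csv (https://github.com/databricks/spark-csv). With schema inference enabled, it will infer timestamp columns for you, as long as the values are in a datetime format it recognizes.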
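```scala
import com.databricks.spark.csv._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// `sc` (SparkContext) and `sqlContext` (SQLContext) are the ones provided by spark-shell
val rdd: RDD[String] = sc.textFile("csvfile.csv")
val df: DataFrame = new CsvParser()
  .withDelimiter('|')               // fields are separated by '|'
  .withInferSchema(true)            // infer column types, including timestamps
  .withParseMode("DROPMALFORMED")   // drop lines that cannot be parsed
  .csvRdd(sqlContext, rdd)
```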
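If your column uses a custom format that schema inference does not pick up, another option is to keep it as a string (as in the question) and convert it yourself. Below is a minimal plain-Scala sketch, assuming Java 8+ `java.time`. The object `TimestampParsing`, the function `parseTimestamp` and the pattern `dd/MM/yyyy HH:mm` are hypothetical placeholders, so substitute the format your CSV actually uses.

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object TimestampParsing {
  // Parse a datetime string in a caller-supplied pattern into java.sql.Timestamp,
  // the JVM type Spark SQL uses for TimestampType values.
  // Returns None instead of throwing when the string does not match the pattern.
  def parseTimestamp(value: String, pattern: String): Option[Timestamp] = {
    val formatter = DateTimeFormatter.ofPattern(pattern)
    Try(LocalDateTime.parse(value, formatter)).toOption
      .map(ldt => Timestamp.valueOf(ldt))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical pattern; replace it with the one used in your file.
    println(parseTimestamp("31/12/2015 23:59", "dd/MM/yyyy HH:mm"))
    println(parseTimestamp("not a date", "dd/MM/yyyy HH:mm"))
  }
}
```

To apply this to a DataFrame column you could wrap it in a UDF. On Spark 1.5+ the built-in `unix_timestamp(column, pattern).cast("timestamp")` from `org.apache.spark.sql.functions` should give a similar result without a UDF, at one-second precision.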