Better way to convert a string field into timestamp in Spark

独厮守ぢ 2020-11-27 16:29

I have a CSV file in which one field is a datetime in a specific format. I cannot import it directly into my DataFrame because it needs to be a timestamp, so I import it as a string and

7 answers
  •  清歌不尽
    2020-11-27 17:02

    I would use https://github.com/databricks/spark-csv

    With schema inference enabled, it will infer timestamp columns for you.

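    import com.databricks.spark.csv._
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame

    // Read the raw lines; sc and sqlContext are the existing SparkContext and SQLContext
    val rdd: RDD[String] = sc.textFile("csvfile.csv")

    // Parse pipe-delimited lines, inferring column types and dropping malformed rows
    val df: DataFrame = new CsvParser()
      .withDelimiter('|')
      .withInferSchema(true)
      .withParseMode("DROPMALFORMED")
      .csvRdd(sqlContext, rdd)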
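
    If inference does not recognise the format, another option is to keep the column as a string and convert it explicitly. The following is only a minimal sketch, assuming Spark 2.2 or later (where `to_timestamp` with a format argument is available). The pattern `yyyy-MM-dd HH:mm:ss`, the column names `created_raw` and `created_at`, and the helper `withParsedTimestamp` are all hypothetical, since the question does not state the actual format.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_timestamp}

object StringToTimestamp {
  // Hypothetical pattern: replace it with the format actually used in the CSV.
  val AssumedPattern = "yyyy-MM-dd HH:mm:ss"

  // Returns df with an extra TimestampType column parsed from a string column.
  // Depending on the Spark version and settings, values that do not match
  // the pattern become null or raise an error.
  def withParsedTimestamp(
      df: DataFrame,
      source: String,
      target: String,
      pattern: String = AssumedPattern): DataFrame =
    df.withColumn(target, to_timestamp(col(source), pattern))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("string-to-timestamp")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the string column loaded from the CSV.
    val raw = Seq("2020-11-27 16:29:00", "2020-11-27 17:02:00").toDF("created_raw")

    val parsed = withParsedTimestamp(raw, "created_raw", "created_at")
    parsed.printSchema() // created_at is reported as a timestamp column

    spark.stop()
  }
}
```

    On Spark versions before 2.2, `unix_timestamp(col(source), pattern).cast("timestamp")` is a commonly used equivalent.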
    
