Can I read a CSV represented as a string into Apache Spark using spark-csv

Backend · unresolved · 3 answers · 1707 views
你的背包 2020-12-05 08:57

I know how to read a CSV file into Spark using spark-csv (https://github.com/databricks/spark-csv), but I already have the CSV represented as a string and would like to convert that string directly into a DataFrame. Is this possible?

3 Answers
  •  悲哀的现实
    2020-12-05 09:30

    You can parse your string into CSV records using, e.g., scala-csv:

    val myCSVdata : Array[List[String]] = myCSVString.split('\n').flatMap(CSVParser.parseLine(_))

    Here you can do a bit more processing and data cleaning, verifying that every line parses well and has the same number of fields, etc.
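For instance, a minimal validation pass might look like the sketch below. It uses a naive `split(',')` instead of scala-csv so the snippet is self-contained (quoted fields containing commas are not handled), and the sample data is hypothetical:

```scala
// Sketch: split a CSV string into lines and verify that every row
// has the same number of fields as the first (header) row.
// Sample data is hypothetical; a naive split(',') replaces the
// scala-csv parser here, so quoted commas are NOT handled.
val myCSVString = "name,age\nAlice,29\nBob,31"

val rows: Array[List[String]] =
  myCSVString.split('\n').map(_.split(",", -1).toList)

val expectedFields = rows.head.length
val badRows = rows.filterNot(_.length == expectedFields)
require(badRows.isEmpty, s"Found ${badRows.length} malformed row(s)")
```

Failing fast here is cheaper than discovering malformed rows later, after the data has been distributed across the cluster.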

    You can then make this an RDD of records:

    import org.apache.spark.rdd.RDD

    val myCSVRDD: RDD[List[String]] = sparkContext.parallelize(myCSVdata)

    Here you can massage your lists of Strings into a case class to better reflect the fields of your CSV data. You can take some inspiration from the creation of the Person case class in this example:

    https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection

    I omit this step.
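For completeness, a minimal sketch of that mapping step, assuming two hypothetical columns `name` and `age` (mirroring the `Person` example from the linked guide), could look like:

```scala
// Sketch: convert each parsed List[String] row into a case class.
// The Person fields (name, age) are hypothetical; adapt them to
// your actual CSV columns.
case class Person(name: String, age: Int)

// Stand-in for the parsed CSV rows from the previous step.
val parsed: Array[List[String]] = Array(List("Alice", "29"), List("Bob", "31"))

val people: Array[Person] = parsed.map {
  case name :: age :: Nil => Person(name, age.trim.toInt)
  case other => throw new IllegalArgumentException(s"Unexpected row: $other")
}
```

Using a case class gives the resulting DataFrame named, typed columns instead of a single array-of-strings column.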

    You can then convert to a DataFrame:

    import spark.implicits._

    val myCSVDataframe = myCSVRDD.toDF()
