Process CSV from REST API into Spark

Submitted by 妖精的绣舞 on 2020-04-21 05:55:23

Question


What is the best way to read a CSV-formatted result from a REST API directly into Spark?

Basically I have this, which I know I can process in Scala and save to a file, but I would like to process the data in Spark:

val resultCsv = scala.io.Source.fromURL(url).getLines()

Answer 1:


This is how it can be done.

For Spark 2.2.x

import scala.io.Source._
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// `spark` is an existing SparkSession (e.g. from spark-shell); its implicits enable .toDS()
import spark.implicits._

// fetch the CSV body from the REST endpoint and split it into lines
val res: List[String] = fromURL(url).mkString.lines.toList
val csvData: Dataset[String] = spark.sparkContext.parallelize(res).toDS()

// Spark 2.2+ can read CSV directly from a Dataset[String]
val frame: DataFrame = spark.read.option("header", true).option("inferSchema", true).csv(csvData)
frame.printSchema()
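
The snippet above assumes a `spark` SparkSession is already in scope, as it is in spark-shell. If you run this as a standalone application instead, here is a minimal sketch of creating one yourself; the app name and master are placeholders for illustration:

import org.apache.spark.sql.SparkSession

// minimal sketch: build a local SparkSession; adjust appName/master to your environment
val spark = SparkSession.builder()
  .appName("csv-from-rest-api")
  .master("local[*]")
  .getOrCreate()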

Using the Databricks spark-csv library for older versions of Spark

import scala.io.Source._
import com.databricks.spark.csv.CsvParser
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// fetch the CSV body and split it into lines; `sc` is the SparkContext and `sqlContext` the SQLContext
val res: List[String] = fromURL(url).mkString.lines.toList
val csvData: RDD[String] = sc.parallelize(res)

val csvParser = new CsvParser()
  .withUseHeader(true)
  .withInferSchema(true)

// parse the RDD[String] into a DataFrame using spark-csv
val frame: DataFrame = csvParser.csvRdd(sqlContext, csvData)
frame.printSchema()
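
For this variant the Databricks spark-csv library must be on the classpath. A minimal sketch of the dependency, assuming Scala 2.11 and spark-csv 1.5.0 (adjust to your Scala/Spark versions):

// build.sbt
libraryDependencies += "com.databricks" %% "spark-csv" % "1.5.0"

// or, when using spark-shell:
// spark-shell --packages com.databricks:spark-csv_2.11:1.5.0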

Note: I am new to Scala, so any improvements would be appreciated.

ref: here



Source: https://stackoverflow.com/questions/44961433/process-csv-from-rest-api-into-spark
