Question
I have a YAML file named config.yml with the following contents:
- firstName: "James"
  lastName: "Bond"
  age: 30
- firstName: "Super"
  lastName: "Man"
  age: 25
From this file I need to build the following Spark DataFrame, using Spark with Scala:
+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+
I have tried converting the YAML to JSON and then to a DataFrame, but I am not able to pass the result to Spark as a Dataset sequence.
Answer 1:
One approach is to convert the YAML to JSON with Jackson and then read that JSON as a DataFrame.
You need two dependencies, jackson-databind and jackson-dataformat-yaml, which provide the following imports:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import org.apache.spark.sql.SparkSession

class ScalaYamltoDataFrame {

  // Sample YAML; the nested keys must be indented to line up under "firstName"
  val yamlExample = "- firstName: \"James\"\n  lastName: \"Bond\"\n  age: 30\n\n- firstName: \"Super\"\n  lastName: \"Man\"\n  age: 25"

  // Parse the YAML with Jackson's YAML factory, then re-serialize it as JSON
  def convertYamlToJson(yaml: String): String = {
    val yamlReader = new ObjectMapper(new YAMLFactory)
    val obj = yamlReader.readValue(yaml, classOf[Any])
    val jsonWriter = new ObjectMapper
    jsonWriter.writeValueAsString(obj)
  }

  println(convertYamlToJson(yamlExample))

  def yamlToDF(): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("Convert Yaml to Dataframe")
      .getOrCreate()
    import sparkSession.implicits._ // needed for .toDS

    // Wrap the JSON string in a Dataset[String] and let Spark infer the schema
    val ds = sparkSession.read
      .option("multiline", true)
      .json(Seq(convertYamlToJson(yamlExample)).toDS)
    ds.show(false)
    ds.printSchema()
  }
}
// println(convertYamlToJson(yamlExample))
[{"firstName":"James","lastName":"Bond","age":30},{"firstName":"Super","lastName":"Man","age":25}]
// ds.show(false)
+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+
// ds.printSchema()
root
 |-- age: long (nullable = true)
 |-- firstName: string (nullable = true)
 |-- lastName: string (nullable = true)
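The example above parses a hard-coded string, while the question starts from a file named config.yml. A minimal sketch of the same YAML-to-JSON step reading from a file might look like the following. The object name YamlFileToJson and the method yamlFileToJson are hypothetical names chosen for illustration, and the sketch assumes the file sits on the driver's local filesystem.

```scala
import java.io.File
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory

// Hypothetical helper, not part of the original answer
object YamlFileToJson {

  // Read a YAML file from the local filesystem and return its content as a JSON string
  def yamlFileToJson(path: String): String = {
    val yamlReader = new ObjectMapper(new YAMLFactory)
    // readTree parses the whole document into a generic JsonNode tree
    val tree = yamlReader.readTree(new File(path))
    // A plain ObjectMapper serializes that tree as JSON
    new ObjectMapper().writeValueAsString(tree)
  }
}
```

The returned string can then be passed to Spark the same way as before, for example sparkSession.read.json(Seq(YamlFileToJson.yamlFileToJson("config.yml")).toDS).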
Hope this helps!
Source: https://stackoverflow.com/questions/58806113/how-to-parse-a-yaml-with-spark-scala