Spark Dataframe is saved to MongoDB in wrong format

Submitted by China☆狼群 on 2019-12-25 09:13:14

Question


I am using the Spark-MongoDB connector and I am trying to save a DataFrame into MongoDB:

// Imports assumed from the Stratio Spark-MongoDB connector
import com.stratio.datasource.mongodb._
import com.stratio.datasource.mongodb.config._
import com.stratio.datasource.mongodb.config.MongodbConfig._

// Build a one-row DataFrame from a JSON string
val event = """{"Dev":[{"a":3},{"b":3}],"hr":[{"a":6}]}"""
val events = sc.parallelize(event :: Nil)
val df = sqlc.read.json(events)

// Configure the target MongoDB collection and write the DataFrame
val saveConfig = MongodbConfigBuilder(Map(Host -> List("localhost:27017"),
  Database -> "test", Collection -> "test", SamplingRatio -> 1.0, WriteConcern -> "normal",
  SplitSize -> 8, SplitKey -> "_id"))
df.saveToMongodb(saveConfig.build)

I expect the data to be saved in the same shape as the input string, but what is actually saved is:

{ "_id" : ObjectId("57cedf4bd244c56e8e783a45"), "Dev" : [ { "a" : NumberLong(3), "b" : null }, { "a" : null, "b" : NumberLong(3) } ], "hr" : [ { "a" : NumberLong(6) } ] }

How can I avoid those null values and duplicated fields? Any ideas?
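For context, the nulls are already present in the DataFrame before the save: Spark's JSON reader infers a single struct type for each array, merging the keys seen across all elements, so every element of "Dev" carries both a and b, with null for the missing one. A minimal sketch, reusing the same sc/sqlc as above (the printed schema is indicative, not verbatim):

val event = """{"Dev":[{"a":3},{"b":3}],"hr":[{"a":6}]}"""
val df = sqlc.read.json(sc.parallelize(event :: Nil))
df.printSchema()
// "Dev" is inferred as array<struct<a:bigint,b:bigint>>,
// so {"a":3} becomes {a: 3, b: null} and {"b":3} becomes {a: null, b: 3}.
// The connector then writes these struct rows, producing the output shown above.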


Answer 1:


Have you tried defining event with backslash-escaped quotes instead of a triple-quoted string:

val event = "{\"Dev\":[{\"a\":3},{\"b\":3}],\"hr\":[{\"a\":6}]}"


Source: https://stackoverflow.com/questions/39389700/spark-dataframe-is-saved-to-mongodb-in-wrong-format
