Extracting values from a Spark column containing nested values [duplicate]

Submitted by 怎甘沉沦 on 2019-12-11 01:57:27

Question


This is part of the schema of my mongodb collection:

|-- variables: struct (nullable = true)  
|    |-- actives: struct (nullable = true)  
|    |    |-- data: struct (nullable = true)  
|    |    |    |-- 0: struct (nullable = true)  
|    |    |    |    |-- active: integer (nullable = true)  
|    |    |    |    |-- inactive: integer (nullable = true)

I've fetched the collection and stored it in a Spark dataframe and am now trying to extract the innermost values in the variables column.

df_temp = df1.select(df1.variables.actives.data)

This works perfectly fine and I am able to get the inner structure of the data struct.

+----------------------+  
|variables.actives.data|  
+----------------------+  
|  [[1,32,0.516165...|  
|  [[1,30,1.173139...|  
|  [[4,18,0.160088...|

However, as soon as I try to go in further:

df_temp = df1.select(df1.variables.actives.data.0.active)

I get an invalid syntax error.

df_temp = df1.select(df1.variables.actives.data.0.active)
^
SyntaxError: invalid syntax

The problem seems to be that the inner field's key is a number (0), and I couldn't find an example that accesses a field with a numeric name.
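The error comes from Python's parser, before Spark ever sees the expression: the name after a dot must be a valid identifier, and 0 is not one. A minimal sketch using only the standard library to check this; the `parses` helper is hypothetical, and nothing is evaluated, so `df1` does not need to exist:

```python
# Compile (but never run) both spellings to see which one Python can parse.
dotted = "df1.variables.actives.data.0.active"
bracketed = 'df1.variables.actives.data["0"].active'


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(parses(dotted))     # False
print(parses(bracketed))  # True
```

So the dotted form can never work, whatever the schema looks like. Only the way the field is spelled in Python has to change.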

What would be the best way to achieve my goal of retrieving the innermost values (active and inactive) from the dataframe?


Answer 1:


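Because 0 is not a valid Python attribute name, use bracket (item) access for that field and dot access for the rest. You can try: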

df_temp = df1.select(df1.variables.actives.data["0"].active)
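To retrieve both innermost values at once, one option is to address each field by a dotted path string instead of chained attributes. The sketch below is an assumption-laden illustration, not a confirmed solution from the original answer: `nested_col` and `select_counts` are hypothetical helpers, the path is backtick-quoted on the assumption that no field name contains a backtick, and the Spark part needs pyspark and a DataFrame with the schema from the question.

```python
def nested_col(*parts):
    """Build a dotted column path, backtick-quoting every part so that a
    name such as "0" is read as a plain field name."""
    return ".".join("`{}`".format(p) for p in parts)


def select_counts(df):
    """Select both innermost values as top-level columns.

    Assumes `df` has the schema shown in the question; pyspark is imported
    lazily so that the helper above can be used without it.
    """
    from pyspark.sql import functions as F

    base = ("variables", "actives", "data", "0")
    return df.select(
        F.col(nested_col(*base, "active")).alias("active"),
        F.col(nested_col(*base, "inactive")).alias("inactive"),
    )


print(nested_col("variables", "actives", "data", "0", "active"))
# `variables`.`actives`.`data`.`0`.`active`
```

With the DataFrame from the question, this would be called as df_temp = select_counts(df1). The aliases give the result flat column names, active and inactive.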


Source: https://stackoverflow.com/questions/48062171/extracting-values-from-a-spark-column-containing-nested-values
