Access to WrappedArray elements

百般思念, submitted 2021-02-19 02:37:39

Question


I have a Spark DataFrame with the following schema:

|-- eid: long (nullable = true)
|-- age: long (nullable = true)
|-- sex: long (nullable = true)
|-- father: array (nullable = true)
|    |-- element: array (containsNull = true)
|    |    |-- element: long (containsNull = true)

and a sample of rows:

df.select(df['father']).show()
+--------------------+
|              father|
+--------------------+
|[WrappedArray(-17...|
|[WrappedArray(-11...|
|[WrappedArray(13,...|
+--------------------+

and the type is

DataFrame[father: array<array<bigint>>]

How can I access each element of the inner array, for example -17 in the first row? I tried several things, such as df.select(df['father'])(0)(0).show(), but with no luck.


Answer 1:


If I'm not mistaken, the syntax in Python is to index the column inside select, not the DataFrame that select returns:

df.select(df['father'][0][0]).show()

or

df.select(df['father'].getItem(0).getItem(0)).show()

See some examples here: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=column#pyspark.sql.Column




Answer 2:


A solution in Scala could look like this:

import org.apache.spark.sql.functions._
// Build a one-row DataFrame from a JSON string shaped like the question's data.
val data = sparkContext.parallelize("""{"eid":1,"age":30,"sex":1,"father":[[1,2]]}""" :: Nil)
val dataframe = sqlContext.read.json(data).toDF()

The DataFrame looks like this:

+---+---+---+--------------------+
|eid|age|sex|father              |
+---+---+---+--------------------+
|1  |30 |1  |[WrappedArray(1, 2)]|
+---+---+---+--------------------+

Select the inner elements by applying the index twice to the column:

dataframe.select(col("father")(0)(0).as("first"), col("father")(0)(1).as("second")).show(false)

The output should be:

+-----+------+
|first|second|
+-----+------+
|1    |2     |
+-----+------+



Answer 3:


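Another Scala answer would look like this:

df.select(col("father").getItem(0) as "father_0", col("father").getItem(1) as "father_1")

Note that this selects the inner arrays themselves. To reach a single element, chain a second getItem, for example col("father").getItem(0).getItem(0).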


Source: https://stackoverflow.com/questions/44468311/access-to-wrappedarray-elements
