Building a row from a dict in pySpark

小蘑菇 2021-02-01 03:17

I'm trying to dynamically build a row in pySpark 1.6.1, and then turn it into a DataFrame. The general idea is to extend the results of describe to include, for exam

2 Answers
  •  野性不改
    2021-02-01 04:09

    You can use keyword argument unpacking as follows:

    Row(**row_dict)
    
    ## Row(C0=-1.1990072635132698, C3=0.12605772684660232, C4=0.5760856026559944, 
    ##     C5=0.1951877800894315, C6=24.72378589441825, summary='kurtosis')
    

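    Note that when a Row is built from keyword arguments, it internally sorts the fields by name, which is why the output above is ordered C0 ... C6, summary regardless of the order of the keys in the dict. This works around older Python versions (before 3.6), which did not preserve the order of keyword arguments.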

    This behavior is likely to be removed in upcoming releases; see SPARK-29748 (Remove sorting of fields in PySpark SQL Row creation). Once it is removed, you'll have to ensure that the order of values in the dict is consistent across records.
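
One way to avoid depending on either the sorting or the dict order is to fix the field order yourself. This is a minimal sketch, not part of the original answer: `FIELDS`, `ordered_values` and the sample records are hypothetical names and data chosen for illustration. The only PySpark API used is `Row`, first called with field names to get a reusable row factory and then with the values. The import is guarded so the ordering logic also runs where PySpark is not installed.

```python
# Assumed, fixed column order for every record (hypothetical).
FIELDS = ["summary", "C0", "C3"]


def ordered_values(row_dict, fields):
    """Return the values of row_dict as a tuple, in the order given by fields."""
    missing = [f for f in fields if f not in row_dict]
    if missing:
        raise KeyError("missing fields: {}".format(missing))
    return tuple(row_dict[f] for f in fields)


# Sample records whose keys arrive in different orders (made-up values).
records = [
    {"C0": -1.19, "summary": "kurtosis", "C3": 0.12},
    {"summary": "skewness", "C3": 0.5, "C0": 0.3},
]

# Every tuple now follows FIELDS, whatever the key order of the source dict.
tuples = [ordered_values(r, FIELDS) for r in records]

try:
    from pyspark.sql import Row
except ImportError:  # PySpark not installed; skip the Row step
    Row = None

if Row is not None:
    make_row = Row(*FIELDS)                # row factory with a fixed field order
    rows = [make_row(*t) for t in tuples]  # one Row per record
    print(rows)
```

Because the field order comes from `FIELDS` and not from the dict or from `Row`'s keyword handling, the resulting rows should line up across records whether or not a given Spark version sorts keyword arguments.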
