Create single row dataframe from list of list PySpark

傲寒 2020-11-27 07:46

I have data like this: data = [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6]]. I want to create a PySpark DataFrame from it.

I have already tried:

dataframe = SQLCo         


        
3 answers
  •  温柔的废话
    2020-11-27 08:09

    I find it useful to think of the argument to createDataFrame() as a list of tuples: each entry in the list corresponds to a row in the DataFrame, and each element of the tuple corresponds to a column.

    You can get your desired output by making each element in the list a tuple:

    data = [([1.1, 1.2],), ([1.3, 1.4],), ([1.5, 1.6],)]
    dataframe = sqlCtx.createDataFrame(data, ['features'])
    dataframe.show()
    #+----------+
    #|  features|
    #+----------+
    #|[1.1, 1.2]|
    #|[1.3, 1.4]|
    #|[1.5, 1.6]|
    #+----------+
    

    Or if changing the source is cumbersome, you can equivalently do:

    data = [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6]]
    # Wrap each inner list in a one-element tuple so it becomes a single column value
    dataframe = sqlCtx.createDataFrame([(x,) for x in data], ['features'])
    dataframe.show()
    #+----------+
    #|  features|
    #+----------+
    #|[1.1, 1.2]|
    #|[1.3, 1.4]|
    #|[1.5, 1.6]|
    #+----------+
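
The wrapping step can also be pulled out into a small helper, which makes the row/column shape easy to check before Spark is involved. This is only a sketch: the helper names are hypothetical, and `spark` is assumed to be an already created `SparkSession` (or any object exposing `createDataFrame`, such as the `sqlCtx` used above).

```python
def to_single_column_rows(data):
    """Wrap each inner list in a one-element tuple: one row, one column."""
    return [(values,) for values in data]


def build_features_dataframe(spark, data, column_name="features"):
    """Create a one-column DataFrame.

    `spark` is assumed to be an existing SparkSession (hypothetical
    parameter name); it is not created here.
    """
    return spark.createDataFrame(to_single_column_rows(data), [column_name])


data = [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6]]
rows = to_single_column_rows(data)
# Each row is a tuple holding exactly one value: the original inner list.
print(rows)
```

With a live session, `build_features_dataframe(spark, data).show()` should print the same three-row `features` table as the examples above.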
    
