Why does createDataFrame reorder the columns?


Suppose I am creating a data frame from a list without a schema:

from pyspark.sql import Row

data = [Row(c=0, b=1, a=2), Row(c=10, b=11, a=12)]
df = spark.createDataFrame(data)


        
1 Answer
  • 2020-12-21 06:06

    Why are the columns reordered in alphabetical order?

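    Because a Row created with **kwargs sorts its fields by name. This is the behavior in Spark versions before 3.0; from Spark 3.0 on, with Python 3.6 or later, named arguments keep the order in which they were entered.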

    This design choice is required to address the issues described in PEP 468. Please check SPARK-12467 for a discussion.
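
    To make the sorting concrete, here is a minimal pure-Python sketch of what "sorts the arguments by name" amounts to. It is an illustration only, not PySpark's actual source; `sorted_row` is a hypothetical helper introduced for this example.

```python
# Sketch only: mimics sorting keyword arguments by name.
# `sorted_row` is a hypothetical helper, not part of PySpark.
def sorted_row(**kwargs):
    # Sort the field names alphabetically, then order the values to match.
    names = sorted(kwargs)
    values = tuple(kwargs[name] for name in names)
    return names, values


names, values = sorted_row(c=0, b=1, a=2)
print(names)   # ['a', 'b', 'c']
print(values)  # (2, 1, 0)
```

    Once the names are sorted, the order in which the keywords were written at the call site is gone, so the resulting columns come out as a, b, c.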

    Can I preserve the original order of columns without adding a schema?

    Not with **kwargs. You can use plain tuples:

    df = spark.createDataFrame([(0, 1, 2), (10, 11, 12)], ["c", "b", "a"])
    

    or a namedtuple:

    from collections import namedtuple
    
    CBA = namedtuple("CBA", ["c", "b", "a"])
    spark.createDataFrame([CBA(0, 1, 2), CBA(10, 11, 12)])
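
    As a quick check that needs no Spark session, the sketch below shows that a namedtuple exposes its field names in declaration order through `_fields`, and that its values are positional. It only demonstrates standard-library behavior; how Spark consumes these fields is not shown here.

```python
from collections import namedtuple

# Fields are declared in the order c, b, a and stay in that order.
CBA = namedtuple("CBA", ["c", "b", "a"])
row = CBA(0, 1, 2)

print(CBA._fields)          # ('c', 'b', 'a')
print(row.c, row.b, row.a)  # 0 1 2
```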
    