Suppose I am creating a data frame from a list without a schema:
from pyspark.sql import Row
data = [Row(c=0, b=1, a=2), Row(c=10, b=11, a=12)]
df = spark.createDataFrame(data)
Why are the columns reordered in alphabetical order?
Because a Row created with **kwargs sorts its fields by name.
Before Python 3.6, the order of **kwargs was not preserved (the problem described in PEP 468), so sorting by name was the way to get a deterministic field order. See SPARK-12467 for a discussion.
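To make the sorting concrete, here is a minimal pure-Python sketch. The `sorted_row` helper is hypothetical and only mimics the behavior described above; it is not PySpark's actual implementation.

```python
def sorted_row(**kwargs):
    """Hypothetical stand-in for Row(**kwargs), not PySpark code.

    Returns the field names sorted alphabetically, plus the values
    rearranged to match that sorted order.
    """
    names = sorted(kwargs)
    values = tuple(kwargs[name] for name in names)
    return names, values


names, values = sorted_row(c=0, b=1, a=2)
print(names)   # ['a', 'b', 'c']
print(values)  # (2, 1, 0)
```

Whatever order the keyword arguments are written in, the fields come out as a, b, c, which matches the column order seen in the data frame.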
Can I preserve the original order of columns without adding a schema?
Not with **kwargs. You can use plain tuples:
df = spark.createDataFrame([(0, 1, 2), (10, 11, 12)], ["c", "b", "a"])
or a namedtuple:
from collections import namedtuple
CBA = namedtuple("CBA", ["c", "b", "a"])
df = spark.createDataFrame([CBA(0, 1, 2), CBA(10, 11, 12)])
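The namedtuple variant works because a namedtuple stores its fields in declaration order. A quick check that needs no Spark session (the expectation that Spark reads column names from these fields in order is an assumption based on the answer above):

```python
from collections import namedtuple

# Fields are declared in the order c, b, a and kept that way.
CBA = namedtuple("CBA", ["c", "b", "a"])
rows = [CBA(0, 1, 2), CBA(10, 11, 12)]

print(CBA._fields)  # ('c', 'b', 'a')
print(rows[0])      # CBA(c=0, b=1, a=2)
```

Unlike keyword arguments passed to Row, the positional values stay aligned with the declared names, so nothing gets sorted.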