Ambiguous behavior while adding a new column to StructType
Question

I defined a function in PySpark:

    from pyspark.sql.types import LongType

    def add_ids(X):
        # Extend the DataFrame's schema with a non-nullable LongType column
        schema_new = X.schema.add("id_col", LongType(), False)
        # Pair each row with its index and append the index as the last field
        _X = X.rdd.zipWithIndex().map(lambda l: list(l[0]) + [l[1]]).toDF(schema_new)
        # Put the last column (id_col) ahead of all the other columns
        cols_arranged = [_X.columns[-1]] + _X.columns[0:len(_X.columns) - 1]
        return _X.select(*cols_arranged)

The function above creates a new column named id_col, which holds the index number of each row, appends it to the DataFrame, and finally moves id_col to the