PySpark: How to read a CSV into a DataFrame and manipulate it
Question

I'm quite new to PySpark and am trying to use it to process a large dataset that is saved as a CSV file. I'd like to read the CSV file into a Spark DataFrame, drop some columns, and add new columns. How should I do that?

I am having trouble getting this data into a DataFrame. This is a stripped-down version of what I have so far:

    def make_dataframe(data_portion, schema, sql):
        # Split one CSV line into its fields
        fields = data_portion.split(",")
        # Build a one-row DataFrame from the first two fields
        return sql.createDataFrame([(fields[0], fields[1])], schema=schema)

    if __name__ == "__main