How can I convert a Pyspark dataframe to a CSV without sending it to a file?

試著忘記壹切 submitted on 2020-06-29 05:04:43

Question


I have a dataframe which I need to convert to a CSV file, and then I need to send this CSV to an API. As I'm sending it to an API, I do not want to save it to the local filesystem and need to keep it in memory. How can I do this?


Answer 1:


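Easy way: convert your dataframe to a pandas DataFrame with toPandas(), then write it to a string. To get a string instead of a file, call to_csv with path_or_buf=None, then send that string in the API call. Note that toPandas() collects the whole dataframe onto the driver, so the data has to fit in the driver's memory.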

From to_csv() documentation:

Parameters

path_or_buf : str or file handle, default None

File path or object, if None is provided the result is returned as a string.

So your code would likely look like this:

csv_string = df.toPandas().to_csv(path_or_buf=None)
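A slightly fuller sketch of the toPandas() route, in case it helps. The function names to_csv_string and to_csv_bytes are made up for illustration, and index=False is an assumption on my part: it drops the pandas row index, which most APIs will not expect as an extra column.

```python
import pandas as pd


def to_csv_string(spark_df) -> str:
    """Render a PySpark DataFrame as CSV text without touching the filesystem.

    Hypothetical helper: `spark_df` is anything with a toPandas() method.
    """
    # toPandas() pulls every row to the driver, so the data must fit in memory.
    pandas_df: pd.DataFrame = spark_df.toPandas()
    # path_or_buf=None makes to_csv return the CSV as a str.
    # index=False (an assumption) leaves out the pandas row index column.
    return pandas_df.to_csv(path_or_buf=None, index=False)


def to_csv_bytes(spark_df, encoding: str = "utf-8") -> bytes:
    """Same as above, encoded to bytes for use as an HTTP request body."""
    return to_csv_string(spark_df).encode(encoding)
```

The bytes returned by to_csv_bytes(df) can then be passed as the request body to whatever HTTP client you use for the API call.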

Alternatives: use tempfile.SpooledTemporaryFile with a large buffer (its max_size argument) to get a file-like object that stays in memory until the buffer is exceeded. You can even use a regular file: make the buffer large enough and don't flush or close the file. Take a look at Corey Goldberg's explanation of why this works.
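A minimal sketch of the SpooledTemporaryFile idea, using only the standard library. The helper name rows_to_csv_text and the 10 MB default are assumptions for illustration. With PySpark, the header could come from df.columns and the rows from df.toLocalIterator(), since Row objects behave like tuples.

```python
import csv
import tempfile


def rows_to_csv_text(header, rows, max_size=10 * 1024 * 1024):
    """Write rows as CSV into a spooled temp file and return the text.

    Hypothetical helper. The data stays in memory until more than
    `max_size` bytes are written; beyond that it spills to a real temp file.
    """
    with tempfile.SpooledTemporaryFile(
        max_size=max_size, mode="w+", newline="", encoding="utf-8"
    ) as buf:
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(rows)
        # Rewind before reading back what was just written.
        buf.seek(0)
        return buf.read()
```

Unlike the toPandas() route, this streams rows one at a time, although the final CSV text still ends up in memory when it is read back.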



Source: https://stackoverflow.com/questions/61645936/how-can-i-convert-a-pyspark-dataframe-to-a-csv-without-sending-it-to-a-file
