Saving Matplotlib Output to DBFS on Databricks

Submitted by 你。 on 2020-03-01 04:41:16

Question


I'm writing Python code on Databricks to process some data and output graphs. I want to be able to save these graphs as a picture file (.png or something, the format doesn't really matter) to DBFS.

Code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
plt.close()
df.set_index('fruits', inplace=True)
df.plot.bar()
# plt.show()

Things that I tried:

plt.savefig("/FileStore/my-file.png")

[Errno 2] No such file or directory: '/FileStore/my-file.png'

fig = plt.gcf()
dbutils.fs.put("/dbfs/FileStore/my-file.png", fig)

TypeError: has the wrong type - (,) is expected.

After some research, I think dbutils.fs.put only works for saving text files, since its contents argument must be a string.

Running the above code with plt.show() produces a bar graph. I want to save that bar graph as an image to DBFS. Any help is appreciated, thanks in advance!


Answer 1:


You can do this by rendering the figure into an in-memory buffer and then using Python's local file APIs to write the bytes to the Databricks File System (DBFS), which those APIs reach through the /dbfs mount point.
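Why the first attempt failed: plt.savefig and open() work on the driver's local filesystem, and /FileStore/my-file.png is not where DBFS appears locally, hence the Errno 2. Below is a minimal sketch of the path mapping, assuming the standard /dbfs mount point is available on the cluster; to_local_dbfs_path is a hypothetical helper name, not a Databricks API. It only manipulates strings.

```python
DBFS_MOUNT = "/dbfs"  # assumed local mount point of DBFS on the cluster


def to_local_dbfs_path(path: str) -> str:
    """Hypothetical helper: map a DBFS path to its local /dbfs/... form.

    Accepts 'dbfs:/FileStore/x.png', '/FileStore/x.png', or an already-local
    '/dbfs/FileStore/x.png' and returns a path usable by open()/savefig().
    """
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]  # drop the URI scheme, keep the absolute path
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {path!r}")
    if path == DBFS_MOUNT or path.startswith(DBFS_MOUNT + "/"):
        return path  # already points through the local mount
    return DBFS_MOUNT + path


# The first call uses the path from the failed savefig attempt
print(to_local_dbfs_path("/FileStore/my-file.png"))       # -> /dbfs/FileStore/my-file.png
print(to_local_dbfs_path("dbfs:/FileStore/my-file.png"))  # -> /dbfs/FileStore/my-file.png
```

With that mapping, plt.savefig(to_local_dbfs_path("/FileStore/my-file.png")) targets the same location the question was aiming for.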

Example:

import matplotlib.pyplot as plt
from io import BytesIO

# Create a plot or figure first, then render it into an in-memory buffer:
buf = BytesIO()
plt.savefig(buf, format='png')

# The /dbfs prefix lets Python's local file APIs reach DBFS
path = '/dbfs/databricks/path/to/file.png'

# Make sure to open the file in binary mode
with open(path, 'wb') as f:
  # Alternatively, call buf.seek(0) and then f.write(buf.read())
  f.write(buf.getvalue())
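The same approach can be wrapped in a small reusable function. This is a sketch, not part of the original answer: save_figure is a hypothetical name, it additionally creates missing parent directories with os.makedirs, and it selects the non-interactive Agg backend so it also runs outside a notebook. The figure uses the question's data with plain matplotlib; with pandas, ax = df.plot.bar() returns a matplotlib Axes, and ax.get_figure() gives the Figure to pass in.

```python
import os
import tempfile
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files/buffers only
import matplotlib.pyplot as plt


def save_figure(fig, path, fmt="png"):
    """Render `fig` into memory, then write the bytes to `path`.

    On Databricks, `path` would be a /dbfs/... location (assumed target);
    elsewhere any local path works the same way.
    """
    buf = BytesIO()
    fig.savefig(buf, format=fmt)  # render the figure into the buffer
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create missing directories
    with open(path, "wb") as f:  # binary mode, since PNG data is bytes
        f.write(buf.getvalue())
    return path


# Same data as the question, plotted with plain matplotlib
fig, ax = plt.subplots()
ax.bar(["apple", "banana"], [1, 2])

# A temporary directory stands in for /dbfs/FileStore when run off-cluster
out = save_figure(fig, os.path.join(tempfile.mkdtemp(), "plots", "my-file.png"))
plt.close(fig)
```

On a cluster, the equivalent call would be save_figure(ax.get_figure(), '/dbfs/FileStore/my-file.png').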


Source: https://stackoverflow.com/questions/57203817/saving-matplotlib-output-to-dbfs-on-databricks
