Pickling Pandas DataFrame subclasses which include metadata

Posted by 不羁的心 on 2019-12-14 03:58:39

Question


The question of how to attach metadata to Pandas objects, and get that data to survive a pickle/unpickle round trip, is a perennial one. I see some very old answers which basically say that you can't. Hopefully a more current answer to this question will be yes. I'm using Pandas 0.23.3.

I've made some Pandas DataFrame subclasses, and I think I know how to do this correctly. I have a _constructor property, and my __init__ method can handle BlockManager objects. When I create metadata attributes, I suppress the UserWarning which cautions that I'm not creating a column in the DataFrame itself, which in my case is fine.

When I want to save the DataFrame to disk, I call my_fancy_df.to_pickle(file_path). When I want to reload it, I use my_fancy_df = pandas.read_pickle(file_path). My metadata is lost in the process. Pandas itself has metadata which pickles and unpickles fine, such as the Series.name attribute. I would like to copy this behavior for my attributes.

I could intercept the .to_pickle call in my subclass and arrange to write the metadata separately into the same file object. But I don't see an equivalent approach for changing the way that data is reloaded. The read_pickle function is general-purpose and lives in the Pandas namespace; it doesn't belong to the DataFrame class.

I could possibly write a custom unpickling function, external to my class, and use that, but it seems clumsy. If there's an elegant way to get this job done, I haven't found it.

I'm also not dead-set on using pickle. If HDF5 is more suitable, for example, I can switch. I do need to store arbitrary Python data types in the DataFrame, though. The content in the cells is not just strings and numbers: I have tuples as well, and in one subclass I've built I even placed DataFrames inside DataFrames.

Thanks for your advice.


Answer 1:


The comment from user "root" was helpful. I have confirmed that if you define a class attribute called _metadata inside your custom DataFrame subclass, listing the names of the instance attributes you want to keep, those attributes are retained through slicing, pickling, and unpickling operations.
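As a minimal sketch of what that looks like (the class name FancyFrame, the attribute name units, and the file name fancy.pkl are made up for illustration, and this assumes a reasonably recent Pandas version; I have not re-checked it against 0.23.3):

```python
import os
import tempfile

import pandas as pd


class FancyFrame(pd.DataFrame):
    # Hypothetical subclass. _metadata lists the *names* of the instance
    # attributes that pandas should carry along with the frame.
    _metadata = ["units"]

    @property
    def _constructor(self):
        # Make pandas operations return FancyFrame instead of a plain DataFrame.
        return FancyFrame


df = FancyFrame({"x": [1, 2, 3]})
# Because "units" is listed in _metadata, this is stored as a plain attribute
# rather than being treated as an attempt to create a column.
df.units = "metres"

# Round trip through to_pickle / read_pickle, as in the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fancy.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Slicing goes through _constructor and should carry the attribute over too.
sliced = df.iloc[:2]
```

After this, restored should be a FancyFrame whose units attribute is still "metres", without any custom unpickling function. Note that the class has to be importable from the same module path when the file is read back, since pickle stores only a reference to the class and not its definition.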



Source: https://stackoverflow.com/questions/57237906/pickling-pandas-dataframes-subclasses-which-include-metadata
