How do I subclass or otherwise extend a pandas DataFrame without breaking DataFrame.append()?

回眸只為那壹抹淺笑 submitted on 2021-02-10 14:14:37

Question


I have a complex object I'd like to build around a pandas DataFrame. I've tried to do this with a subclass, but appending to the DataFrame reinitializes all properties in a new instance even when using _metadata, as recommended here. I know subclassing pandas objects is not recommended but I don't know how to do what I want with composition (or any other method), so if someone can tell me how to do this without subclassing that would be great.

I'm working with the following code:

import pandas as pd

class thisDF(pd.DataFrame):

    @property
    def _constructor(self):
        return thisDF

    _metadata = ['new_property']

    def __init__(self, data=None, index=None, columns=None, copy=False, new_property='reset'):
        
        super(thisDF, self).__init__(data=data, index=index, columns=columns, dtype='str', copy=copy)

        self.new_property = new_property

cols = ['A', 'B', 'C']
new_property = cols[:2]
tdf = thisDF(columns=cols, new_property=new_property)

As in the examples I linked to above, operations like tdf[['A', 'B']].new_property work fine. However, modifying the data in a way that creates a new copy initializes a new instance that doesn't retain new_property. So the code

print(tdf.new_property)
tdf = tdf.append(pd.Series(['a', 'b', 'c'], index=tdf.columns), ignore_index=True)
print(tdf.new_property)

outputs

['A', 'B']
reset

How do I extend pd.DataFrame so that thisDF.append() retains instance attributes (or some equivalent data structure if not using a subclass)? Note that I can do everything I want by making a class with a DataFrame as an attribute, but I don't want to do my_object.dataframe.some_method() for all DataFrame operations.


Answer 1:


"[...] or wrapping all DataFrame methods with my_object class methods (because I'm assuming that would be a lot of work, correct?)"

No, it doesn't have to be a lot of work. You don't have to wrap every method of the wrapped object yourself. You can define __getattr__ to pass attribute lookups down to the wrapped object like this:

class WrappedDataFrame:
    def __init__(self, df, new_property):
        self._df = df
        self.new_property = new_property
    
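    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup has failed.
        if attr in self.__dict__:
            return getattr(self, attr)
        # Forward everything else to the wrapped DataFrame.
        return getattr(self._df, attr)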
    
    def __getitem__(self, item):
        return self._df[item]
    
    def __setitem__(self, item, data):
        self._df[item] = data

__getattr__ is a dunder method that Python calls only when normal attribute lookup fails, that is, when the name is not found on the instance or on its class. (The method that runs on every attribute access is __getattribute__.) Because of this, the attr in self.__dict__ check in my implementation is effectively redundant: anything stored on the instance, such as new_property, is found before __getattr__ is ever reached. Every other name, including all DataFrame methods, is forwarded to the wrapped object with getattr(self._df, attr).

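So this class behaves like a DataFrame for most purposes. You now only have to implement the methods you want to behave differently, like append in your example. Keep in mind that delegated methods which return a DataFrame (head(), for example) hand back a plain DataFrame, not the wrapper.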

You could either make it so that append modifies the wrapped DataFrame object

    def append(self, *args, **kwargs):
        self._df = self._df.append(*args, **kwargs)

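or so that it returns a new instance of the WrappedDataFrame class, which keeps all your functionality. Note that new_property has to be passed along to the new instance, since __init__ requires it.

    def append(self, *args, **kwargs):
        return self.__class__(self._df.append(*args, **kwargs), self.new_property)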
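
Putting the pieces together, a complete version could look roughly like the sketch below. Two things differ from the snippets above and are my own assumptions: append takes a single row as a plain list of values, and it uses pd.concat internally, because DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. The guard on _df in __getattr__ is also an addition, to avoid infinite recursion if _df has not been set yet.

```python
import pandas as pd


class WrappedDataFrame:
    def __init__(self, df, new_property):
        self._df = df
        self.new_property = new_property

    def __getattr__(self, attr):
        # Guard (my addition): fail cleanly if _df itself is missing.
        if attr == "_df":
            raise AttributeError(attr)
        return getattr(self._df, attr)

    def __getitem__(self, item):
        return self._df[item]

    def __setitem__(self, item, data):
        self._df[item] = data

    def append(self, row):
        # Assumption: `row` is a list with one value per column.
        # pd.concat stands in for the removed DataFrame.append.
        new_row = pd.DataFrame([row], columns=self._df.columns)
        new_df = pd.concat([self._df, new_row], ignore_index=True)
        # Carry new_property over to the new wrapper instance.
        return self.__class__(new_df, self.new_property)


cols = ["A", "B", "C"]
wdf = WrappedDataFrame(pd.DataFrame([["x", "y", "z"]], columns=cols),
                       new_property=cols[:2])
wdf = wdf.append(["a", "b", "c"])
print(wdf.new_property)  # ['A', 'B']
print(wdf.shape)         # (2, 3)
```

Here new_property survives the append because the wrapper builds the new instance itself rather than relying on pandas to copy the attribute.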


Source: https://stackoverflow.com/questions/65375177/how-do-i-subclass-or-otherwise-extend-a-pandas-dataframe-without-breaking-datafr
