Do subclasses of pandas objects work differently from subclasses of other objects?

Submitted by 感情迁移 on 2019-11-30 14:48:47
Dale Jung

Current Answer (Pandas >= 0.13)

An internal refactor in Pandas 0.13 drastically simplified subclassing. Pandas Series can now be subclassed like any other Python object:

import pandas as pd

class MySeries(pd.Series):
    def my_method(self):
        return "my_method"
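A minimal sketch of how such a subclass can also keep its type through pandas operations, assuming a reasonably recent pandas. It overrides the `_constructor` property, which is the hook pandas' own subclassing documentation describes; the sample data and the `head` variable are only illustrative.

```python
import pandas as pd


class MySeries(pd.Series):
    @property
    def _constructor(self):
        # pandas calls this when it builds the result of an operation,
        # so returning the subclass keeps results as MySeries.
        return MySeries

    def my_method(self):
        return "my_method"


s = MySeries([1, 2, 3])   # illustrative data, not from the original answer
head = s.iloc[:2]         # slicing goes through _constructor

print(type(head).__name__)
print(head.my_method())
```

Without the `_constructor` override the custom method is still available on `s` itself, but results of slicing and similar operations may come back as a plain `Series`.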

Legacy Answer (Pandas <= 0.12)

The problem is that Series defines __new__, which ensures that a plain Series object is instantiated even when you call a subclass.

You can modify your class like so:

import pandas as pd

class Support(pd.Series):
    def __new__(cls, *args, **kwargs):
        # Pandas <= 0.12 only: build the Series first, then
        # re-cast it to the subclass with view().
        arr = pd.Series.__new__(cls, *args, **kwargs)
        return arr.view(cls)

    def supportMethod1(self):
        print('I am support method 1')

    def supportMethod2(self):
        print('I am support method 2')

However, it is probably better to use composition (has-a) rather than inheritance (is-a), or to monkey patch the Series object. The reason is that you will often lose your subclass while using pandas because of the way it stores its data. Something as simple as

s.ix[:5] 
s.cumsum()

will return a Series object instead of your subclass. Internally, the data is stored in contiguous arrays and optimized for speed. The data is only boxed with a class when needed, and those classes are hardcoded. Plus, it is not immediately obvious whether something like s.ix[:5] should return the same subclass; that depends on the semantics of your subclass and what metadata is attached to it.
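The has-a alternative mentioned above could look roughly like the following sketch. The wrapper class, its `series` attribute and the `support_method` name are hypothetical choices for illustration, not part of pandas or of the original answer.

```python
import pandas as pd


class Support:
    """Hypothetical wrapper that holds a Series instead of subclassing it."""

    def __init__(self, *args, **kwargs):
        self.series = pd.Series(*args, **kwargs)

    def support_method(self):
        return "I am a support method"

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        # Guard against recursion if 'series' has not been set yet.
        if name == "series":
            raise AttributeError(name)
        # Delegate everything else to the wrapped Series.
        return getattr(self.series, name)


w = Support([1, 2, 3])
print(w.support_method())
print(w.sum())                 # delegated to the wrapped Series
print(type(w.cumsum()))        # a plain Series, by design
```

With this design there is no subclass to lose: pandas operations always return ordinary pandas objects, and the wrapper decides explicitly when to re-wrap a result.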

http://nbviewer.ipython.org/3366583/subclassing%20pandas%20objects.ipynb has some notes.

Calling Support() returns a Series object rather than a Support instance.

On subclassing of Series and DataFrame see also: https://github.com/pydata/pandas/issues/60

In [16]: class MyDict(dict):
   ....:     pass
   ....:

In [17]: md = MyDict()

In [18]: type(md)
Out[18]: __main__.MyDict

In [21]: class MySeries(Series):
   ....:     pass
   ....:

In [22]: ms = MySeries()

In [23]: type(ms)
Out[23]: pandas.core.series.Series