Using a custom object in pandas.read_csv()

清酒与你 2020-12-15 14:31

I am interested in streaming a custom object into a pandas dataframe. According to the documentation, any object with a read() method can be used. However, even after implem

3 Answers
  •  春和景丽
    2020-12-15 15:03

    The documentation mentions the read method, but pandas actually validates the argument with is_file_like (that's where the exception is thrown). That function is very simple:

    def is_file_like(obj):
        if not (hasattr(obj, 'read') or hasattr(obj, 'write')):
            return False
        if not hasattr(obj, "__iter__"):
            return False
        return True
    

    So it also needs an __iter__ method.
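
To make the check concrete, here is a minimal sketch that copies the function quoted above and applies it to a few objects. The classes `ReadOnly` and `ReadAndIter` are hypothetical names invented for illustration; only the logic of `is_file_like` comes from the answer.

```python
import io


def is_file_like(obj):
    # Same logic as the function quoted above.
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    return hasattr(obj, "__iter__")


class ReadOnly:
    """Hypothetical class: has read() but no __iter__."""

    def read(self, size=-1):
        return ""


class ReadAndIter(ReadOnly):
    """Hypothetical class: adds __iter__ on top of read()."""

    def __iter__(self):
        return iter(())


results = {
    "StringIO": is_file_like(io.StringIO("a b\n")),
    "ReadOnly": is_file_like(ReadOnly()),
    "ReadAndIter": is_file_like(ReadAndIter()),
}
print(results)
```

Only the object that has both `read` and `__iter__` passes, which matches the behaviour described in the question.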

    But that's not the only problem. pandas requires that the object actually behaves like a file. So the read method should accept an additional argument for the number of bytes (or characters) to read. That means you can't make read a generator: it has to be callable with two arguments (self and the size) and should return a string.

    So for example:

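    class DataFile:
        def __init__(self, files):
            # `files` is unused here; the data is hard-coded for illustration.
            self.data = """a b
    1 2
    2 3
    """
            self.pos = 0  # current read offset into self.data
    
        def read(self, x):
            # Return up to x characters and advance the offset.
            nxt = self.pos + x
            ret = self.data[self.pos:nxt]
            self.pos = nxt
            return ret
    
        def __iter__(self):
            # Iterate over the lines of the data (without line endings).
            yield from self.data.split('\n')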
    

    will be recognized as valid input.
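
If the data comes from a generator rather than a single string, one possible approach is to buffer the generator behind `read(size)`. The sketch below is only an illustration under that assumption: `IterReader` and `rows` are hypothetical names, and it has not been checked against any particular pandas version.

```python
class IterReader:
    """Hypothetical wrapper: read(size) and __iter__ over an iterable of text chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = ""

    def read(self, size=-1):
        if size is None:
            size = -1
        # Pull chunks until the buffer can satisfy the request
        # or the underlying iterable is exhausted.
        while size < 0 or len(self._buffer) < size:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        if size < 0:
            out, self._buffer = self._buffer, ""
        else:
            out, self._buffer = self._buffer[:size], self._buffer[size:]
        return out

    def readline(self):
        # Fill the buffer until it holds a full line or the source ends.
        while "\n" not in self._buffer:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        idx = self._buffer.find("\n")
        end = idx + 1 if idx >= 0 else len(self._buffer)
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line

    def __iter__(self):
        # Yield lines with their line endings, as a text file would.
        while True:
            line = self.readline()
            if not line:
                return
            yield line


def rows():
    # Stand-in for a custom data source.
    yield "a b\n"
    yield "1 2\n"
    yield "2 3\n"


reader = IterReader(rows())
first = reader.read(5)
rest = reader.read()
lines = list(IterReader(rows()))
```

Here `read` stays an ordinary method that returns a string, while the generator lives behind it as the data source.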

    However, it's harder for multiple files. I hoped that fileinput would have some appropriate routines, but it doesn't seem to:

    import fileinput
    
    import pandas as pd
    
    pd.read_csv(fileinput.input([...]))
    # ValueError: Invalid file path or buffer object type: <class 'fileinput.FileInput'>
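
One way to work around this might be a small wrapper that presents several open text streams as a single file-like object. The following is a rough sketch, not a tested pandas recipe: `ChainedReader` is a hypothetical name, and it assumes every stream ends with a newline, only the first stream carries a header row, and a short read means the current stream is exhausted (true for `io.StringIO` and regular files).

```python
import io


class ChainedReader:
    """Hypothetical helper: presents several text streams as one file-like object."""

    def __init__(self, streams):
        self._streams = iter(streams)
        self._current = next(self._streams, None)

    def read(self, size=-1):
        unlimited = size is None or size < 0
        parts = []
        remaining = size
        while self._current is not None:
            chunk = self._current.read() if unlimited else self._current.read(remaining)
            parts.append(chunk)
            if not unlimited:
                remaining -= len(chunk)
                if remaining == 0:
                    break
            # The current stream is exhausted; move on to the next one.
            self._current = next(self._streams, None)
        return "".join(parts)

    def __iter__(self):
        # Yield the lines of each stream in turn.
        while self._current is not None:
            yield from self._current
            self._current = next(self._streams, None)


combined = ChainedReader([io.StringIO("a b\n1 2\n"), io.StringIO("2 3\n3 4\n")])
head = combined.read(6)
tail = combined.read()
all_lines = list(ChainedReader([io.StringIO("a b\n1 2\n"), io.StringIO("2 3\n")]))
```

Since the object has both `read` and `__iter__`, it should pass the `is_file_like` check; whether something like `pd.read_csv(combined, sep=" ")` then parses it correctly depends on the pandas version and parser engine, so treat that as something to verify.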
    
