I am interested in streaming a custom object into a pandas dataframe. According to the documentation, any object with a read() method can be used. However, even after implementing read(), pandas still rejects my object with an exception.
The documentation mentions the read method, but pandas actually validates the argument with is_file_like (that's where the exception is thrown). That function is very simple:
def is_file_like(obj):
    if not (hasattr(obj, 'read') or hasattr(obj, 'write')):
        return False
    if not hasattr(obj, "__iter__"):
        return False
    return True
So the object also needs an __iter__ method.
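Pandas exposes this check as pandas.api.types.is_file_like, so you can verify both requirements directly. A quick sketch (the class names here are made up for illustration):

```python
from pandas.api.types import is_file_like

class ReadOnly:
    """Has read() but no __iter__ -- fails the check."""
    def read(self, n):
        return ''

class ReadAndIter(ReadOnly):
    """Adds __iter__ -- passes the check."""
    def __iter__(self):
        return iter([])

print(is_file_like(ReadOnly()))     # → False
print(is_file_like(ReadAndIter()))  # → True
```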
But that's not the only problem. Pandas also requires that the object actually behave file-like: the read method must accept an argument for the number of bytes to read and return a string. That means you can't make read a generator, because it has to be callable with that byte-count argument.
So for example:
class DataFile(object):
    def __init__(self):
        self.data = "a b\n1 2\n2 3\n"
        self.pos = 0

    def read(self, n):
        # return the next n characters and advance the position
        nxt = self.pos + n
        ret = self.data[self.pos:nxt]
        self.pos = nxt
        return ret

    def __iter__(self):
        yield from self.data.split('\n')
will be recognized as valid input.
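For instance, assuming pandas is installed, the class can be fed to read_csv directly. The class is repeated here so the snippet runs standalone:

```python
import pandas as pd

class DataFile(object):
    def __init__(self):
        self.data = "a b\n1 2\n2 3\n"
        self.pos = 0

    def read(self, n):
        # return the next n characters and advance the position
        nxt = self.pos + n
        ret = self.data[self.pos:nxt]
        self.pos = nxt
        return ret

    def __iter__(self):
        yield from self.data.split('\n')

df = pd.read_csv(DataFile(), sep=' ')
print(df)  # DataFrame with columns a, b and rows (1, 2), (2, 3)
```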
Handling multiple files is harder, though. I hoped that fileinput would provide an appropriate file-like wrapper, but it doesn't seem to:
import fileinput
import pandas as pd

pd.read_csv(fileinput.input([...]))
# ValueError: Invalid file path or buffer object type:
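This fails because fileinput.FileInput offers readline and iteration but no byte-count read method, so it doesn't pass the is_file_like check. One workaround (not part of the original answer) is to skip fileinput entirely and concatenate per-file frames; io.StringIO objects stand in for real file paths here:

```python
import io
import pandas as pd

# Two in-memory "files"; with real files these would be paths
# passed to pd.read_csv instead.
files = [io.StringIO("a b\n1 2\n"), io.StringIO("a b\n2 3\n")]

df = pd.concat((pd.read_csv(f, sep=' ') for f in files),
               ignore_index=True)
print(df)  # four columns' worth of rows merged into one frame
```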