Create a pandas DataFrame from generator?

前端未结


I've created a tuple generator that extracts information from a file, keeping only the records of interest and converting each one to a tuple that the generator yields.

I've


5 answers

暖寄归人

2020-12-08 07:22

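If the generator simply yields DataFrames, much like a list of DataFrames would, you only need to concatenate its elements into a new DataFrame (here frames is the generator or list; avoid naming it list, which shadows the built-in):

result = pd.concat(frames)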

Recently I've faced the same problem.
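
A minimal sketch of that approach, assuming a made-up generator called frames() that yields small DataFrames sharing the same columns. pd.concat accepts any iterable of DataFrames, so the generator can be passed in directly:

```python
import pandas as pd

def frames():
    # Hypothetical generator: yields one small DataFrame per batch.
    for start in range(0, 6, 2):
        yield pd.DataFrame({"n": [start, start + 1]})

# ignore_index=True renumbers the rows instead of repeating each batch's 0, 1.
result = pd.concat(frames(), ignore_index=True)
```

Without ignore_index=True the result would keep each batch's own index, which is usually not what you want when the batches are just slices of one table.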

梦如初夏

2020-12-08 07:26
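You can also use something like this (originally tested in Python 2.7.5 with itertools.izip; in Python 3 the built-in zip is already lazy, so it is used here):
```
import pandas as pd

def dataframe_from_row_iterator(row_iterator, colnames):
    # Transpose the rows into columns. Note that the * unpacking
    # consumes the whole row iterator before zip starts.
    col_iterator = zip(*row_iterator)
    # Pair each column name with its tuple of values.
    return pd.DataFrame({cn: cv for cn, cv in zip(colnames, col_iterator)})
```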
You can also adapt this to append rows to a DataFrame.

-- Edit, Dec 4th: s/row/rows in last line
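
One possible way to adapt the idea to appending rows is to pull the rows in fixed-size batches and concatenate the batches at the end. This is only a sketch: the function name dataframe_in_batches, the batch_size parameter and the sample rows generator are all made up for illustration.

```python
from itertools import islice

import pandas as pd

def dataframe_in_batches(row_iterator, colnames, batch_size=1000):
    rows = iter(row_iterator)  # make sure islice resumes where it left off
    batches = []
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        batches.append(pd.DataFrame(batch, columns=colnames))
    if not batches:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame(columns=colnames)
    return pd.concat(batches, ignore_index=True)

# Hypothetical row generator used only to exercise the function.
rows = ((i, i * i) for i in range(5))
df = dataframe_in_batches(rows, ["n", "square"], batch_size=2)
```

Collecting the batches in a list and concatenating once is generally cheaper than appending to a DataFrame row by row, because each append copies the existing data.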
不要未来只要你来

2020-12-08 07:34
You cannot create a DataFrame from a generator with pandas 0.12. One option is to update to the development version (get it from GitHub and compile it, which is a little painful on Windows, but this is the option I would prefer).

Alternatively, since you said you are filtering the lines, you can filter them first, write them to a file, and then load that file with read_csv or a similar function.
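
A rough sketch of the filter, write, then load route. The records() generator, the column names and the filter condition are assumptions made up for the example; a temporary file stands in for whatever file you would actually write.

```python
import csv
import os
import tempfile

import pandas as pd

def records():
    # Hypothetical source of already-parsed tuples.
    yield ("foo", 1)
    yield ("bar", -2)
    yield ("baz", 3)

# Write only the records of interest to a temporary CSV file.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["name", "value"])
    writer.writerows(r for r in records() if r[1] > 0)
    path = tmp.name

# Load the filtered file, then clean up.
df = pd.read_csv(path)
os.remove(path)
```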

If you want to get super complicated, you can create a file-like object that returns the lines one at a time:
```
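def gen():
    # Stand-in for the real generator: yields one CSV line at a time.
    lines = [
        'col1,col2\n',
        'foo,bar\n',
        'foo,baz\n',
        'bar,baz\n'
    ]
    for line in lines:
        yield line

class Reader(object):
    """Minimal file-like wrapper that serves a generator of lines via read()."""
    def __init__(self, g):
        self.g = g
    def read(self, n=0):
        # Ignores n and returns one line per call; '' signals end of file.
        try:
            return next(self.g)
        except StopIteration:
            return ''
    def __iter__(self):
        # Assumption: newer pandas versions may expect file-like
        # objects to be iterable as well, so expose the generator.
        return self.g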
And then pass it to read_csv:
```
>>> pd.read_csv(Reader(gen()))
  col1 col2
0  foo  bar
1  foo  baz
2  bar  baz
```
小鲜肉

2020-12-08 07:36
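To make the parsing more memory efficient, read in chunks. Something like this, using Viktor's Reader class from the answer above. The chunks are consecutive blocks of rows, so they are concatenated along the default row axis (axis=0), not axis=1:
```
df = pd.concat(pd.read_csv(Reader(gen()), chunksize=10000))
```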
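
Concatenating every chunk still builds the full table in memory at the end. If the goal is to keep memory bounded, one option is to reduce each chunk as it arrives and keep only the running result. A small sketch, assuming io.StringIO as a stand-in for the Reader object and made-up sample data:

```python
import io

import pandas as pd

# Made-up CSV text; in practice this would be the file-like object.
csv_text = "col1,col2\n" + "".join(f"k{i % 3},{i}\n" for i in range(10))

total = 0
row_count = 0
# Each chunk is an ordinary DataFrame holding at most chunksize rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["col2"].sum()
    row_count += len(chunk)
```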

没有蜡笔的小新

2020-12-08 07:38

You certainly can construct a pandas.DataFrame from a generator of tuples, as of pandas 0.19 (and probably earlier). Don't use .from_records(); just use the constructor, for example:

import pandas as pd
someGenerator = ( (x, chr(x)) for x in range(48,127) )
someDf = pd.DataFrame(someGenerator)

Produces:

type(someDf) #pandas.core.frame.DataFrame

someDf.dtypes
#0     int64
#1    object
#dtype: object

someDf.tail(10)
#      0  1
#69  117  u
#70  118  v
#71  119  w
#72  120  x
#73  121  y
#74  122  z
#75  123  {
#76  124  |
#77  125  }
#78  126  ~
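
If you want named columns instead of 0 and 1, the constructor's columns argument works with a generator too. A short sketch reusing the same generator; the column names code and char are chosen only for illustration:

```python
import pandas as pd

pairs = ((x, chr(x)) for x in range(48, 127))
# columns= labels the tuple positions in order.
df = pd.DataFrame(pairs, columns=["code", "char"])
```

Keep in mind that a generator can only be consumed once, so build it again if you need a second DataFrame from the same data.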
