I have a generator producing a list of strings. Is there a utility/adapter in Python that could make it look like a file?
For example,
>>> d
The "correct" way to do this is to inherit from one of the standard Python io abstract base classes. However, it doesn't appear that Python allows you to provide a raw text class and wrap it with a buffered reader of any kind.
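The limitation can be seen directly: io.TextIOWrapper decodes bytes, so a raw-style stream whose read() hands back str is rejected as soon as the wrapper reads from it. This is a minimal illustration, not code from the answer; RawTextChunks and try_wrap are hypothetical names.

```python
import io


class RawTextChunks(io.RawIOBase):
    """Hypothetical 'raw text' stream: read() returns str chunks."""

    def __init__(self, chunks):
        self._it = iter(chunks)

    def readable(self):
        return True

    def read(self, size=-1):
        # Returns str instead of bytes, which is what TextIOWrapper rejects
        return next(self._it, '')


def try_wrap():
    wrapper = io.TextIOWrapper(RawTextChunks(['a\n', 'b\n']), encoding='utf-8')
    try:
        return wrapper.readline()
    except TypeError as exc:
        # TextIOWrapper expects the underlying read() to return bytes
        return 'TypeError: %s' % exc


print(try_wrap())
```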
The best class to inherit from is TextIOBase. Here's such an implementation, handling readline and read while being mindful of performance. (gist)
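```python
import io


class StringIteratorIO(io.TextIOBase):
    """Read-only text file object over an iterable of strings."""

    def __init__(self, iterable):
        # iter() lets this accept any iterable of str (e.g. a list), not only generators
        self._iter = iter(iterable)
        self._left = ''  # unread remainder of the current chunk

    def readable(self):
        return True

    def _read1(self, n=None):
        # Fetch chunks until a non-empty one arrives (or the iterator is
        # exhausted), then return at most n characters (all of it if n is None)
        while not self._left:
            try:
                self._left = next(self._iter)
            except StopIteration:
                break
        ret = self._left[:n]
        self._left = self._left[len(ret):]
        return ret

    def read(self, n=None):
        l = []
        if n is None or n < 0:
            # Read everything that is left
            while True:
                m = self._read1()
                if not m:
                    break
                l.append(m)
        else:
            # Read until n characters have been collected or EOF is reached
            while n > 0:
                m = self._read1(n)
                if not m:
                    break
                n -= len(m)
                l.append(m)
        return ''.join(l)

    def readline(self):
        l = []
        while True:
            i = self._left.find('\n')
            if i == -1:
                # No newline buffered: keep what we have and fetch the next chunk
                l.append(self._left)
                try:
                    self._left = next(self._iter)
                except StopIteration:
                    self._left = ''
                    break
            else:
                # Return up to and including the newline, keep the rest
                l.append(self._left[:i+1])
                self._left = self._left[i+1:]
                break
        return ''.join(l)
```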
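A short usage sketch, with chunk boundaries that deliberately don't line up with newlines. The class is repeated (comments trimmed, iter() applied to the input) so the block runs on its own; the chunks() generator is hypothetical. Because the class derives from TextIOBase, iterating over it (for line in f) comes for free from IOBase, which calls readline().

```python
import io


class StringIteratorIO(io.TextIOBase):
    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._left = ''

    def readable(self):
        return True

    def _read1(self, n=None):
        while not self._left:
            try:
                self._left = next(self._iter)
            except StopIteration:
                break
        ret = self._left[:n]
        self._left = self._left[len(ret):]
        return ret

    def read(self, n=None):
        l = []
        if n is None or n < 0:
            while True:
                m = self._read1()
                if not m:
                    break
                l.append(m)
        else:
            while n > 0:
                m = self._read1(n)
                if not m:
                    break
                n -= len(m)
                l.append(m)
        return ''.join(l)

    def readline(self):
        l = []
        while True:
            i = self._left.find('\n')
            if i == -1:
                l.append(self._left)
                try:
                    self._left = next(self._iter)
                except StopIteration:
                    self._left = ''
                    break
            else:
                l.append(self._left[:i+1])
                self._left = self._left[i+1:]
                break
        return ''.join(l)


def chunks():
    # Hypothetical generator: chunk boundaries fall in the middle of lines
    yield 'first li'
    yield 'ne\nsecond'
    yield ' line\nthi'
    yield 'rd line\n'


f = StringIteratorIO(chunks())
print(repr(f.readline()))  # 'first line\n', reassembled across two chunks
print(repr(f.read(6)))     # 'second'
print(list(f))             # [' line\n', 'third line\n'], via IOBase iteration
```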
Looking at Matt's answer, I can see that it's not always necessary to implement all the read methods. Implementing read1 alone may be sufficient; the io.BufferedIOBase documentation describes it as:
Read and return up to size bytes, with at most one call to the underlying raw stream’s read()...
The object can then be wrapped with io.TextIOWrapper, which, for instance, provides an implementation of readline. As an example, here's how to stream a CSV file from S3 (Amazon Simple Storage Service) through a boto.s3.key.Key, which implements an iterator for reading.
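```python
import io
import csv
from boto import s3


class StringIteratorIO(io.BufferedIOBase):
    # io.TextIOWrapper decodes bytes, so the wrapped object must be a binary
    # stream: subclass BufferedIOBase and return bytes from read1().
    # Iterating a boto Key yields the object's content in bytes chunks.

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._buffer = b''

    def readable(self):
        return True

    def read1(self, n=-1):
        # Fetch the next non-empty chunk only once the buffer is drained
        while not self._buffer:
            try:
                self._buffer = next(self._iterator)
            except StopIteration:
                break
        if n is None or n < 0:
            n = len(self._buffer)
        result = self._buffer[:n]
        self._buffer = self._buffer[len(result):]
        return result


conn = s3.connect_to_region('some_aws_region')
bucket = conn.get_bucket('some_bucket')
key = bucket.get_key('some.csv')
# The csv module documentation asks for newline='' on the file object
fp = io.TextIOWrapper(StringIteratorIO(key), newline='')
reader = csv.DictReader(fp, delimiter=';')
for row in reader:
    print(row)
```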
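The S3 example needs credentials and the legacy boto package, so here is an offline sketch of the same read1-only idea. It subclasses io.BufferedIOBase and returns bytes, since io.TextIOWrapper decodes bytes; fake_key() is a hypothetical generator standing in for the Key object, and the UTF-8 encoding is an assumption about the file.

```python
import csv
import io


class StringIteratorIO(io.BufferedIOBase):
    """Binary stream over an iterable of bytes chunks; only read1() is implemented."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._buffer = b''

    def readable(self):
        return True

    def read1(self, n=-1):
        # Fetch the next non-empty chunk only once the buffer is drained
        while not self._buffer:
            try:
                self._buffer = next(self._iterator)
            except StopIteration:
                break
        if n is None or n < 0:
            n = len(self._buffer)
        result = self._buffer[:n]
        self._buffer = self._buffer[len(result):]
        return result


def fake_key():
    # Hypothetical stand-in for a boto Key: CSV bytes split at arbitrary points
    yield b'name;qty\nap'
    yield b'ple;3\nbanana;'
    yield b'5\n'


fp = io.TextIOWrapper(StringIteratorIO(fake_key()), encoding='utf-8', newline='')
rows = list(csv.DictReader(fp, delimiter=';'))
print(rows)
```

Line-oriented reading (readline, iteration, csv) goes through read1. A bare fp.read() with no size may instead call the wrapped object's read(), which this sketch leaves unimplemented.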
Here's an answer to a related question which looks a little better. It inherits from io.RawIOBase and overrides readinto. In Python 3 that's sufficient, so instead of wrapping IterStream in io.BufferedReader, one can wrap it in io.TextIOWrapper. In Python 2, read1 is needed, but it can simply be expressed through readinto.
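The related answer's code isn't reproduced here, so this is a minimal sketch of the same approach: IterStream is the name mentioned above, but its body is my own assumption rather than a copy of that answer, and chunks() is a hypothetical producer. Only readinto is overridden; RawIOBase derives read() from it, and io.TextIOWrapper calls that read().

```python
import io


class IterStream(io.RawIOBase):
    """Raw binary stream over an iterable of bytes chunks (sketch)."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._leftover = b''

    def readable(self):
        return True

    def readinto(self, b):
        # Copy at most len(b) bytes into the caller's buffer b.
        # Returning 0 tells the wrapping reader that EOF was reached.
        while not self._leftover:
            try:
                self._leftover = next(self._iterator)
            except StopIteration:
                return 0
        n = min(len(b), len(self._leftover))
        b[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def chunks():
    # Hypothetical producer: UTF-8 text split mid-character and mid-line
    yield b'h\xc3'
    yield b'\xa9llo\nwor'
    yield b'ld\n'


text = io.TextIOWrapper(IterStream(chunks()), encoding='utf-8')
lines = text.readlines()
print(lines)
```

The incremental decoder inside TextIOWrapper copes with the multi-byte character split across chunks. Wrapping IterStream in io.BufferedReader instead yields a binary file object that also offers read1 and peek.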