Is there a well-hidden way to read tokens from a file or file-like object without reading entire lines? The application I immediately have (someone else's problem
Here is a generator that processes a file one character at a time and yields tokens when whitespace is encountered.
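def generate_tokens(path):
    """Yield whitespace-separated tokens from the file at path, one character at a time."""
    with open(path, 'r') as fp:
        buf = []
        while True:
            ch = fp.read(1)
            if ch == '':
                # End of file: stop reading.
                break
            elif ch.isspace():
                # Whitespace ends the current token, if there is one.
                if buf:
                    yield ''.join(buf)
                    buf = []
            else:
                buf.append(ch)
        # Emit the last token when the file does not end with whitespace.
        if buf:
            yield ''.join(buf)

if __name__ == '__main__':
    for token in generate_tokens('input.txt'):
        print(token)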
To be more generic, it looks like you might be able to use the re module as described in the question "Python equivalent of Ruby's StringScanner?". The re module works on strings, so feed it input piece by piece from a generator over your file to avoid reading the whole file at once.
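A rough sketch of that idea, under some assumptions: tokens are plain runs of non-whitespace, the input is a text-mode file-like object, and the names iter_tokens, TOKEN_RE and chunk_size are made up here for illustration (they are not from the linked question). It reads fixed-size chunks, runs a compiled pattern over each one, and carries a token that touches the end of a chunk over to the next read, since it may continue there.

```python
import io
import re

# Hypothetical token pattern: any run of non-whitespace characters.
TOKEN_RE = re.compile(r'\S+')

def iter_tokens(fp, chunk_size=4096):
    """Yield whitespace-separated tokens from a text file-like object,
    reading chunk_size characters at a time instead of whole lines."""
    tail = ''  # partial token carried over from the previous chunk
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        data = tail + chunk
        tail = ''
        for match in TOKEN_RE.finditer(data):
            if match.end() == len(data):
                # The token touches the end of the buffer, so it may
                # continue in the next chunk; hold it back for now.
                tail = match.group()
                break
            yield match.group()
    # Whatever is left at end of input is a complete token.
    if tail:
        yield tail

if __name__ == '__main__':
    sample = io.StringIO("hello  world\nfoo")
    print(list(iter_tokens(sample, chunk_size=4)))
    # prints ['hello', 'world', 'foo']
```

Memory use is bounded by chunk_size plus the longest single token, so a file with no line breaks at all is not a problem. A richer pattern (numbers, quoted strings and so on) would need more care at chunk boundaries than this sketch takes.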