I am using itertools.groupby to parse a short tab-delimited text file. The text file has several columns and all I want to do is group all the entries that have a pa
groupby only groups consecutive rows that share a key, so you're going to want to change your code to force the data to be in key order first:
import csv
import itertools
import operator

data = csv.DictReader(open(f), delimiter="\t", fieldnames=fieldnames)
sorted_data = sorted(data, key=operator.itemgetter(col_name))
for name, entries in itertools.groupby(sorted_data, key=operator.itemgetter(col_name)):
    pass  # whatever
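To see why the sort matters, here is a minimal sketch. The rows and the "name"/"value" columns are made up for illustration and just stand in for whatever csv.DictReader yields from your file:

```python
import itertools
import operator

# Hypothetical rows standing in for what csv.DictReader would yield
rows = [
    {"name": "a", "value": "1"},
    {"name": "b", "value": "2"},
    {"name": "a", "value": "3"},
]
get_name = operator.itemgetter("name")

# Unsorted input: "a" comes back as two separate groups
unsorted_keys = [key for key, _ in itertools.groupby(rows, key=get_name)]

# Sorted input: one group per distinct key
sorted_rows = sorted(rows, key=get_name)
sorted_keys = [key for key, _ in itertools.groupby(sorted_rows, key=get_name)]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```

Without the sort, the second "a" row starts a new group instead of joining the first one.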
The main use of groupby, though, is when the datasets are large and the data is already in key order. When you would have to sort anyway, using a defaultdict is more efficient, since it groups the rows in a single pass with no sort:
from collections import defaultdict
name_entries = defaultdict(list)
for row in data:
    name_entries[row[col_name]].append(row)
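Here is the same idea as a self-contained sketch you can run as-is. The sample text, the fieldnames and the io.StringIO stand-in for the open file are all assumptions for illustration; swap in your own file and columns:

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited input; io.StringIO stands in for the real file
text = "alice\t1\nbob\t2\nalice\t3\n"
fieldnames = ["name", "value"]  # assumed column names
col_name = "name"

data = csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=fieldnames)

# One pass over the rows, no sorting needed
name_entries = defaultdict(list)
for row in data:
    name_entries[row[col_name]].append(row)

for name, entries in name_entries.items():
    print(name, [entry["value"] for entry in entries])
```

Each key ends up with the list of all its rows, in the order they appeared in the file, no matter how the rows were interleaved.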