I have a CSV file that looks like the following:
1994, Category1, Something Happened 1
1994, Category2, Something Happened 2
1995, Category1, Something Happen
A very concise way to do this is to use pandas. The benefits are that it has a faster CSV parser, and that it works column-wise, so a single df.apply(set) gets you the set of distinct values in each column:
In [244]:
import pandas as pd
#Suppose the CSV is named temp.csv
df = pd.read_csv('temp.csv', header=None)
df.apply(set)
Out[244]:
0 set([1994, 1995, 1996, 1998])
1 set([ Category2, Category3, Category1])
2 set([ Something Happened 4, Something Happene...
dtype: object
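A self-contained sketch of the same idea, under a few assumptions: the CSV text is inlined with io.StringIO instead of reading temp.csv, and the third sample row is hypothetical (the first two are from the question). Two options not in the original are added: skipinitialspace=True removes the blank after each comma (the leading spaces visible in the output above), and result_type="reduce" asks apply for a Series with one set per column explicitly rather than relying on pandas' inference of the result shape.

```python
import io

import pandas as pd

# Rows in the question's "year, category, description" layout.
# The first two lines come from the question; the third is hypothetical.
csv_text = """1994, Category1, Something Happened 1
1994, Category2, Something Happened 2
1995, Category3, Example event
"""

# header=None: no header row, so the columns are labelled 0, 1, 2.
# skipinitialspace=True: drop the space after each comma, so values are
# "Category1" rather than " Category1".
df = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True)

# Apply set() to every column; result_type="reduce" returns a Series
# holding one set per column instead of trying to expand the sets.
unique_sets = df.apply(set, result_type="reduce")

print(unique_sets)
```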
The downside is that it returns a pandas.Series (one set per column), so to get each column's values as a list you need something like list(df.apply(set)[0]).
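One way to turn the whole Series of sets into plain lists at once is a dict comprehension keyed by column label. This is a sketch on a small hypothetical frame with the question's three-column layout; sorted() is used only to give each list a repeatable order, since sets are unordered.

```python
import pandas as pd

# Hypothetical frame with the same three-column layout as the question's CSV.
df = pd.DataFrame(
    [
        [1994, "Category1", "Something Happened 1"],
        [1994, "Category2", "Something Happened 2"],
        [1995, "Category1", "Example event"],
    ]
)

# One set of distinct values per column, as a Series of sets.
unique_sets = df.apply(set, result_type="reduce")

# Convert each set to a sorted list; values within one column share a type,
# so they can be compared and sorted.
unique_lists = {col: sorted(values) for col, values in unique_sets.items()}

print(unique_lists)
```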
If the order of first appearance has to be preserved, that is also easy, for example:
# item is one column (a Series); unique() keeps the order of first appearance
for i, item in df.items():
    print(item.unique())
item.unique() returns a numpy.ndarray rather than a list; call .tolist() on it if you need a list.
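To collect the order-preserving values as plain lists instead of printing them, a minimal sketch (again on a small hypothetical frame in the question's layout) applies Series.unique() to each column and converts the array with .tolist(). It uses DataFrame.items(), since iteritems() was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical rows in the question's "year, category, description" layout,
# deliberately not sorted so the preserved order is visible.
df = pd.DataFrame(
    [
        [1995, "Category2", "Something Happened 2"],
        [1994, "Category1", "Something Happened 1"],
        [1995, "Category1", "Something Happened 2"],
    ]
)

# Series.unique() keeps values in order of first appearance;
# .tolist() turns the resulting numpy array into a Python list.
ordered_unique = {col: series.unique().tolist() for col, series in df.items()}

for col, values in ordered_unique.items():
    print(col, values)
```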