I have the following Lisp file, which is from the UCI machine learning database. I would like to convert it into a flat text file using Python. A typical line looks like th
Separate it into pairs with a regular expression:
In [1]: import re
In [2]: txt = '(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'
In [3]: [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
Out[3]: [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0'], ['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]
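If you would rather have the regex split name and value for you, capture groups do that in one step. This is only a sketch: the optional minus sign is my assumption (the sample line has no negative values), and `pairs` is just a name I picked for illustration.

```python
import re

txt = ('(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
       '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')

# Each "(name number)" group becomes a (name, number) tuple.
# "-?" allows a leading minus sign; that is an assumption about the data.
pairs = [(name, int(value))
         for name, value in re.findall(r'\((\w+)\s+(-?\d+)\)', txt)]

print(pairs[:3])  # [('st', 8), ('pitch', 67), ('dur', 4)]
```

Converting to `int` here is optional; the rest of this answer keeps the values as strings.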
Then make it into a dictionary:
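data = [p.split() for p in re.findall(r'\w+\s+\d+', txt)]  # the pairs shown in Out[3]
dct = {}
for key, value in data:
    # start a new list the first time a key is seen, then append in order
    dct.setdefault(key, []).append(value)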
The result:
In [10]: dct
Out[10]: {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']}
Printing:
print('time pitch duration keysig timesig fermata')
for t in range(len(dct['st'])):
    # one output row per note, columns in the same order as the header
    print(dct['st'][t], dct['pitch'][t], dct['dur'][t],
          dct['keysig'][t], dct['timesig'][t], dct['fermata'][t])
Proper formatting is left as an exercise for the reader...
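For what it's worth, here is one possible take on that exercise, using fixed-width columns. It is a sketch under assumptions: the column width of 8 is arbitrary, the header uses the dictionary keys rather than the longer names above, and `columns`, `row_format` and `lines` are names made up for this example.

```python
# Sample data in the same shape as dct above (values kept as strings).
dct = {'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'],
       'keysig': ['1', '1'], 'timesig': ['12', '12'], 'fermata': ['0', '0']}

columns = ['st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata']
row_format = '{:>8}' * len(columns)  # right-align every field in 8 characters

# Header line first, then one line per note.
lines = [row_format.format(*columns)]
for row in zip(*(dct[c] for c in columns)):
    lines.append(row_format.format(*row))

print('\n'.join(lines))
```

To get the flat text file the question asks for, the same `lines` list could be written with `open(...)` and `write` instead of `print`.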