Question
I am using this method, kindly suggested by Ashwini Chaudhary, to assign data to a dictionary from a text file that is in a specific format.
keys = map(str.strip, next(f).split('Key\t')[1].split('\t'))
words = map(str.strip, next(f).split('Word\t')[1].split('\t'))
The text file has a row title followed by values, separated by a \t character.
Example 1:
Key a 1 b 2 c 3 d 4
Word as box cow dig
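For illustration, here is a minimal Python 3 sketch of how the two lines above turn Example 1 into a dictionary. The `sample` string and the use of `io.StringIO` in place of a real file are assumptions made so the snippet is self-contained; list comprehensions replace `map` because `map` is lazy in Python 3.

```python
import io

# Example 1 with explicit tab separators (assumed layout of the real file).
sample = "Key\ta 1\tb 2\tc 3\td 4\nWord\tas\tbox\tcow\tdig\n"

f = io.StringIO(sample)  # stands in for an open file object

# Drop the row title, split the remainder on tabs, strip stray whitespace.
keys = [s.strip() for s in next(f).split('Key\t')[1].split('\t')]
words = [s.strip() for s in next(f).split('Word\t')[1].split('\t')]

result = dict(zip(keys, words))
print(result)
```

Because each `next(f)` consumes exactly one line, this only works when the Key and Word rows are the first two lines of the file.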
How would I change my code so that it does not read every line in the file, but only specific ones? Extra lines that I do not want to read should simply be ignored:
Example 2 - ignore the LineHere and OrHere rows:
LineHere w x y z
Key a 1 b 2 c 3 d 4
OrHere 00 01 10 11
Word as box cow dig
Alternatively, I would like to be able to read a line titled either 'Word' or 'Letter' (exactly one of the two), whichever happens to be in the file. The code that scans Examples 1 and 2 should then also be valid for:
Example 3 - I want to read the Key and Letter lines:
LineHere w x y z
Key a 1 b 2 c 3 d 4
OrHere 00 01 10 11
Letter A B C D
Please feel free to comment with questions or criticisms, and I'll be happy to rephrase or clarify the question.
As a reference, the precursor question is linked here
Many thanks,
Alex
Answer 1:
Something like this:
import re

with open('abc') as f:
    for line in f:
        if line.startswith('Key'):
            # everything after the row title, split on tabs
            keys = re.search(r'Key\s+(.*)', line).group(1).split("\t")
        elif line.startswith(('Word', 'Letter')):
            # accept whichever of the two titles is present
            vals = re.search(r'(Word|Letter)\s+(.*)', line).group(2).split("\t")

print(dict(zip(keys, vals)))
abc:
LineHere w x y z
Key a 1 b 2 c 3 d 4
OrHere 00 01 10 11
Word as box cow dig
The output is:
{'d 4': 'dig', 'b 2': 'box', 'a 1': 'as', 'c 3': 'cow'}
abc:
LineHere w x y z
Key a 1 b 2 c 3 d 4
OrHere 00 01 10 11
Letter A B C D
The output is:
{'d 4': 'D', 'b 2': 'B', 'a 1': 'A', 'c 3': 'C'}
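A possible variation, sketched for Python 3: split each line on its first tab instead of using a regular expression, and compare the row title exactly. The function name `read_rows`, its parameters, and the sample lists are hypothetical, and the sketch assumes that a tab follows the row title, as described in the question.

```python
def read_rows(lines, key_titles=('Key',), value_titles=('Word', 'Letter')):
    """Pair the Key row with the Word/Letter row; ignore every other row."""
    keys = vals = None
    for line in lines:
        # Split off the row title at the first tab; the rest holds the values.
        title, _, rest = line.rstrip('\r\n').partition('\t')
        if title in key_titles:
            keys = rest.split('\t')
        elif title in value_titles:
            vals = rest.split('\t')
    if keys is None or vals is None:
        raise ValueError('no Key row or no Word/Letter row found')
    return dict(zip(keys, vals))


# Examples 2 and 3 from the question, with explicit tabs (assumed layout).
example_2 = [
    "LineHere\tw\tx\ty\tz\n",
    "Key\ta 1\tb 2\tc 3\td 4\n",
    "OrHere\t00\t01\t10\t11\n",
    "Word\tas\tbox\tcow\tdig\n",
]
example_3 = example_2[:3] + ["Letter\tA\tB\tC\tD\n"]

result_2 = read_rows(example_2)
result_3 = read_rows(example_3)
print(result_2)
print(result_3)
```

With a real file, passing the open file object should work the same way, for example `with open('abc') as f: read_rows(f)`. Exact title matching means that a row titled, say, `Keyboard` would be skipped, whereas `startswith('Key')` would accept it.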
Answer 2:
import re

# sample data: a space after each row title, tabs between the values
ss = '''LineHere w\tx\ty\tz
Key a 1\tb 2\tc 3\td 4
OrHere 00\t01\t10\t11
Word as\tbox\tcow\tdig
'''

rgx = re.compile('Key +(.*)\r?\n'
                 '(?:.*\r?\n)?'
                 '(?:Word|Letter) +(.*)\r?\n')

mat = rgx.search(ss)
keys = mat.group(1).split('\t')
words = mat.group(2).split('\t')
You'll obtain ss by reading your file:
with open(filename) as f:
    ss = f.read()
Edit
Well, if all the lines have data separated with tabs, you can do:
import re

ss = '''LineHere w\tx\ty\tz
Key a 1\tb 2\tc 3\td 4
OrHere 00\t01\t10\t11
Word as\tbox\tcow\tdig
'''

rgx = re.compile('Key +(.*)\r?\n'
                 '(?:.*\r?\n)?'
                 '(?:Word|Letter) +(.*)\r?\n')

# split both captured rows on tabs, then pair them up column by column
print(dict(zip(*map(lambda x: x.split('\t'),
                    rgx.search(ss).groups()))))
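Note that the `(?:.*\r?\n)?` part of the pattern allows at most one line between the Key row and the Word/Letter row. If more lines may sit in between, one possible alternative (a sketch, not part of the answer above) is to match each wanted row on its own with `re.MULTILINE`. The sample string with its extra `Another` row and the names `row`, `rows`, and `pairs` are made up for illustration; as above, a space is assumed after the row title and tabs between the values.

```python
import re

# Hypothetical sample with two unwanted lines between the Key and Letter rows.
ss = ('LineHere w\tx\ty\tz\n'
      'Key a 1\tb 2\tc 3\td 4\n'
      'OrHere 00\t01\t10\t11\n'
      'Another 1\t2\t3\t4\n'
      'Letter A\tB\tC\tD\n')

# With re.MULTILINE, ^ and $ match at the start and end of every line.
row = re.compile(r'^(Key|Word|Letter) +(.*)$', re.MULTILINE)

# Map each matched row title to its tab-separated values.
rows = {title: rest.split('\t') for title, rest in row.findall(ss)}

keys = rows.get('Key')
values = rows.get('Word') or rows.get('Letter')
if keys is None or values is None:
    raise ValueError('no Key row or no Word/Letter row found')

pairs = dict(zip(keys, values))
print(pairs)
```

Rows with any other title never match the pattern, so they are ignored regardless of how many there are or where they appear.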
Source: https://stackoverflow.com/questions/17489228/reading-data-from-specially-formatted-text-file