Extract specific lines from file and create sections of data in python

再見小時候 2021-01-21 15:31

Trying to write a python script to extract lines from a file. The file is a text file which is a dump of python suds output.

I want to:

  1. strip all charac
3 Answers
  •  离开以前
    2021-01-21 15:44

    Several suggestions on your code:

    Stripping all non-alphanumeric characters is unnecessary and wastes time, and there is no need to build linelist at all. You can simply use the plain string method line.find("ArrayOf_xsd_string") or re.search(...) to locate the section.

    (For reference, the requirements from your question:)

    1. strip all characters except words and numbers. I don't want any "\n", "[", "]", "{", "=", etc characters.
    2. find a section where it starts with "ArrayOf_xsd_string"
    3. remove the next line "item[] =" from the result
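The marker check mentioned above can be sketched as follows. This is a minimal illustration, not the asker's code: the helper names and the sample line are hypothetical, since the original dump is not shown in the thread.

```python
import re

MARKER = "ArrayOf_xsd_string"

def is_section_start(line):
    """Plain substring test; equivalent to line.find(MARKER) >= 0."""
    return MARKER in line

def is_section_start_re(line):
    """Same test with a regular expression, useful if the marker varies."""
    return re.search(r"ArrayOf_xsd_string", line) is not None

# Hypothetical line from a suds dump (assumed format).
sample = "   (ArrayOf_xsd_string){\n"
print(is_section_start(sample), is_section_start_re(sample))
```

Neither check needs the line to be cleaned first, which is why the character stripping can wait until the lines of interest have been found.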

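    As for your regex: \W does not match _ (the underscore counts as a word character), so [\W_]+ is a reasonable pattern if you also want underscores removed. The real problem is that assigning the compiled pattern to line overwrites the line you just read, and the sub() call then runs on string.printable instead of the line, with its result thrown away: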

    for line in f:
      line = re.compile('[\W_]+') # overwrites the line you just read??
      line.sub('', string.printable)
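If the regex approach were kept, a corrected form might look like the sketch below. The names NON_ALNUM and strip_non_alnum are made up for illustration, and the sample line is an assumed example of the dump's format.

```python
import re

# Compile once, outside the loop, and keep it in its own variable
# so it does not clobber the line being processed.
NON_ALNUM = re.compile(r"[\W_]+")

def strip_non_alnum(line):
    """Remove every run of non-alphanumeric characters from line."""
    return NON_ALNUM.sub("", line)

# Hypothetical line from the dump (assumed format).
print(strip_non_alnum('   "wordy type stuff",\n'))  # → wordytypestuff
```

Note that this also removes the spaces inside a value, which is one reason to prefer str.strip() with an explicit character set: it only trims the ends of the line.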
    

    Here's my version, which reads the file directly, and also handles multiple matches:

    with open('data.txt', 'r') as f:
        theDict = {}
        found = -1
        for (lineno,line) in enumerate(f):
            if found < 0:
                if line.find('ArrayOf_xsd_string')>=0:
                    found = lineno
                    entries = []
                continue
            # Grab following 6 lines...
            if 2 <= (lineno-found) <= 6+1:
                entry = line.strip(' \t\r\n"{}[]=:,') # include whitespace/newline, or the trailing "\n" blocks stripping on the right
                entries.append(entry)
            #then create a dict with the key from line 5
            if (lineno-found) == 6+1:
                key = entries.pop(4)
                theDict[key] = entries
                print(key, ','.join(entries)) # comma-separated, no quotes
                #break # if you want to end on first match
                found = -1 # to process multiple matches
    

    And the output is exactly what you wanted (that's what ','.join(entries) was for):

    123456 001,ABCD,1234,wordy type stuff,more stuff, etc
    234567 002,ABCD,1234,wordy type stuff,more stuff, etc
    345678 003,ABCD,1234,wordy type stuff,more stuff, etc
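To try the same logic without a file, it can be wrapped in a function that accepts any iterable of lines. This is only a sketch: the function name, its parameters, and the sample input are assumptions, and the real suds dump may be laid out differently.

```python
def extract_sections(lines, marker="ArrayOf_xsd_string", n_entries=6, key_index=4):
    """Return {key: entries} for each marker section found in lines.

    The line right after the marker ("item[] =") is skipped, the next
    n_entries lines are collected, and the entry at key_index is
    removed from the list and used as the dict key.
    """
    result = {}
    found = -1
    entries = []
    for lineno, line in enumerate(lines):
        if found < 0:
            if marker in line:
                found = lineno
                entries = []
            continue
        offset = lineno - found
        if 2 <= offset <= n_entries + 1:
            entries.append(line.strip(' \t\r\n"{}[]=:,'))
        if offset == n_entries + 1:
            key = entries.pop(key_index)
            result[key] = entries
            found = -1  # reset so later sections are processed too
    return result

# Hypothetical input, made up to match the output shown above.
sample = '''\
   (ArrayOf_xsd_string){
      item[] =
         "001",
         "ABCD",
         "1234",
         "wordy type stuff",
         "123456",
         "more stuff, etc",
   }
'''.splitlines(keepends=True)

for key, entries in extract_sections(sample).items():
    print(key, ",".join(entries))
# → 123456 001,ABCD,1234,wordy type stuff,more stuff, etc
```

Passing an open file object instead of sample works the same way, because a file iterates line by line.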
    
