Parsing data from text file

梦毁少年i 2020-12-31 18:30

I have a text file that has content like this:

******** ENTRY 01 ********
ID:                  01
Data1:               0.1834869385E-002
Data2:                       


        
3 Answers
  • 2020-12-31 19:02

    It is very far from CSV, actually.

    You can use the file as an iterator; the following generator function yields complete sections:

    def load_sections(filename):
        with open(filename, 'r') as infile:
            line = ''
            while True:
                # skip (and discard) lines until the next '****' header
                while not line.startswith('****'):
                    line = next(infile, None)
                    if line is None:
                        return  # end of file: finish the generator (safe under PEP 479)
    
                entry = {}
                for line in infile:
                    line = line.strip()
                    if not line: break  # a blank line closes the section
    
                    key, value = map(str.strip, line.split(':', 1))
                    entry[key] = value
    
                yield entry
                line = ''  # force a search for the next header
    

    This treats the file as an iterator, so every loop picks up at the line where the previous one stopped. The outer loop only moves from section to section; the inner while and for loops do the real work. The while loop skips (and discards) lines until it finds a **** header line, then the for loop reads the key: value lines into the entry until it reaches a blank line or the end of the file.
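
    To see the shared-iterator behaviour in isolation, here is a minimal sketch. It assumes an in-memory io.StringIO object as a stand-in for the real file; the sample text and the names sample, fields and remaining are made up for illustration:

```python
import io

# Hypothetical in-memory stand-in for the text file (not the asker's real data).
sample = io.StringIO(
    "skip me\n"
    "**** header ****\n"
    "ID: 01\n"
    "Data1: 2.5\n"
    "\n"
    "next section\n"
)

# Consume lines with next() until the header line is reached.
line = next(sample)
while not line.startswith('****'):
    line = next(sample)

# The for loop resumes right after the header instead of starting over.
fields = []
for line in sample:
    line = line.strip()
    if not line:
        break  # blank line: end of this section
    fields.append(line)

# Whatever neither loop has consumed is still waiting in the iterator.
remaining = sample.read()
print(fields)
print(repr(remaining))
```

    Both loops pull from the same object, which is why the for loop never sees the header or the skipped line again.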

    Use the function in a loop:

    for section in load_sections(filename):
        print(section)
    

    Repeating your sample data in a text file results in:

    >>> for section in load_sections('/tmp/test.txt'):
    ...     print(section)
    ... 
    {'Data4': '715', 'Data1': '0.1834869385E-002', 'ID': '01', 'Data3': '-0.1091356549E+001', 'Data2': '10.9598489301'}
    {'Data4': '715', 'Data1': '0.1834869385E-002', 'ID': '01', 'Data3': '-0.1091356549E+001', 'Data2': '10.9598489301'}
    {'Data4': '715', 'Data1': '0.1834869385E-002', 'ID': '01', 'Data3': '-0.1091356549E+001', 'Data2': '10.9598489301'}
    

    You can add some data converters to that if you want to; a mapping of key to callable would do:

    converters = {'ID': int, 'Data1': float, 'Data2': float, 'Data3': float, 'Data4': int}
    

    Then, in the generator function, replace entry[key] = value with entry[key] = converters.get(key, lambda v: v)(value), so that keys without a converter keep their string value.
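
    Put together, the converter lookup could look roughly like the sketch below. It is written for Python 3 and, as an assumption for the sake of a self-contained example, parses an iterator of lines (here an io.StringIO) rather than opening a file by name; parse_sections is a hypothetical helper, not the function from the answer:

```python
import io

# Mapping of key to callable, as suggested above; unknown keys stay strings.
converters = {'ID': int, 'Data1': float, 'Data2': float, 'Data3': float, 'Data4': int}

def parse_sections(lines):
    """Yield one dict per '****' section from an iterator of lines (hypothetical helper)."""
    lines = iter(lines)
    for line in lines:
        if not line.startswith('****'):
            continue  # discard anything before the next header
        entry = {}
        for line in lines:
            line = line.strip()
            if not line:
                break  # a blank line closes the section
            key, value = map(str.strip, line.split(':', 1))
            # Look up the converter for this key; fall back to the identity function.
            entry[key] = converters.get(key, lambda v: v)(value)
        yield entry

# One entry in the layout shown in the question, held in memory for the example.
sample = io.StringIO(
    "******** ENTRY 01 ********\n"
    "ID:                  01\n"
    "Data1:               0.1834869385E-002\n"
    "Data2:              10.9598489301\n"
    "Data3:              -0.1091356549E+001\n"
    "Data4:                715\n"
    "\n"
)
sections = list(parse_sections(sample))
print(sections)
```

    With this in place, ID and Data4 come back as int and the remaining fields as float; a value that does not fit its converter raises ValueError, which may or may not be what you want for malformed files.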

  • 2020-12-31 19:14

    my_file:

    ******** ENTRY 01 ********
    ID:                  01
    Data1:               0.1834869385E-002
    Data2:              10.9598489301
    Data3:              -0.1091356549E+001
    Data4:                715
    
    ID:                  02
    Data1:               0.18348674325E-012
    Data2:              10.9598489301
    Data3:              0.0
    Data4:                5748
    
    ID:                  03
    Data1:               20.1834869385E-002
    Data2:              10.954576354
    Data3:              10.13476858762435E+001
    Data4:                7456
    

    Python script:

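    import re
    
    with open('my_file', 'r') as f:
        data  = list()
        group = dict()
        # each match is a (key, numeric string) pair; '-' sits last in the
        # character class so it is a literal minus, not a range
        for key, value in re.findall(r'(.*):\s*([\dE+.-]+)', f.read()):
            if key in group:
                # a repeated key marks the start of the next record
                data.append(group)
                group = dict()
            group[key] = value
        data.append(group)
    
    print(data)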
    

    Printed output:

    [
        {
            'Data4': '715',
            'Data1': '0.1834869385E-002',
            'ID': '01',
            'Data3': '-0.1091356549E+001',
            'Data2': '10.9598489301'
        },
        {
            'Data4': '5748',
            'Data1': '0.18348674325E-012',
            'ID': '02',
            'Data3': '0.0',
            'Data2': '10.9598489301'
        },
        {
            'Data4': '7456',
            'Data1': '20.1834869385E-002',
            'ID': '03',
            'Data3': '10.13476858762435E+001',
            'Data2': '10.954576354'
        }
    ]
    
  • 2020-12-31 19:22

    A very simple approach could be:

    all_objects = []
    
    with open("datafile") as f:
        for line in f:
            if line.startswith("***"):
                # Line starts with asterisks, create a new object
                all_objects.append({})
            elif ":" in line:
                # Line is a key/value field, update current object
                k, v = map(str.strip, line.split(":", 1))
                all_objects[-1][k] = v
    