Parsing files (ics/icalendar) using Python

感情败类 2020-11-29 00:30

I have a .ics file in the following format. What is the best way to parse it? I need to retrieve the Summary, Description, and Time for each of the entries.

         


        
5 answers
  •  星月不相逢
    2020-11-29 00:54

    Four years later and understanding ICS format a bit better, if those were the only fields I needed, I'd just use the native string methods:

    import io
    
    # Probably not a valid .ics file, but we don't really care for the example
    # it works fine regardless
    file = io.StringIO('''
    BEGIN:VCALENDAR
    X-LOTUS-CHARSET:UTF-8
    VERSION:2.0
    DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n
    
    SUMMARY:smart energy management
    LOCATION:8778/92050462
    DTSTART;TZID="India":20100629T110000
    DTEND;TZID="India":20100629T120000
    TRANSP:OPAQUE
    DTSTAMP:20100713T071037Z
    CLASS:PUBLIC
    SUMMARY:meeting
    UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
    X-LOTUS-UPDATE-SEQ:1
    X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
    X-LOTUS-NOTESVERSION:2
    X-LOTUS-APPTTYPE:0
    X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
    END:VEVENT
    '''.strip())
    
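    parsing = False  # True while inside a field we want to print
    for line in file:
        # Split "NAME:value" at the first colon; data is '' when there is no colon
        field, _, data = line.partition(':')
        if field in ('SUMMARY', 'DESCRIPTION', 'DTSTAMP'):
            parsing = True
            print(field)
            print('\t'+'\n\t'.join(data.split('\n')))
        elif parsing and not data:
            # No colon on this line: treat it as a continuation of the previous field
            print('\t'+'\n\t'.join(field.split('\n')))
        else:
            parsing = False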
    

    Storing the data and parsing the datetime is left as an exercise for the reader (DTSTAMP is always in UTC).
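
    One way that exercise might go, as a minimal sketch rather than a full parser: the helper names `parse_ics_datetime` and `collect_fields` are made up for illustration, only the first value of each field is kept, and values without a trailing `Z` (such as a DTSTART with a TZID parameter) are returned as naive datetimes because resolving the time zone is not attempted here.

```python
from datetime import datetime, timezone

def parse_ics_datetime(value):
    """Parse an iCalendar DATE-TIME value such as '20100713T071037Z'.

    A trailing 'Z' marks UTC. Without it, a naive datetime is returned,
    because this sketch does not resolve TZID parameters.
    """
    if value.endswith('Z'):
        parsed = datetime.strptime(value, '%Y%m%dT%H%M%SZ')
        return parsed.replace(tzinfo=timezone.utc)
    return datetime.strptime(value, '%Y%m%dT%H%M%S')

def collect_fields(lines, wanted=('SUMMARY', 'DESCRIPTION', 'DTSTAMP')):
    """Store the first value seen for each wanted field in a dict."""
    found = {}
    for line in lines:
        name, _, value = line.strip().partition(':')
        # Drop parameters such as ;TZID="India" from the property name.
        # (Assumes no colon appears inside a quoted parameter value.)
        name = name.partition(';')[0]
        if name in wanted and name not in found:
            found[name] = value
    return found

sample = [
    'SUMMARY:smart energy management',
    'DTSTART;TZID="India":20100629T110000',
    'DTSTAMP:20100713T071037Z',
]
fields = collect_fields(sample)
stamp = parse_ics_datetime(fields['DTSTAMP'])
print(fields['SUMMARY'], stamp.isoformat())
```

    Keeping only the first value per field is a simplification; with several events in one file you would probably want one dict per BEGIN:VEVENT / END:VEVENT pair instead.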

    Old answer below:


    You could use a regex:

    import re
    text = '...'  # replace with the contents of your .ics file, as a string
    print(re.search("SUMMARY:.*?:", text, re.DOTALL).group())
    print(re.search("DESCRIPTION:.*?:", text, re.DOTALL).group())
    print(re.search("DTSTAMP:.*:?", text, re.DOTALL).group())
    

    It is probably possible to skip the first and last words within the regex itself; I'm just not sure how. You could do it this way, though:

    print(' '.join(re.search("SUMMARY:.*?:", text, re.DOTALL).group().replace(':', ' ').split()[1:-1]))
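
    For what it's worth, a capture group is one way the regex itself could skip the field name and stop before the next field. This is only a sketch: `field_value` is a hypothetical helper, the sample `text` is trimmed from the calendar above, and it reads a single line per field, so a description that continues on following lines would be cut short.

```python
import re

# Sample text, trimmed from the calendar data shown earlier
text = (
    "DESCRIPTION:Emails\n"
    "SUMMARY:smart energy management\n"
    "LOCATION:8778/92050462\n"
    "DTSTAMP:20100713T071037Z\n"
)

def field_value(name, text):
    """Return the value of the first `name:` line, or None if absent."""
    # Under re.MULTILINE, ^ and $ match at line boundaries, so the group
    # captures only what follows the colon on that one line.
    match = re.search(rf"^{re.escape(name)}:(.*)$", text, re.MULTILINE)
    # Strip a stray carriage return in case the file uses CRLF line endings
    return match.group(1).rstrip("\r") if match else None

print(field_value("SUMMARY", text))
print(field_value("DTSTAMP", text))
```

    Returning None for a missing field avoids the AttributeError that calling `.group()` on a failed `re.search` would raise.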
    
