Parsing files (ics/icalendar) using Python

Anonymous (unverified), submitted on 2019-12-03 01:23:02

Question:

I have a .ics file in the following format. What is the best way to parse it? I need to retrieve the Summary, Description, and Time for each of the entries.

    BEGIN:VCALENDAR
    X-LOTUS-CHARSET:UTF-8
    VERSION:2.0
    PRODID:-//Lotus Development Corporation//NONSGML Notes 8.0//EN
    METHOD:PUBLISH
    BEGIN:VTIMEZONE
    TZID:India
    BEGIN:STANDARD
    DTSTART:19500101T020000
    TZOFFSETFROM:+0530
    TZOFFSETTO:+0530
    END:STANDARD
    END:VTIMEZONE
    BEGIN:VEVENT
    DTSTART;TZID="India":20100615T111500
    DTEND;TZID="India":20100615T121500
    TRANSP:OPAQUE
    DTSTAMP:20100713T071035Z
    CLASS:PUBLIC
    DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n
    UID:12D3901F0AD9E83E65257743001F2C9A-Lotus_Notes_Generated
    X-LOTUS-UPDATE-SEQ:1
    X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
    X-LOTUS-NOTESVERSION:2
    X-LOTUS-APPTTYPE:0
    X-LOTUS-CHILD_UID:12D3901F0AD9E83E65257743001F2C9A
    END:VEVENT
    BEGIN:VEVENT
    DTSTART;TZID="India":20100628T130000
    DTEND;TZID="India":20100628T133000
    TRANSP:OPAQUE
    DTSTAMP:20100628T055408Z
    CLASS:PUBLIC
    DESCRIPTION:
    SUMMARY:smart energy management
    LOCATION:8778/92050462
    UID:07F96A3F1C9547366525775000203D96-Lotus_Notes_Generated
    X-LOTUS-UPDATE-SEQ:1
    X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
    X-LOTUS-NOTESVERSION:2
    X-LOTUS-NOTICETYPE:A
    X-LOTUS-APPTTYPE:3
    X-LOTUS-CHILD_UID:07F96A3F1C9547366525775000203D96
    END:VEVENT
    BEGIN:VEVENT
    DTSTART;TZID="India":20100629T110000
    DTEND;TZID="India":20100629T120000
    TRANSP:OPAQUE
    DTSTAMP:20100713T071037Z
    CLASS:PUBLIC
    SUMMARY:meeting
    UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
    X-LOTUS-UPDATE-SEQ:1
    X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
    X-LOTUS-NOTESVERSION:2
    X-LOTUS-APPTTYPE:0
    X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
    END:VEVENT

Answer 1:

The icalendar package looks nice.

For instance, to write a file:

    from icalendar import Calendar, Event
    from datetime import datetime
    from pytz import UTC  # timezone

    cal = Calendar()
    cal.add('prodid', '-//My calendar product//mxm.dk//')
    cal.add('version', '2.0')

    event = Event()
    event.add('summary', 'Python meeting about calendaring')
    event.add('dtstart', datetime(2005, 4, 4, 8, 0, 0, tzinfo=UTC))
    event.add('dtend', datetime(2005, 4, 4, 10, 0, 0, tzinfo=UTC))
    event.add('dtstamp', datetime(2005, 4, 4, 0, 10, 0, tzinfo=UTC))
    event['uid'] = '20050115T101010/27346262376@mxm.dk'
    event.add('priority', 5)

    cal.add_component(event)

    # to_ical() returns bytes, so the file is opened in binary mode
    with open('example.ics', 'wb') as f:
        f.write(cal.to_ical())

Tadaaa, you get this file:

    BEGIN:VCALENDAR
    PRODID:-//My calendar product//mxm.dk//
    VERSION:2.0
    BEGIN:VEVENT
    DTEND;VALUE=DATE:20050404T100000Z
    DTSTAMP;VALUE=DATE:20050404T001000Z
    DTSTART;VALUE=DATE:20050404T080000Z
    PRIORITY:5
    SUMMARY:Python meeting about calendaring
    UID:20050115T101010/27346262376@mxm.dk
    END:VEVENT
    END:VCALENDAR

But what lies in this file?

    from icalendar import Calendar

    with open('example.ics', 'rb') as g:
        gcal = Calendar.from_ical(g.read())

    # walk() visits the calendar itself and every nested component
    for component in gcal.walk():
        print(component.name)

You can see it easily:

    >>>
    VCALENDAR
    VEVENT
    >>>

What about parsing the data about the events:

    with open('example.ics', 'rb') as g:
        gcal = Calendar.from_ical(g.read())

    for component in gcal.walk():
        # only the events carry summary and time fields
        if component.name == "VEVENT":
            print(component.get('summary'))
            print(component.get('dtstart'))
            print(component.get('dtend'))
            print(component.get('dtstamp'))

Now you get:

    >>>
    Python meeting about calendaring
    20050404T080000Z
    20050404T100000Z
    20050404T001000Z
    >>>


Answer 2:

You could probably also use the vobject module for this: http://pypi.python.org/pypi/vobject

If you have a sample.ics file, you can read its contents like so:

    import vobject

    # read the data from the file
    with open("sample.ics") as f:
        data = f.read()

    # parse the top-level event with vobject
    cal = vobject.readOne(data)

    # Get Summary
    print('Summary: ', cal.vevent.summary.valueRepr())
    # Get Description
    print('Description: ', cal.vevent.description.valueRepr())

    # Get Time
    print('Time (as a datetime object): ', cal.vevent.dtstart.value)
    print('Time (as a string): ', cal.vevent.dtstart.valueRepr())


Answer 3:

You can also use this new Python Package: http://packages.python.org/pyICSParser/

It parses the file and converts it into a Python array for easy processing.



Answer 4:

Four years later and understanding ICS format a bit better, if those were the only fields I needed, I'd just use the native string methods:

    import io

    # Probably not a valid .ics file, but we don't really care for the example
    # it works fine regardless
    file = io.StringIO('''
    BEGIN:VCALENDAR
    X-LOTUS-CHARSET:UTF-8
    VERSION:2.0
    DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n

    SUMMARY:smart energy management
    LOCATION:8778/92050462
    DTSTART;TZID="India":20100629T110000
    DTEND;TZID="India":20100629T120000
    TRANSP:OPAQUE
    DTSTAMP:20100713T071037Z
    CLASS:PUBLIC
    SUMMARY:meeting
    UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
    X-LOTUS-UPDATE-SEQ:1
    X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
    X-LOTUS-NOTESVERSION:2
    X-LOTUS-APPTTYPE:0
    X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
    END:VEVENT
    '''.strip())

    parsing = False
    for line in file:
        field, _, data = line.partition(':')
        if field in ('SUMMARY', 'DESCRIPTION', 'DTSTAMP'):
            parsing = True
            print(field)
            print('\t'+'\n\t'.join(data.split('\n')))
        elif parsing and not data:
            print('\t'+'\n\t'.join(field.split('\n')))
        else:
            parsing = False

(In the snippet above, the calendar lines inside the triple-quoted string start at column 0 in the real source; they are indented here only as part of the code listing.)

Storing the data and parsing the datetime are left as an exercise for the reader (DTSTAMP is always in UTC).
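One possible take on that exercise, as a minimal sketch using only the standard library. It unfolds continuation lines (iCalendar folds a long line by starting the continuation with a space or tab), collects each VEVENT into a dict, and parses DTSTAMP with `datetime.strptime`. The names `unfold`, `parse_events`, `parse_utc` and the `SAMPLE` text are made up for illustration, and property parameters such as `;TZID=...` are simply dropped.

```python
from datetime import datetime, timezone

# Hypothetical sample, shortened from the question's file.
SAMPLE = """\
BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
DTSTAMP:20100628T055408Z
SUMMARY:smart energy
  management
LOCATION:8778/92050462
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20100713T071037Z
SUMMARY:meeting
END:VEVENT
END:VCALENDAR
"""


def unfold(text):
    """Join folded lines: a line starting with a space or tab
    continues the previous line (minus that one leading character)."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw:  # skip blank lines
            lines.append(raw)
    return lines


def parse_events(text):
    """Return one dict per VEVENT, mapping property name to raw value."""
    events, current = [], None
    for line in unfold(text):
        name, _, value = line.partition(":")
        # Drop parameters such as ;TZID="India". This simple split assumes
        # no quoted parameter value contains a colon.
        name = name.split(";", 1)[0].upper()
        if name == "BEGIN" and value == "VEVENT":
            current = {}
        elif name == "END" and value == "VEVENT":
            if current is not None:
                events.append(current)
            current = None
        elif current is not None:
            current[name] = value
    return events


def parse_utc(value):
    """Parse a UTC timestamp such as 20100713T071037Z."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


for ev in parse_events(SAMPLE):
    print(ev["SUMMARY"], parse_utc(ev["DTSTAMP"]).isoformat())
```

Escaped sequences inside values (a literal backslash followed by `n`, for example) are left untouched here; a full parser would also decode those.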

old answer below


You could use a regex:

    import re

    text = "..."  # your text

    # each pattern matches from the field name up to the next colon
    print(re.search("SUMMARY:.*?:", text, re.DOTALL).group())
    print(re.search("DESCRIPTION:.*?:", text, re.DOTALL).group())
    print(re.search("DTSTAMP:.*?:", text, re.DOTALL).group())

It is probably possible to skip the first and last words within the regex itself, but I'm not sure how. You could do it this way, though:

    print(' '.join(re.search("SUMMARY:.*?:", text, re.DOTALL).group().replace(':', ' ').split()[1:-1]))


Answer 5:
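For what it's worth, a capture group is one way to drop the field name and the trailing word inside the regex itself. This is a minimal sketch, assuming every property sits on its own line; `get_field` and the sample `text` are made up for illustration.

```python
import re

# Hypothetical sample text, one property per line.
text = (
    "DTSTAMP:20100628T055408Z\n"
    "CLASS:PUBLIC\n"
    "SUMMARY:smart energy management\n"
    "LOCATION:8778/92050462\n"
)


def get_field(name, text):
    """Return the value after 'NAME:' up to the end of that line, or None."""
    # re.MULTILINE makes ^ and $ match at line boundaries; without re.DOTALL
    # the dot stops at the newline, so only this property's value is captured.
    match = re.search(r"^%s:(.*)$" % re.escape(name), text, re.MULTILINE)
    return match.group(1).strip() if match else None


print(get_field("SUMMARY", text))
print(get_field("DTSTAMP", text))
```

Because the group captures only the value, no slicing with `[1:-1]` is needed afterwards. Folded (multi-line) values are not handled by this sketch.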

I'd parse line by line and do a search for your terms, then get the index and extract that and X number of characters further (however many you think you'll need). Then parse that much smaller string to get it to be what you need.
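That idea might look something like the sketch below. The names `extract` and `TERMS`, the sample lines, and the 60-character window are arbitrary choices for illustration, not something this answer specifies.

```python
TERMS = ("SUMMARY:", "DESCRIPTION:", "DTSTAMP:")


def extract(lines, terms=TERMS, width=60):
    """Yield (term, snippet) for every line containing one of the terms.

    The snippet is at most `width` characters taken right after the term.
    Note that find() matches anywhere in the line, so a term that appears
    inside another property's value would also be picked up.
    """
    for line in lines:
        for term in terms:
            idx = line.find(term)
            if idx != -1:
                start = idx + len(term)
                yield term.rstrip(":"), line[start:start + width].strip()
                break  # at most one term per line


# Hypothetical sample lines, as they would come from iterating over a file.
sample = [
    "DTSTAMP:20100713T071037Z\n",
    "CLASS:PUBLIC\n",
    "SUMMARY:meeting\n",
]

for term, snippet in extract(sample):
    print(term, "->", snippet)
```

The same function works on an open file object, since iterating over a file yields its lines one at a time.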


