C# regex: parse a file in iCal format and populate an object with the results

Submitted by ぐ巨炮叔叔 on 2019-12-21 02:47:35

Question


I'm trying to parse a file that has the following format:

BEGIN:VEVENT
CREATED:20120504T163940Z
DTEND;TZID=America/Chicago:20120504T130000
DTSTAMP:20120504T164000Z
DTSTART;TZID=America/Chicago:20120504T120000
LAST-MODIFIED:20120504T163940Z
SEQUENCE:0
SUMMARY:Test 1
TRANSP:OPAQUE
UID:21F61281-FB76-467F-A2CC-A666688BD9B5
X-RADICALE-NAME:21F61281-FB76-467F-A2CC-A666688BD9B5.ics
END:VEVENT

I need to take the values found after the colon or semicolon on each line and put them into properties of an object. I'm attempting to do this with Regex, but I basically forget everything I know about Regex after I use it (which is maybe twice a year). Any help would be appreciated.


Answer 1:


Run this with a few examples and see if it does what you want. I get the other comments about splitting or IndexOf, but if you're expecting the delimiter to be either a colon or a semicolon, then a regex is probably better.

using System;
using System.Text.RegularExpressions;

string line = "LAST-MODIFIED:20120504T163940Z";
var p = Regex.Match(line, "(.*)?(:|;)(.*)$", RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline);
Console.WriteLine(p.Groups[0].Value); // the whole match
Console.WriteLine(p.Groups[1].Value); // everything before the last ':' or ';'
Console.WriteLine(p.Groups[2].Value); // the delimiter itself
Console.WriteLine(p.Groups[3].Value); // everything after it
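Because the first group (.*) is greedy, the pattern splits each line at its last colon or semicolon, not its first. On a line with a parameter that conveniently isolates the value, but a value that itself contains a colon gets cut. A minimal check of both cases; SplitAtLastDelimiter is a hypothetical wrapper around the answer's pattern, and the DESCRIPTION line is an illustrative input, not from the question's file:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern. Because (.*) is greedy,
// group 1 extends up to the LAST ':' or ';' on the line.
static (string Before, string Delimiter, string After) SplitAtLastDelimiter(string input)
{
    Match m = Regex.Match(input, "(.*)?(:|;)(.*)$", RegexOptions.CultureInvariant);
    return (m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
}

// Parameter line: name and parameter stay together, the value is isolated
var withParam = SplitAtLastDelimiter("DTEND;TZID=America/Chicago:20120504T130000");
Console.WriteLine($"[{withParam.Before}] [{withParam.After}]");

// Illustrative value containing ':' (not from the question): the split lands inside it
var colonInValue = SplitAtLastDelimiter("DESCRIPTION:Meet at 12:30");
Console.WriteLine($"[{colonInValue.Before}] [{colonInValue.After}]");
```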



Answer 2:



This post got me thinking about the iCal format.

Before yesterday, I didn't know what the iCal format was. But after reading the 1998 spec, it's painfully obvious that none of the answers on this page is adequate to parse the content. And it's really too sophisticated even for my general regex below.

With that in mind, here is a solution that parses just the line content, as gleaned from the spec's rules for general content-line parsing. It's a step in the right direction, and hopefully someone can benefit. It doesn't handle line continuation (unfolding) and does not validate.

C# code

using System;
using System.Text.RegularExpressions;

// .NET regex does not support POSIX classes such as [:cntrl:], so \p{Cc}
// (Unicode control characters) is used to express the spec's "no CTLs" rule.
Regex iCalMainRx = new Regex(
 @" ^  (?<name> [^\p{Cc}"";:,\n]+ )
       (?<parameter>
          ;
          (?<param_name> [^\p{Cc}"";:,\n]+ )
           = 
          (?<param_value> 
             (?: (?:[^\S\n]|[^\p{Cc}"";:,])*  | "" (?:[^\S\n]|[^\p{Cc}""])* "" )
             (?: , (?: (?:[^\S\n]|[^\p{Cc}"";:,])*  | "" (?:[^\S\n]|[^\p{Cc}""])* "" ) )*
          )
        )*
        :
        (?<value> (?:[^\S\n]|[^\p{Cc}])* )
     $ ", RegexOptions.IgnorePatternWhitespace);

// Splits one captured parameter value into its comma-separated parts
Regex iCalPvalRx = new Regex(
 @" ^ (?<pvals> (?:[^\S\n]|[^\p{Cc}"";:,])*  | "" (?:[^\S\n]|[^\p{Cc}""])* "" )
      (?: ,+ (?<pvals> (?:[^\S\n]|[^\p{Cc}"";:,])*  | "" (?:[^\S\n]|[^\p{Cc}""])* "" ) )*
    $ ", RegexOptions.IgnorePatternWhitespace);


string[] lines = {
    "BEGIN:VEVENT", 
    "CREATED:20120504T163940Z", 
    "DTEND;TZID=America/Chicago:20120504T130000", 
    "DTSTAMP:20120504T164000Z", 
    "DTSTART;TZID=,,,America/Chicago;Next=;last=\"this:;;;:=\";final=:20120504T120000", 
    "LAST-MODIFIED:20120504T163940Z", 
    "SEQUENCE:0", 
    "SUMMARY:Test 1", 
    "TRANSP:OPAQUE", 
    "UID:21F61281-FB76-467F-A2CC-A666688BD9B5", 
    "X-RADICALE-NAME:21F61281-FB76-467F-A2CC-A666688BD9B5.ics", 
    "END:VEVENT", 
};

foreach (string str in lines)
{
    Match m_content = iCalMainRx.Match( str );
    if (m_content.Success)
    {
        Console.WriteLine("Key =   " + m_content.Groups["name"].Value);
        Console.WriteLine("Value = " + m_content.Groups["value"].Value);

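        // A group inside a repeated (...)* keeps every capture, not just the last,
        // so each ;name=value pair is available through its Captures collection
        CaptureCollection cc_pname  = m_content.Groups["param_name"].Captures;
        CaptureCollection cc_pvalue = m_content.Groups["param_value"].Captures;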
        if (cc_pname.Count > 0)
        {
            Console.WriteLine("Parameters: ");
            for (int i = 0; i < cc_pname.Count; i++)
            {
                // Console.WriteLine("\t'" + cc_pname[i].Value + "'  =   '" + cc_pvalue[i].Value + "'");
                Console.WriteLine("\t'" + cc_pname[i].Value + "' =");
                Match m_vals = iCalPvalRx.Match( cc_pvalue[i].Value );
                if (m_vals.Success)
                {
                    CaptureCollection cc_vals = m_vals.Groups["pvals"].Captures;
                    for (int j = 0; j < cc_vals.Count; j++)
                    {
                        Console.WriteLine("\t\t'" + cc_vals[j].Value + "'");
                    }
                }

            }
        }
        Console.WriteLine("-------------------------");
    }
}

Output

Key =   BEGIN
Value = VEVENT
-------------------------
Key =   CREATED
Value = 20120504T163940Z
-------------------------
Key =   DTEND
Value = 20120504T130000
Parameters:
        'TZID' =
                'America/Chicago'
-------------------------
Key =   DTSTAMP
Value = 20120504T164000Z
-------------------------
Key =   DTSTART
Value = 20120504T120000
Parameters:
        'TZID' =
                ''
                'America/Chicago'
        'Next' =
                ''
        'last' =
                '"this:;;;:="'
        'final' =
                ''
-------------------------
Key =   LAST-MODIFIED
Value = 20120504T163940Z
-------------------------
Key =   SEQUENCE
Value = 0
-------------------------
Key =   SUMMARY
Value = Test 1
-------------------------
Key =   TRANSP
Value = OPAQUE
-------------------------
Key =   UID
Value = 21F61281-FB76-467F-A2CC-A666688BD9B5
-------------------------
Key =   X-RADICALE-NAME
Value = 21F61281-FB76-467F-A2CC-A666688BD9B5.ics
-------------------------
Key =   END
Value = VEVENT
-------------------------
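The regex above does not handle line continuation. In the iCalendar spec, a long line is folded by inserting a line break followed by a single space or tab, and unfolding removes that break plus the one whitespace character. A minimal sketch of an unfolding pass that could run before the regex; UnfoldLines is a hypothetical helper, and it assumes the whole file has already been read into one string:

```csharp
using System;
using System.Collections.Generic;

// Minimal unfolding sketch. Assumptions: the whole file is already in one
// string, and lines end in CRLF or bare LF. A physical line that begins with
// a space or tab continues the previous logical line; exactly one leading
// whitespace character (the one inserted by folding) is removed.
static List<string> UnfoldLines(string text)
{
    var logical = new List<string>();
    foreach (string physical in text.Replace("\r\n", "\n").Split('\n'))
    {
        bool isContinuation = physical.Length > 0
            && (physical[0] == ' ' || physical[0] == '\t');
        if (isContinuation && logical.Count > 0)
        {
            logical[logical.Count - 1] += physical.Substring(1);
        }
        else if (physical.Length > 0)
        {
            logical.Add(physical); // empty lines (e.g. after the final CRLF) are skipped
        }
    }
    return logical;
}

// Illustrative folded input: the SUMMARY value is split across two physical lines
string folded = "BEGIN:VEVENT\r\nSUMMARY:A long\r\n  summary line\r\nEND:VEVENT\r\n";
foreach (string logicalLine in UnfoldLines(folded))
    Console.WriteLine(logicalLine); // each unfolded line could then go to iCalMainRx.Match
```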



Answer 3:


Splitting into lines and using IndexOf(":") may be enough for simple iCal files, instead of a regex.
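For simple files, the IndexOf approach could look like the following minimal sketch. ParseSimple is a hypothetical name; it splits at the first colon, strips any ;parameters from the name, and collects the results in a dictionary. It assumes no parameter value contains a colon, which the spec does allow inside quoted parameter values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split each line at the FIRST ':' found by IndexOf and
// use the part before any ';' parameters as the key.
// Assumption: no quoted parameter value contains ':' (true for simple files).
static Dictionary<string, string> ParseSimple(IEnumerable<string> lines)
{
    var props = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (string line in lines)
    {
        int colon = line.IndexOf(':');
        if (colon < 0) continue; // no colon: not a content line

        string nameAndParams = line.Substring(0, colon);
        int semi = nameAndParams.IndexOf(';');
        string name = semi < 0 ? nameAndParams : nameAndParams.Substring(0, semi);

        // A property that appears more than once keeps only its last value
        props[name] = line.Substring(colon + 1);
    }
    return props;
}

var result = ParseSimple(new[] {
    "DTEND;TZID=America/Chicago:20120504T130000",
    "SUMMARY:Test 1",
    "UID:21F61281-FB76-467F-A2CC-A666688BD9B5",
});
foreach (var kv in result)
    Console.WriteLine($"{kv.Key} = {kv.Value}");
```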

Check whether there is already an existing iCal parser, and look at related questions about ical+C#.




Answer 4:


Try:

(?<key>[^:;]*)[:;](?<value>[^\s]*)

C# snippet:

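using System.Text.RegularExpressions;

// key: everything before the first ':' or ';'; value: the run of non-whitespace after it
Regex regex = new Regex(
@"(?<key>[^:;]*)[:;](?<value>[^\s]*)",
RegexOptions.None
);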

It takes a run of any characters except a colon or semicolon as the key, and then anything other than whitespace as the value.
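Two consequences of this pattern show up on the question's sample data: because the key stops at the first ':' or ';', a line with parameters carries the parameter text into the value, and because the value stops at whitespace, SUMMARY:Test 1 yields only Test. A small harness showing both (KeyValue is a hypothetical helper name):

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: key = run of characters other than ':' and ';',
// value = the run of non-whitespace characters right after the delimiter.
var regex = new Regex(@"(?<key>[^:;]*)[:;](?<value>[^\s]*)", RegexOptions.None);

// Hypothetical helper returning the two named groups for one line
static (string Key, string Value) KeyValue(Regex rx, string input)
{
    Match m = rx.Match(input);
    return (m.Groups["key"].Value, m.Groups["value"].Value);
}

foreach (string line in new[] {
    "UID:21F61281-FB76-467F-A2CC-A666688BD9B5",
    "DTEND;TZID=America/Chicago:20120504T130000", // parameters end up in the value
    "SUMMARY:Test 1",                             // value stops at the space
})
{
    var (key, value) = KeyValue(regex, line);
    Console.WriteLine($"{key} -> {value}");
}
```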

If you want to test it or make changes, check out the regex checker I have on my blog: http://blog.stevekonves.com/2012/01/an-even-better-regex-tester/ (requires Silverlight)




Answer 5:


I'd personally use string.Split(':') on each line in the file. This has the benefit of being easy to read and understand, too, if you don't want to re-learn regular expressions!
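Plain Split(':') cuts a line at every colon, so a value that itself contains one (an illustrative DESCRIPTION:Meet at 12:30, not from the question) would come apart; the Split(char[], int) overload with a count of 2 stops after the first colon. A minimal sketch of using that to fill the properties the question asks about, assuming simple unfolded lines: a named tuple stands in for the question's object, ToEvent and ParseLocal are hypothetical names, and the TZID parameter is simply discarded rather than applied.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch: a named tuple stands in for the question's object.
// Split with a count of 2 cuts only at the FIRST ':', so a value that itself
// contains ':' stays whole (plain Split(':') would cut at every colon).
static (string Uid, string Summary, DateTime Start, DateTime End) ToEvent(string[] lines)
{
    var ev = (Uid: "", Summary: "", Start: DateTime.MinValue, End: DateTime.MinValue);
    foreach (string raw in lines)
    {
        string[] parts = raw.Split(new[] { ':' }, 2);
        if (parts.Length < 2) continue;

        // Drop parameters such as ;TZID=America/Chicago from the property name
        string name = parts[0].Split(';')[0].ToUpperInvariant();
        switch (name)
        {
            case "UID": ev.Uid = parts[1]; break;
            case "SUMMARY": ev.Summary = parts[1]; break;
            // Assumption: local date-time without a trailing 'Z'; TZID is ignored
            case "DTSTART": ev.Start = ParseLocal(parts[1]); break;
            case "DTEND": ev.End = ParseLocal(parts[1]); break;
        }
    }
    return ev;
}

// Hypothetical helper for values such as 20120504T120000
static DateTime ParseLocal(string value) =>
    DateTime.ParseExact(value, "yyyyMMdd'T'HHmmss", CultureInfo.InvariantCulture);

var evt = ToEvent(new[] {
    "BEGIN:VEVENT",
    "DTEND;TZID=America/Chicago:20120504T130000",
    "DTSTART;TZID=America/Chicago:20120504T120000",
    "SUMMARY:Test 1",
    "UID:21F61281-FB76-467F-A2CC-A666688BD9B5",
    "END:VEVENT",
});
Console.WriteLine($"{evt.Summary}: {evt.Start:HH:mm} to {evt.End:HH:mm}");
```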



Source: https://stackoverflow.com/questions/11038703/c-sharp-regex-parse-file-in-ical-format-and-populate-object-with-results
