Exceptions when parsing DateTime values in RSS feeds with SyndicationFeed in C#

没有蜡笔的小新 2020-12-31 15:21

I'm trying to parse RSS 2.0 and Atom feeds using SyndicationFeed objects, but I'm getting XmlExceptions when parsing DateTime fields such as pubDate:

2012-01-17 08:01:06

2 Answers
  • 2020-12-31 15:56

    There is a workaround. Rss20FeedFormatter throws an exception when it tries to read some DateTime formats.

    To work around this problem, create a custom XML reader that recognizes additional date formats and pass it to SyndicationFeed.Load. The following example uses such a reader:

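    // Requires: using System.ServiceModel.Syndication; using System.Text; using System.Xml;
    XmlReader r = new MyXmlReader(url);
    SyndicationFeed feed = SyndicationFeed.Load(r);
    Rss20FeedFormatter rssFormatter = feed.GetRss20Formatter();
    // Write the re-serialized feed to disk; the using block closes the writer.
    using (XmlTextWriter rssWriter = new XmlTextWriter("rss.xml", Encoding.UTF8))
    {
        rssWriter.Formatting = Formatting.Indented;
        rssFormatter.WriteTo(rssWriter);
    }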
    

    The MyXmlReader class used in the code above:

    class MyXmlReader : XmlTextReader
    {
        private bool readingDate = false;
        const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss Z yyyy"; // Wed Oct 07 08:00:07 GMT 2009
    
        public MyXmlReader(Stream s) : base(s) { }
    
        public MyXmlReader(string inputUri) : base(inputUri) { }
    
        public override void ReadStartElement()
        {
            if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
                (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
                string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
            {
                readingDate = true;
            }
            base.ReadStartElement();
        }
    
        public override void ReadEndElement()
        {
            if (readingDate)
            {
                readingDate = false;
            }
            base.ReadEndElement();
        }
    
        public override string ReadString()
        {
            if (readingDate)
            {
                string dateString = base.ReadString();
                DateTime dt;
                if(!DateTime.TryParse(dateString,out dt))
                    dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture);
                return dt.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);
            }
            else
            {
                return base.ReadString();
            }
        }
    }
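
    The date handling in ReadString can also be pulled out into a small helper that tries a list of known formats. The following is only a sketch: FeedDateNormalizer and its format list are hypothetical names and assumptions, not part of the answer above, and it treats dates without a time zone as UTC.

    ```csharp
    using System;
    using System.Globalization;

    // Hypothetical helper (an assumption for illustration, not from the original answer).
    static class FeedDateNormalizer
    {
        // Assumed formats; extend this list with whatever your feeds actually emit.
        static readonly string[] Formats =
        {
            "yyyy-MM-dd HH:mm:ss",             // 2012-01-17 08:01:06
            "ddd, dd MMM yyyy HH:mm:ss 'GMT'", // Tue, 17 Jan 2012 08:01:06 GMT
        };

        // Returns the date in RFC 1123 form, or null if no format matches.
        // Dates without zone information are assumed to be UTC.
        public static string Normalize(string raw)
        {
            DateTime dt;
            if (DateTime.TryParseExact(raw.Trim(), Formats, CultureInfo.InvariantCulture,
                    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out dt))
            {
                return dt.ToString("R", CultureInfo.InvariantCulture);
            }
            return null;
        }
    }
    ```

    For example, FeedDateNormalizer.Normalize("2012-01-17 08:01:06") yields "Tue, 17 Jan 2012 08:01:06 GMT", which Rss20FeedFormatter can read.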
    
  • 2020-12-31 16:10

    Basically, that RSS feed is invalid. If you look at the RSS 2.0 specification it states that:

    All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

    The string "2012-01-17 12:09:29" doesn't comply with the "Date and Time" part of RFC 822, which requires an abbreviated month name and a time zone. Assuming the time is in UTC, it should be "Tue, 17 Jan 2012 12:09:29 GMT" or something similar.
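
    To check quickly whether a date string from a feed is in the expected shape, you can test it against the .NET "R" format. This is a minimal sketch: Rfc822Check is a hypothetical name, and "R" is the RFC 1123 pattern, which covers only the common four-digit-year GMT form and not everything RFC 822 permits.

    ```csharp
    using System;
    using System.Globalization;

    // Hypothetical helper (an assumption for illustration).
    static class Rfc822Check
    {
        // True only for the RFC 1123 form, e.g. "Tue, 17 Jan 2012 12:09:29 GMT".
        public static bool IsRfc1123(string value)
        {
            DateTime ignored;
            return DateTime.TryParseExact(value, "R", CultureInfo.InvariantCulture,
                DateTimeStyles.None, out ignored);
        }
    }
    ```

    With this sketch, IsRfc1123("2012-01-17 12:09:29") returns false, while IsRfc1123("Tue, 17 Jan 2012 12:09:29 GMT") returns true.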
