Exceptions with DateTime parsing in RSS feed using SyndicationFeed in C#

Posted by 随声附和 on 2019-11-30 14:01:57

Question


I'm trying to parse RSS 2.0 and Atom feeds using SyndicationFeed objects, but I get an XmlException when a DateTime field such as pubDate contains a value like this:

2012-01-17 08:01:06

public static List<SyndicationItem> getRssData(string url)
{
    // Dispose the reader once the feed has been read.
    using (XmlReader reader = XmlReader.Create(url))
    {
        // Load throws XmlException when a date field cannot be parsed.
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        return feed.Items.ToList();
    }
}

The feed URL is http://news.163.com/special/00011K6L/rss_newstop.xml

<item id="2">
    <title>...</title>
    <link>...</link>
    <description>......</description>
    <pubDate>2012-01-17 12:09:29</pubDate> <!-- exception is thrown here -->
</item>

Is there a better way to achieve this? Please help. Thanks.


Answer 1:


This is a known problem: Rss20FeedFormatter throws an exception when it tries to read some DateTime formats.

To work around it, create a custom XML reader that recognizes different date formats. The following code loads a feed through such a reader:

XmlReader r = new MyXmlReader(url);
SyndicationFeed feed = SyndicationFeed.Load(r);
Rss20FeedFormatter rssFormatter = feed.GetRss20Formatter();
XmlTextWriter rssWriter = new XmlTextWriter("rss.xml", Encoding.UTF8);
rssWriter.Formatting = Formatting.Indented;
rssFormatter.WriteTo(rssWriter);
rssWriter.Close();

The MyXmlReader class used in the code above:

class MyXmlReader : XmlTextReader
{
    private bool readingDate = false;
    const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss Z yyyy"; // Wed Oct 07 08:00:07 GMT 2009

    public MyXmlReader(Stream s) : base(s) { }

    public MyXmlReader(string inputUri) : base(inputUri) { }

    public override void ReadStartElement()
    {
        if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
            (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
            string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
        {
            readingDate = true;
        }
        base.ReadStartElement();
    }

    public override void ReadEndElement()
    {
        if (readingDate)
        {
            readingDate = false;
        }
        base.ReadEndElement();
    }

    public override string ReadString()
    {
        if (readingDate)
        {
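            string dateString = base.ReadString();
            DateTime dt;
            // Try the formats DateTime recognizes under the current culture first,
            // then fall back to the custom format.
            if (!DateTime.TryParse(dateString, out dt))
                dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture);
            // Re-emit the value in RFC 1123 form ("R"), which the formatter can read.
            return dt.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);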
        }
        else
        {
            return base.ReadString();
        }
    }
}
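
If the set of date formats in a feed is known in advance, the conversion step inside ReadString could be factored into a small helper that tries an explicit list of formats. The following is only a sketch: the class name PubDateNormalizer, the method TryNormalize, and the format list are hypothetical, and it assumes that values without a time zone are UTC (the reader above treats them as local time instead).

```csharp
using System;
using System.Globalization;

static class PubDateNormalizer
{
    // Hypothetical format list; adjust it to the feeds you actually consume.
    static readonly string[] Formats =
    {
        "yyyy-MM-dd HH:mm:ss",          // e.g. 2012-01-17 12:09:29
        "ddd MMM dd HH:mm:ss Z yyyy",   // custom format taken from MyXmlReader
        "R"                             // already in RFC 1123 form
    };

    // Returns true and the RFC 1123 ("R") representation when the raw
    // value matches one of the known formats; otherwise returns false.
    public static bool TryNormalize(string raw, out string normalized)
    {
        DateTime dt;
        // Assumption: zone-less values are interpreted as UTC.
        const DateTimeStyles styles =
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal;

        if (raw != null &&
            DateTime.TryParseExact(raw.Trim(), Formats,
                CultureInfo.InvariantCulture, styles, out dt))
        {
            normalized = dt.ToString("R", CultureInfo.InvariantCulture);
            return true;
        }

        normalized = null;
        return false;
    }
}
```

With this helper, TryNormalize("2012-01-17 12:09:29", out s) sets s to "Tue, 17 Jan 2012 12:09:29 GMT". Using explicit formats with the invariant culture keeps the result independent of the machine's regional settings, which DateTime.TryParse without a culture argument does not.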



Answer 2:


Basically, that RSS feed is invalid. If you look at the RSS 2.0 specification it states that:

All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

The string "2012-01-17 12:09:29" doesn't comply with the "Date and Time" part of RFC 822. It should look something like "Tue, 17 Jan 2012 12:09:29 GMT", with an abbreviated month name and an explicit time zone.
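
A quick way to see the difference is to test a value against .NET's "R" (RFC 1123) pattern, which is the common RFC 822 style form with a four-digit year and a GMT zone. This is a minimal sketch with a hypothetical class and method name. It checks only that one pattern, so other valid RFC 822 variants (two-digit years, numeric offsets such as +0800) would be reported as non-matching.

```csharp
using System;
using System.Globalization;

static class PubDateCheck
{
    // Returns true when the value matches the "R" pattern,
    // i.e. "ddd, dd MMM yyyy HH:mm:ss GMT".
    public static bool IsRfc1123(string value)
    {
        DateTime parsed;
        return DateTime.TryParseExact(value, "R",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }
}
```

IsRfc1123("2012-01-17 12:09:29") returns false, while IsRfc1123("Tue, 17 Jan 2012 12:09:29 GMT") returns true.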



Source: https://stackoverflow.com/questions/8891047/exceptions-with-datetime-parsing-in-rss-feed-use-syndicationfeed-in-c-sharp
