Parsing Google News RSS with PHP

Submitted by 泪湿孤枕 on 2019-12-29 05:00:07

Question


I want to parse the Google News RSS feed with PHP. I managed to get this code running:

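<?php
// Load the Google News RSS feed; simplexml_load_file() returns false on failure.
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
if ($news === false) {
    exit('Unable to load the feed.');
}

foreach ($news->channel->item as $item) {
    // Print the title, then the description with its HTML markup removed.
    echo "<strong>" . $item->title . "</strong><br />";
    echo strip_tags($item->description) . "<br /><br />";
}
?>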

However, I can't work out how to solve the following problems:

  1. How can I get the hyperlink for each news title?
  2. Each Google News item has several related news links in its footer, and my code above includes them too. How can I remove those from the description?
  3. How can I also get the image for each news item? (Google displays a thumbnail image for each one.)

Thanks.


Answer 1:


There we go, just what you need for your particular situation:

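<?php
$news = simplexml_load_file('http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&topic=n&output=rss');
if ($news === false) {
    exit('Unable to load the feed.');
}

$feeds = array();

foreach ($news->channel->item as $item) 
{
    // Cast to string so the regex and explode() work on plain text, not an object.
    $description = (string) $item->description;

    // The thumbnail URL is taken from the first src="..." attribute in the description HTML.
    preg_match('@src="([^"]+)"@', $description, $match);
    // Split on <font size="-1">: part 1 is used as the site name, part 2 as the story summary.
    $parts = explode('<font size="-1">', $description);

    // Fall back to null when a piece is missing instead of raising an undefined-index notice.
    $feeds[] = array(
        'title'      => (string) $item->title,
        'link'       => (string) $item->link,
        'image'      => isset($match[1]) ? $match[1] : null,
        'site_title' => isset($parts[1]) ? strip_tags($parts[1]) : null,
        'story'      => isset($parts[2]) ? strip_tags($parts[2]) : null,
    );
}

echo '<pre>';
print_r($feeds);
echo '</pre>';
?>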

And the output should look like this:

[2] => Array
        (
            [title] => Los Alamos Nuclear Lab Under Siege From Wildfire - ABC News
            [link] => http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGxBe4YsZArH0kSwEjq_zDm_h-N4A&url=http://abcnews.go.com/Technology/wireStory?id%3D13951623
            [image] => http://nt2.ggpht.com/news/tbn/OhH43xORRwiW1M/6.jpg
            [site_title] => ABC News
            [story] => A wildfire burning near the desert birthplace of the atomic bomb advanced on the Los Alamos laboratory and thousands of outdoor drums of plutonium-contaminated waste Tuesday as authorities stepped up ...
        )



Answer 2:


I'd recommend checking out SimplePie. I've used it for several different projects and it works great (and abstracts away all of the headache you're currently dealing with).

Now, if you're writing this code simply because you want to learn how to do it, you should probably ignore this answer. :)




Answer 3:


  1. To get the URL for a news item, use $item->link.
  2. If the related news links are preceded by a common delimiter, you could use a regex to cut off everything after it.
  3. Google puts the thumbnail image's HTML inside the description field of the feed. You could use a regex to extract everything between the opening and closing angle brackets of the image tag to get its HTML.
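
The three suggestions above can be sketched roughly as follows. This is a minimal illustration, not the feed's actual markup: the inline XML, the example.com URLs, the `<div class="related">` delimiter, and the helper names `extractImageTag()` and `cutAtDelimiter()` are all assumptions made for the example. You would need to inspect the real description HTML and choose a delimiter that actually appears there.

```php
<?php
// Hypothetical one-item feed used instead of a live request;
// the real Google News markup may differ and can change without notice.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Example headline</title>
      <link>http://example.com/story</link>
      <description><![CDATA[<img src="http://example.com/thumb.jpg" alt="" /><p>Story summary text.</p><div class="related"><a href="http://example.com/other">Related story</a></div>]]></description>
    </item>
  </channel>
</rss>
XML;

// Return the first <img ...> tag in the HTML, or null when there is none.
function extractImageTag(string $html): ?string
{
    return preg_match('/<img\b[^>]*>/i', $html, $m) === 1 ? $m[0] : null;
}

// Keep only the part of the HTML that precedes the first occurrence of $delimiter.
// If the delimiter is absent, the HTML is returned unchanged.
function cutAtDelimiter(string $html, string $delimiter): string
{
    $parts = preg_split('/' . preg_quote($delimiter, '/') . '/i', $html, 2);
    return $parts[0];
}

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
$item = $feed->channel->item[0];
$description = (string) $item->description;

// 1. The item's URL.
$link = (string) $item->link;
// 3. The thumbnail's HTML.
$imageTag = extractImageTag($description);
// 2. Drop everything from the (assumed) related-links delimiter onward, then strip tags.
$summary = trim(strip_tags(cutAtDelimiter($description, '<div class="related">')));

echo $link, "\n";      // http://example.com/story
echo $imageTag, "\n";  // <img src="http://example.com/thumb.jpg" alt="" />
echo $summary, "\n";   // Story summary text.
```

Passing the delimiter in as an argument keeps the feed-specific guess in one place, so only that string needs updating if the markup changes.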


Source: https://stackoverflow.com/questions/6512318/parsing-google-news-rss-with-php
