PHP crawler detection

Submitted by 大兔子大兔子 on 2019-12-10 10:56:16

Question

I'm trying to write a sitemap.php which acts differently depending on who is looking.

I want to redirect crawlers to my sitemap.xml, as that will be the most up-to-date page and will contain all the info they need, but I want my regular readers to be shown an HTML sitemap on the PHP page.

This will all be controlled from within the PHP header. I found this code on the web, and by the looks of it it should work, but it doesn't. Can anyone help me crack this?

function getIsCrawler($userAgent) {
    $crawlers = 'firefox|Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
    'AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|' .
    'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';
    $isCrawler = (preg_match("/$crawlers/i", $userAgent) > 0);
    return $isCrawler;
}

$iscrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

if ($isCrawler) {
    header('Location: http://www.website.com/sitemap.xml');
    exit;
} else {
    echo "not crawler!";
}

It looks pretty simple, but as you can see I've added firefox to the agent list for testing, and I'm still not being redirected.

Thanks for any help :)


Answer 1:

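You have a mistake in your code:

$iscrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

should be

$isCrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

PHP variable names are case-sensitive, so the variable you assign and the $isCrawler you test in the if statement are two different variables. The tested one is never set, so the condition is always false.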

If you develop with notices enabled, PHP will report the undefined variable and you'll catch these errors much more easily.
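As a rough sketch of what "developing with notices on" could look like, the snippet below turns on full error reporting at the top of a script and then reproduces the same kind of mismatched variable name. error_reporting() and ini_set() are standard PHP functions; the rest is illustrative only, and the exact wording and severity of the message depend on the PHP version. These settings are meant for a development machine, not a live site.

```php
<?php
// Development only: report every diagnostic, including undefined variables.
error_reporting(E_ALL);
// Show the messages in the output instead of only logging them.
ini_set('display_errors', '1');

$iscrawler = true;      // assigned with a lowercase "c"

// $isCrawler (capital "C") was never assigned. With reporting enabled,
// PHP emits an "undefined variable" message here instead of silently
// treating the value as null.
if ($isCrawler) {
    echo "crawler\n";
} else {
    echo "not crawler!\n";
}
```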

Also, you probably want to exit after sending the header.
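Putting both points together, a minimal sketch of the script with one consistent variable name and an exit after the redirect might look like this. It makes a few assumptions that are not in the question: it needs PHP 7 or later (type declarations and the ?? operator), it falls back to an empty string when no User-Agent header is sent, and it drops the 'firefox' entry that was only added for testing. The URL is the placeholder from the question.

```php
<?php
// Returns true when the User-Agent string contains a known crawler name.
function getIsCrawler(string $userAgent): bool
{
    $crawlers = 'Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
        'AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|' .
        'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';

    // The "i" flag makes the match case-insensitive.
    return preg_match("/$crawlers/i", $userAgent) === 1;
}

// Assumption: treat a missing User-Agent header as an empty string.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Same spelling here and in the if statement below.
$isCrawler = getIsCrawler($userAgent);

if ($isCrawler) {
    header('Location: http://www.website.com/sitemap.xml');
    exit; // stop here so no HTML is sent after the redirect header
}

echo "not crawler!";
```

Keeping the matching logic in a function that only takes a string makes it easy to check by hand with a few sample User-Agent values before wiring it to the redirect.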

Warning: Cloaking (serving crawlers different content than regular visitors) can get you in trouble with search providers. This article explains why.




Answer 2:

http://develobert.blogspot.com/2008/11/php-robot-check.html



Source: https://stackoverflow.com/questions/1176727/php-crawler-detection
