Question
I need to detect search engines that refer visitors to my website. Since every search engine uses a different query string parameter for searching (e.g. Google uses 'q=', Yahoo uses 'p='), I created a database of search engines with their URL regex patterns.
As an example: http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-GB:official&client=firefox-a
The regex I created for Google is:
(http:)(\\/)(\\/)(www)(\\.)(google)(\\.).*(\\/)(search).*(&q=|\\?q=).*
(I am new to regex, but so far it works.)
This detects that the URL belongs to Google. My problem is that I need to extract the search words from the URL above, or from the URLs of other search engines, but I don't know how to match them with the regular expression. I tried extracting the query string from the URL with PHP functions and matching it against the pattern, but it returned nothing.
I hope I have explained this clearly enough.
Any suggestions?
Answer 1:
This blog entry about extracting keywords from the referrer seems like it is a good match for solving your problem.
I found it using this search for 'extract query string from google referer url'. The search seems to have a number of helpful hits... I just did a sweep of the first few.
Answer 2:
I would use parse_url to parse the URL and parse_str to parse the URL query.
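// Example referrer URL from the question
$url = 'http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu%3Aen-GB%3Aofficial&client=firefox-a';
// Split the URL into its components (scheme, host, path, query, ...)
$parts = parse_url($url);
if (isset($parts['query'])) {
    // Replace the raw query string with an array of its decoded parameters
    parse_str($parts['query'], $parts['query']);
}
// $parts['query']['q'] now holds the search words ('blabla')
var_dump($parts);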
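To connect this back to the question's database of search engines, here is a minimal sketch of how the parsed query could be combined with a per-engine parameter name. The `$engines` table and the `extract_search_terms` function are hypothetical names made up for illustration; only the 'q' (Google) and 'p' (Yahoo) parameters come from the question, and the host matching is a simple assumption rather than a complete detection scheme.

```php
<?php
// Hypothetical lookup table: engine name (as it appears in the host) => query parameter.
// Only these two entries are taken from the question; add others as needed.
$engines = [
    'google' => 'q',
    'yahoo'  => 'p',
];

/**
 * Returns [engine name, search words] for a referrer URL,
 * or null when the URL does not match any known engine.
 */
function extract_search_terms($referrer, array $engines)
{
    $parts = parse_url($referrer);
    if ($parts === false || !isset($parts['host'], $parts['query'])) {
        return null;
    }

    // parse_str() URL-decodes the values, so "hello+world" becomes "hello world".
    parse_str($parts['query'], $query);

    foreach ($engines as $name => $param) {
        // Match the engine name as a dot-delimited host label, e.g. "www.google.com".
        $pattern = '/(^|\.)' . preg_quote($name, '/') . '\./i';
        if (preg_match($pattern, $parts['host'])
            && isset($query[$param]) && is_string($query[$param])) {
            return [$name, $query[$param]];
        }
    }

    return null;
}

$result = extract_search_terms('http://www.google.com/search?q=blabla&ie=utf-8', $engines);
var_dump($result);
```

With this approach the regex only has to identify the engine from the host, while the search words come from the parsed query array, so the pattern no longer needs to cope with '?q=' versus '&q=' or with URL encoding.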
Source: https://stackoverflow.com/questions/1963883/regular-expression-to-detect-the-search-engine-and-search-words