How to identify a curl request

Question

Is there a way to detect in my script whether a request is coming from a normal web browser or from some script executing curl? I can see the headers and can distinguish using "User-Agent" and a few other headers, but curl can set fake headers, so I am not able to track the request.

Please suggest ways of identifying curl or other similar non-browser requests.

Answer 1:

The only way to catch most "automated" requests is to code in logic that spots activity that couldn't possibly be human with a browser.

For example: hitting pages too fast, or filling out a form too fast. You can also reference an external resource in the HTML file (like a fake CSS file served through a PHP file) and check whether the requesting IP downloaded it in the previous stage of your site, kind of like a reverse honeypot. You would need to exclude certain IPs and user agents from being blocked, though, otherwise you'll block Google's web spiders, etc.
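As a rough illustration of the timing checks, here is a minimal Python sketch. The class name `TimingHeuristics`, its methods, and the thresholds are all made up for this example, and it keeps state in memory only, so treat it as a starting point rather than a finished detector.

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds; tune them against your own traffic.
MAX_REQUESTS = 10        # more than this many page hits...
WINDOW_SECONDS = 5.0     # ...within this many seconds looks automated
MIN_FORM_SECONDS = 2.0   # a person rarely completes a form faster than this


class TimingHeuristics:
    """Flags clients whose timing could not plausibly come from a person."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._form_served = {}           # (ip, form_id) -> time the form was rendered

    def record_hit(self, ip):
        """Record a page request; return True if this IP is hitting pages too fast."""
        now = self._clock()
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) > MAX_REQUESTS

    def form_served(self, ip, form_id):
        """Remember when the form was sent to this client."""
        self._form_served[(ip, form_id)] = self._clock()

    def form_submitted(self, ip, form_id):
        """Return True if the submission is suspicious.

        Suspicious means the form was never served to this IP,
        or it came back faster than MIN_FORM_SECONDS.
        """
        served_at = self._form_served.pop((ip, form_id), None)
        if served_at is None:
            return True
        return self._clock() - served_at < MIN_FORM_SECONDS
```

You would call `record_hit()` on every request, and `form_served()` / `form_submitted()` around each form. Keying on IP alone will misjudge users behind a shared address, so a session cookie may be a better key where one is available.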

This is probably the only way of doing it if curl (or any other automated script) is faking its headers to look like a browser.
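The "did this IP fetch the extra resource" check could look roughly like the sketch below. `AssetTracker` and its method names are hypothetical, and the crawler allowlist is matched on the User-Agent string, which can itself be faked, so a real deployment would want to verify crawlers some other way as well.

```python
class AssetTracker:
    """Tracks which IPs fetched a tracked resource (e.g. the fake CSS file)."""

    def __init__(self, allowed_agent_tokens=("Googlebot",)):
        # Substrings of User-Agent values that should never be flagged.
        # Illustrative only: the header is client-supplied and can be spoofed.
        self._allowed = tuple(token.lower() for token in allowed_agent_tokens)
        self._fetched = set()  # IPs that have downloaded the tracked resource

    def asset_fetched(self, ip):
        """Call this from the handler that serves the tracked resource."""
        self._fetched.add(ip)

    def is_suspicious(self, ip, user_agent=""):
        """Return True if this IP reached a later stage without fetching the resource."""
        agent = user_agent.lower()
        if any(token in agent for token in self._allowed):
            return False
        return ip not in self._fetched
```

A plain curl call fetches only the URL it is given, so it never requests the stylesheet and gets flagged at the next stage. A script that parses the HTML and downloads linked resources would still pass, which is why this works best combined with other signals.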

Answer 2:

Strictly speaking, there is no way.
There are indirect techniques, but I would never discuss them in public, especially on a site like Stack Overflow, which encourages screen scraping, content swiping, autoposting and all this dirty roboting stuff.

In some cases you can use a CAPTCHA test to tell a human from a bot.