Why should I access a URL using a User Agent?

Submitted by 孤者浪人 on 2019-12-06 16:25:09

Many web administrators want to keep bots off their sites: bots scrape data at regular intervals, but the owner earns no ad revenue from those hits. The bots bring no obvious benefit, yet they keep consuming server resources. For this reason, they block anything that doesn't look like a browser used by a human. As you have seen, though, it is completely trivial to make your program pretend to be something else, so this technique is not effective against anyone who knows what they are doing. In general, however, it is considered polite not to pretend to be something you're not (internet etiquette).
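
To make the point concrete, here is a minimal Python sketch of the kind of naive filter described above. The marker list and the "starts with Mozilla/" rule are assumptions chosen for illustration, not any particular site's logic:

```python
from typing import Optional

# Hypothetical substrings a site might treat as "this is a bot".
BOT_MARKERS = ("bot", "crawler", "spider", "wget", "curl", "python-urllib")


def looks_like_browser(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header superficially resembles a browser."""
    if not user_agent:
        # A missing header is the most obvious sign of a script.
        return False
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return False
    # Mainstream browser strings conventionally start with "Mozilla/".
    return ua.startswith("mozilla/")


if __name__ == "__main__":
    samples = [
        None,
        "Wget/1.21.3",
        "Python-urllib/3.11",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ]
    for ua in samples:
        print(repr(ua), "->", "allowed" if looks_like_browser(ua) else "blocked")
```

The last sample is allowed only because it is a browser string, and any script can send exactly that string. A check like this stops only clients that never change their default header.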

User agent strings can technically be anything you want, but most applications follow a common pattern such as $product/$version. You can see some examples here.
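
HTTP itself (RFC 9110) describes the header as one or more product tokens, each optionally followed by /version, with optional comments in parentheses. Below is a small Python sketch of building and reading strings in that shape; the bot name MyCrawler and the contact URL are hypothetical:

```python
import re
from typing import List, Tuple

# Parenthesized comments such as "(X11; Linux x86_64; rv:115.0)".
# Simplification: nested parentheses are not handled.
COMMENT_RE = re.compile(r"\([^)]*\)")


def make_user_agent(product: str, version: str, comment: str = "") -> str:
    """Build a User-Agent string in the conventional product/version form."""
    ua = f"{product}/{version}"
    if comment:
        ua += f" ({comment})"
    return ua


def product_tokens(user_agent: str) -> List[Tuple[str, str]]:
    """Split a User-Agent into (product, version) pairs, ignoring comments.

    A token without "/version" gets an empty version string.
    """
    without_comments = COMMENT_RE.sub(" ", user_agent)
    pairs = []
    for token in without_comments.split():
        product, _, version = token.partition("/")
        pairs.append((product, version))
    return pairs


if __name__ == "__main__":
    # Hypothetical bot name and contact URL.
    ua = make_user_agent("MyCrawler", "2.1", "+https://example.com/bot")
    print(ua)  # MyCrawler/2.1 (+https://example.com/bot)
    print(product_tokens(ua))  # [('MyCrawler', '2.1')]
    firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
    print(product_tokens(firefox))
    # [('Mozilla', '5.0'), ('Gecko', '20100101'), ('Firefox', '115.0')]
```

Stripping comments and then splitting on whitespace keeps the sketch short. It does not cover everything RFC 9110 allows, such as nested comments or escaped characters inside them.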

For more information, check out the Wikipedia article on the matter.


So, a quick summary:

  1. You should set one because servers expect every client to send one.
  2. The library probably has a default user agent (e.g. JavaLib/1.1), but you had to set your own for the reasons stated above.
  3. It isn't necessary for every program, but pretending to be a browser is useful for bots. Just remember that it is considered impolite. For example, wget works 99% of the time for me without modification, but some sites block its user agent.
  4. The string is not generated; it's just copied from an existing browser, IE 6.0 in this case, and the server you're connecting to seems to accept it.
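
The same idea can be seen in Python's standard library: urllib's default opener sends a User-agent of the form Python-urllib/<version>, and a header set on the Request itself takes precedence over that default. A minimal sketch; the URL is a placeholder, and the IE 6.0 string is a typical example of such a header, not necessarily the exact one from the question:

```python
import urllib.request

# A typical IE 6.0 on Windows XP User-Agent (illustrative example string).
IE6_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def default_user_agent() -> str:
    """Return the User-agent urllib's default opener adds when you set none."""
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a custom User-Agent."""
    # urllib normalizes header names with str.capitalize(), so the key is
    # stored as "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    print("default:", default_user_agent())  # something like Python-urllib/3.12
    req = build_request("https://example.com/", IE6_UA)
    print("custom: ", req.get_header("User-agent"))
    # urllib.request.urlopen(req) would send it; omitted so this runs offline.
```

Because the request already carries a User-agent header, the opener does not add its default one when the request is actually sent.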