URL forbidden 403 when using a tool but fine from browser

Submitted by ╄→гoц情女王★ on 2021-02-07 10:28:10

Question


I have some images for which I need to send a HEAD request (HttpRequestMethod.HEAD) in order to find out some details about each image.

When I open the image URL in a browser, it loads without a problem.

When I attempt to get the header info via my code or via online tools, it fails.

An example URL is http://www.adorama.com/images/large/CHHB74P.JPG

As mentioned, I have used the online tool Hurl.it to try the HEAD request, but I get the same 403 Forbidden message that I get in my code. I have tried adding various headers to the HEAD request (User-Agent, Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, Host, Pragma, Upgrade-Insecure-Requests), but none of them makes a difference.

A normal GET request via Hurl.it fails as well, with the same 403 error.

If it is relevant, my code is a C# web service running on the AWS cloud (just in case the Adorama servers have something against AWS that I don't know about). To test this, I also spun up an EC2 instance (a Linux box) and ran curl, which also returned the 403 error. Running curl locally on my personal computer returns binary output, which is presumably just the image data.

And just to rule out the obvious: my code works successfully for many other websites. It is only this one that has an issue.

Any idea what is required for me to download the image headers and not get the 403?


Answer 1:


Same problem here.

Locally it works smoothly. From an AWS instance, I get the very same problem.

I thought it was a DNS resolution problem (redirecting to a malfunctioning node). I therefore tried specifying the same IP address that my local client resolved, but that didn't fix the problem.

My guess is that Akamai (the service is provided by an Akamai CDN in this case) is blocking AWS. That is somewhat understandable: CDN customers pay by traffic, so people abusing it can generate huge bills.

Connecting to www.adorama.com (www.adorama.com)|104.86.164.205|:80... connected.

HTTP request sent, awaiting response... 
HTTP/1.1 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 301
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 09:34:20 GMT
Connection: close
2016-03-23 09:34:20 ERROR 403: Forbidden.
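
One way to test the guess above is to check whether the calling machine's public IP address falls inside a published AWS range. AWS publishes its ranges as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch below assumes that file's `prefixes` / `ip_prefix` layout. The CIDR blocks and addresses in it are illustrative placeholders, not values taken from the real file, and whether Akamai actually consults this list is not known.

```python
import ipaddress
import json

# Illustrative stand-in for the JSON that AWS publishes. The CIDR blocks
# below are documentation-only example ranges, NOT real AWS ranges.
SAMPLE_RANGES = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "203.0.113.0/24", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/25", "service": "EC2"}
  ]
}
""")


def in_published_ranges(ip, ranges):
    """Return True if `ip` falls inside any IPv4 prefix listed in `ranges`."""
    address = ipaddress.ip_address(ip)
    return any(
        address in ipaddress.ip_network(entry["ip_prefix"])
        for entry in ranges["prefixes"]
    )


# Hypothetical addresses: the first is inside a sample prefix, the second is not.
print(in_published_ranges("203.0.113.7", SAMPLE_RANGES))
print(in_published_ranges("192.0.2.10", SAMPLE_RANGES))
```

If the server's own address is in a published range and the same request succeeds from a residential connection, an IP-based block becomes the more likely explanation.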



Answer 2:


I tried that URL from Amazon and it didn't work for me. However, wget did work from other servers that weren't on Amazon EC2. Here is the wget output on EC2:

wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:42:33--  http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.40.219.79
Connecting to www.adorama.com|23.40.219.79|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.0 403 Forbidden
  Server: AkamaiGHost
  Mime-Version: 1.0
  Content-Type: text/html
  Content-Length: 299
  Cache-Control: max-age=604800
  Date: Wed, 23 Mar 2016 08:42:33 GMT
  Connection: close
2016-03-23 08:42:33 ERROR 403: Forbidden.

But from another Linux host it did work. Here is the output:

wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:43:11--  http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.45.139.71
Connecting to www.adorama.com|23.45.139.71|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.0 200 OK
  Content-Type: image/jpeg
  Last-Modified: Wed, 23 Mar 2016 08:41:57 GMT
  Server: Microsoft-IIS/8.5
  X-AspNet-Version: 2.0.50727
  X-Powered-By: ASP.NET
  ServerID: C01
  Content-Length: 15131
  Cache-Control: private, max-age=604800
  Date: Wed, 23 Mar 2016 08:43:11 GMT
  Connection: keep-alive
  Set-Cookie: 1YDT=CT; expires=Wed, 20-Apr-2016 08:43:11 GMT; path=/; domain=.adorama.com
  P3P: CP="NON DSP ADM DEV PSD OUR IND STP PHY PRE NAV UNI"
Length: 15131 (15K) [image/jpeg]
Saving to: “CHHB74P.JPG”

100%[=====================================>] 15,131      --.-K/s   in 0s      

2016-03-23 08:43:11 (460 MB/s) - “CHHB74P.JPG” saved [15131/15131]

I would guess that the image provider is deliberately blocking requests from EC2 address ranges.

The IP address that wget connects to differs between the two examples because of DNS resolution by the CDN provider that Adorama uses, which can direct different clients to different edge servers.
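
The two wget outputs differ in one telling way: the 403 carries `Server: AkamaiGHost`, while the 200 carries the origin's `Server: Microsoft-IIS/8.5`. A minimal sketch of that comparison follows. It parses header lines in the style that `wget -S` prints and applies a heuristic on the `Server` header. The function names are hypothetical, and the heuristic is an assumption based only on the two outputs above.

```python
def parse_response_headers(lines):
    """Split raw response lines into (status_code, headers dict)."""
    status = None
    headers = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.0 403 Forbidden"
            status = int(line.split()[1])
        elif ":" in line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return status, headers


def rejected_at_edge(status, headers):
    """Heuristic: a 403 whose Server header names AkamaiGHost came from the CDN edge."""
    return status == 403 and "akamaighost" in headers.get("server", "").lower()


# Header excerpts copied from the two wget runs above.
blocked = """
  HTTP/1.0 403 Forbidden
  Server: AkamaiGHost
  Content-Type: text/html
""".splitlines()

served = """
  HTTP/1.0 200 OK
  Content-Type: image/jpeg
  Server: Microsoft-IIS/8.5
""".splitlines()

print(rejected_at_edge(*parse_response_headers(blocked)))
print(rejected_at_edge(*parse_response_headers(served)))
```

A rejection generated at the edge suggests the request never reached Adorama's own servers, which fits the guess that the block is applied by the CDN.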




Answer 3:


A web server may check particular fingerprint attributes to block automated bots. Here are a few of the things it can check:

  • GeoIP, IP address
  • Browser headers
  • User agents
  • Plugin info
  • Fonts reported by the browser

You can look at the headers your browser sends and learn about some fingerprinting "attributes" here: https://panopticlick.eff.org

You can try to replicate how a browser behaves and send similar headers and a similar user agent. Plain curl/wget are unlikely to satisfy those conditions, and even tools like PhantomJS occasionally get blocked. There is a reason why some people prefer tools like Selenium WebDriver that launch an actual browser.
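
As a rough illustration of sending browser-like headers, here is a sketch using Python's standard `urllib`. The header values are assumptions about what a desktop browser typically sends (the User-Agent string is the Firefox one quoted elsewhere in this thread), and the helper names are hypothetical. Headers alone will not help if the block is based on the client's IP address.

```python
import urllib.error
import urllib.request

# Assumed browser-like header values; a real browser may send more or different ones.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0",
    "Accept": "image/webp,*/*",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_head_request(url, headers=BROWSER_HEADERS):
    """Create a HEAD request carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=dict(headers), method="HEAD")


def fetch_status_and_headers(request, timeout=10):
    """Send the request and return (status, headers), including for 4xx/5xx replies."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers)
    except urllib.error.HTTPError as error:
        # urllib raises on 403 and similar; the error object still carries the headers.
        return error.code, dict(error.headers)


request = build_head_request("http://www.adorama.com/images/large/CHHB74P.JPG")
print(request.get_method(), request.full_url)
```

Calling `fetch_status_and_headers(request)` performs the actual network request, so the result depends on where the code runs.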




Answer 4:


I found that another URL, also protected by AkamaiGHost, was being blocked because of certain parts of the user agent. In particular, a user agent containing a link with a protocol was blocked:

Using curl -H 'User-Agent: some-user-agent' https://some.website, I found the following results for different user agents:

  • Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0: okay
  • facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php): 403
  • https ://bar: okay
  • https://bar: 403

All I could find for now is this (downvoted) answer https://stackoverflow.com/a/48137940/230422 stating that colons (:) are not allowed in header values. That is clearly not the only thing happening here as the Mozilla example also has a colon, only not a link.

I guess that most web servers don't care and allow Facebook's bot and other bots that have a contact URL in their user agent. But apparently AkamaiGHost does block it.



Source: https://stackoverflow.com/questions/36170821/url-forbidden-403-when-using-a-tool-but-fine-from-browser
