I'm trying to get some data from a page, but it's returning the error [403 Forbidden].
I thought it was the user agent, but I tri
These are all the headers I can see that the browser includes for a generic GET request:
Host: <URL>
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Try adding these to your request incrementally (one by one) to identify which one(s) are required for a successful request.
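The incremental approach could be sketched roughly like this. It is only a minimal sketch: the helper names (`add_headers_incrementally`, `make_get_status`) and the example URL are made up for illustration, and it assumes that "success" simply means any status other than 403. Host is left out because requests derives it from the URL.

```python
from typing import Callable, Dict, List, Optional

# Headers copied from the browser request listed above (Host omitted,
# since requests fills it in from the URL).
BROWSER_HEADERS: Dict[str, str] = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) "
                  "Gecko/20100101 Firefox/58.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}


def add_headers_incrementally(
    get_status: Callable[[Dict[str, str]], int],
    headers: Dict[str, str],
) -> Optional[List[str]]:
    """Send the request with one more header each time.

    Returns the names of the headers sent on the first attempt that is
    not rejected with 403, or None if every attempt is rejected.
    """
    sent: Dict[str, str] = {}
    for name, value in headers.items():
        sent[name] = value
        # Assumption: anything other than 403 counts as success.
        if get_status(dict(sent)) != 403:
            return list(sent)
    return None


def make_get_status(url: str) -> Callable[[Dict[str, str]], int]:
    """Hypothetical helper: build a status checker backed by requests."""
    import requests  # third-party, imported lazily: pip install requests

    def get_status(headers: Dict[str, str]) -> int:
        return requests.get(url, headers=headers, timeout=10).status_code

    return get_status


# Usage (the URL is a placeholder, not the real site):
# check = make_get_status("https://example.com/page")
# print(add_headers_incrementally(check, BROWSER_HEADERS))
```

Keep in mind that requests sends its own default values for some headers (User-Agent, Accept, Accept-Encoding, Connection) unless you override them, so the list returned here only tells you which browser values had been added by the time the request went through.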
Also, take a look at the Cookies and/or Security tabs, available under the Network option in your browser console / developer tools.
The site could be using anything in the request to trigger the rejection.
So, copy all headers from the request that your browser makes. Then delete them one by one [1] to find out which are essential.
As per the question "Python requests. 403 Forbidden", to add custom headers to the request, do:
import requests
result = requests.get(url, headers={'header': 'value'})  # add more 'header': 'value' pairs as needed
[1] A faster way would be to delete half of them each time instead, but that's more complicated since there are probably multiple essential headers.
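The one-by-one deletion described above might look something like the sketch below. The function name `find_essential_headers` and the `get_status` callable are hypothetical, and the sketch assumes that the full set of copied headers succeeds and that a rejection always shows up as status 403.

```python
from typing import Callable, Dict


def find_essential_headers(
    get_status: Callable[[Dict[str, str]], int],
    headers: Dict[str, str],
) -> Dict[str, str]:
    """Drop headers one at a time, keeping only those the site requires.

    get_status takes a header dict and returns the HTTP status code,
    e.g. lambda h: requests.get(url, headers=h).status_code
    """
    essential = dict(headers)
    for name in headers:
        # Try the request without this one header.
        trial = {k: v for k, v in essential.items() if k != name}
        if get_status(trial) != 403:
            # Still accepted without it, so the header is not essential.
            essential = trial
        # Otherwise the header stays in `essential`.
    return essential


# Usage (url and browser_headers are placeholders you would fill in):
# import requests
# needed = find_essential_headers(
#     lambda h: requests.get(url, headers=h, timeout=10).status_code,
#     browser_headers,
# )
# result = requests.get(url, headers=needed)
```

This makes one request per header, which is slower than halving but copes with several essential headers without extra bookkeeping.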