Requests using Beautiful Soup get blocked

独厮守ぢ 2020-12-09 23:16

When I make requests using Beautiful Soup, I get blocked as a "bot".

import requests
from bs4 import BeautifulSoup

reddit1Link = requests.get("         


        
2 answers
  •  不思量自难忘°
    2020-12-09 23:37

    I used to use Mechanize for things like this. It has been a couple of years, but it should still work.

    Try something like this:

    from mechanize import Browser
    from bs4 import BeautifulSoup
    
    b = Browser()
    b.set_handle_robots(False)
    b.addheaders = [('Referer', 'https://www.reddit.com'), ('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    
    b.open('https://www.reddit.com/r/tensorflow/comments/650p49/question_im_a_techy_35_year_old_and_i_think_ai_is/')
    soup = BeautifulSoup(b.response().read(), "html.parser")
    

    EDIT:

    I just realized that, sadly, mechanize is only available for Python 2.5-2.7. There are, however, other options available. See "Installing mechanize for python 3.4".
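
For Python 3, one option in the same spirit is to keep `requests` and send browser-like headers yourself, since the default `python-requests` User-Agent is an easy signal for a site to block. The following is only a minimal sketch under assumptions: the helper names `make_session` and `fetch_soup` and the User-Agent string are illustrative and not from the original post, and a site may still block or rate-limit requests for other reasons.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: an illustrative desktop browser User-Agent; substitute a current one.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Referer": "https://www.reddit.com",
}


def make_session(headers=None):
    """Return a requests.Session that sends the given headers on every request."""
    session = requests.Session()
    session.headers.update(headers or BROWSER_HEADERS)
    return session


def fetch_soup(session, url, timeout=10):
    """Fetch url with the session and parse the body; raises on HTTP error codes."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```

Usage would look like `soup = fetch_soup(make_session(), url)` with the same URL passed to `b.open(...)` above. A `Session` is used so the headers are set once and cookies persist across requests, which is roughly what mechanize's `Browser` does.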
