Running a Selenium browser on a server (Flask/Python/Heroku)

情深已故 2020-12-29 08:28

I am scraping some websites that seem to have pretty good protection against it. The only way I can get it to work is to use Selenium to load the page and then scrape the content.
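A minimal sketch of the load-then-scrape flow, assuming you already have a Selenium WebDriver instance and, as a hypothetical example target, want the links on the page (scrape_links and LinkCollector are illustrative names, not part of any library). driver.get() and driver.page_source are standard WebDriver members; the parsing uses only the standard library's html.parser. page_source reflects the DOM at the moment it is read, so content that JavaScript loads later may need an explicit wait (for example WebDriverWait) first.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scrape_links(driver, url: str) -> List[str]:
    """Load url with a Selenium-style driver, then parse the rendered HTML."""
    driver.get(url)            # the real browser fetches the page and runs its scripts
    html = driver.page_source  # HTML of the DOM as rendered at this moment
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The function only relies on get() and page_source, so any object with those two members works, which makes the parsing step easy to test without starting a browser.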

2 Answers
  • There are buildpacks that make Selenium work on Heroku.

    Add the following buildpacks:

    heroku buildpacks:add https://github.com/kevinsawicki/heroku-buildpack-xvfb-google-chrome/
    heroku buildpacks:add https://github.com/heroku/heroku-buildpack-chromedriver
    

    Then set the Heroku stack to cedar-14, because the xvfb buildpack only works on cedar-14 (the -a flag names the app, stocksdata in this example):

    heroku stack:set cedar-14 -a stocksdata
    

    Finally, point Selenium at the Google Chrome binary installed by the buildpack:

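    from selenium import webdriver
    options = webdriver.ChromeOptions()
    # Chrome binary installed by the buildpack on the dyno
    options.binary_location = "/app/.apt/usr/bin/google-chrome-stable"
    # Newer Selenium versions take options=; chrome_options= is deprecated
    driver = webdriver.Chrome(options=options)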
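The /app/.apt path only exists on the dyno, so hard-coding it breaks local runs. A hedged sketch of a lookup helper (find_chrome_binary is a hypothetical name) that checks an override environment variable, then known install paths, then the PATH. The variable name GOOGLE_CHROME_BIN is an assumption: some Chrome buildpacks export a variable with that name, but check what yours actually sets.

```python
import os
import shutil
from typing import Mapping, Optional, Sequence

# Install path used by the buildpack setup described above.
HEROKU_CHROME_PATH = "/app/.apt/usr/bin/google-chrome-stable"


def find_chrome_binary(
    env: Mapping[str, str] = os.environ,
    candidates: Sequence[str] = (HEROKU_CHROME_PATH,),
    names_on_path: Sequence[str] = ("google-chrome-stable", "google-chrome", "chromium"),
) -> Optional[str]:
    """Return the first Chrome executable found, or None if there is none."""
    # 1. Explicit override (GOOGLE_CHROME_BIN is an assumed variable name).
    override = env.get("GOOGLE_CHROME_BIN")
    if override and os.path.isfile(override):
        return override
    # 2. Known install locations, such as the buildpack path on a dyno.
    for path in candidates:
        if os.path.isfile(path):
            return path
    # 3. Executables on PATH, typical for a local development machine.
    for name in names_on_path:
        found = shutil.which(name)
        if found:
            return found
    return None
```

If it returns a path, assign it to options.binary_location before creating the driver; if it returns None, leave binary_location unset and let chromedriver fall back to its default lookup.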
    
  • 2020-12-29 09:15

    Heroku, wonderful as it is, has a major limitation: you cannot install custom software or, in many cases, libraries. To provide an easy-to-use, centrally controlled, managed stack, Heroku strips its servers down to prevent other uses.

    In short, there is no Xorg on a Heroku dyno. With no Xorg and no way to install custom software, there is no xvfb either, and therefore no way to run the browser that Selenium expects to exist. On top of that, the browser itself is not generally available.
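To check this from inside a dyno (or any Linux machine), a quick diagnostic sketch. X clients such as a non-headless browser find their X server through the DISPLAY environment variable, and xvfb ships as an executable named Xvfb. display_report is a hypothetical helper; it only checks for those two things, not whether a browser is installed.

```python
import os
import shutil
from typing import Callable, Dict, Mapping, Optional


def display_report(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Report whether an X display exists or could be created with Xvfb."""
    has_display = bool(env.get("DISPLAY"))  # an X server is already advertised
    has_xvfb = which("Xvfb") is not None    # a virtual X server is installed
    return {
        "has_display": has_display,
        "has_xvfb": has_xvfb,
        "x_server_possible": has_display or has_xvfb,
    }


if __name__ == "__main__":
    print(display_report())
```

The env and which parameters default to the real environment and PATH lookup; they are injectable only so the logic can be checked without touching the machine.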

    You'll have better luck with a cloud offering like AWS, where you can install custom software, including Firefox, xvfb (to avoid the overhead of a full Xorg setup), and of course the rest of your scraping stack. This answer explains how to do it properly.
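On a machine where you control the software (for example a cloud VM with Firefox, geckodriver and xvfb installed), a minimal sketch of starting a virtual display before launching the browser. The display number :99 and the 1280x1024x24 screen are arbitrary choices, and the helper names are hypothetical.

```python
import os
import subprocess
from typing import List


def xvfb_command(display: int = 99, width: int = 1280,
                 height: int = 1024, depth: int = 24) -> List[str]:
    """Build the command line for an Xvfb server on display :<display>."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


def start_virtual_display(display: int = 99) -> subprocess.Popen:
    """Start Xvfb in the background and point this process's DISPLAY at it.

    Browsers started afterwards (e.g. by Selenium) inherit DISPLAY from the
    environment. Xvfb may take a moment to accept connections, so a real
    script may want a short wait before launching the browser.
    """
    proc = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = f":{display}"
    return proc
```

Typical use: call start_virtual_display(), create webdriver.Firefox(), and when finished call driver.quit() followed by terminate() on the returned process.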
