Running Selenium on AWS Lambda

Submitted by 风流意气都作罢 on 2021-01-27 17:12:14

Question


I know this is a frequently asked question. I have checked many answers and tried everything they suggest, but I still can't find a solution.

I am trying to run Selenium with Python 3.6 on AWS Lambda, and I built the deployment package using Docker with the following commands:

sudo docker run -v $(pwd):/outputs --name linked_in -d amazonlinux:latest tail -f /dev/null
sudo docker exec -i -t linked_in /bin/bash /outputs/buildPack_py.sh

This is what my buildPack_py.sh file looks like:


python_install (){

  wget https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tar.xz
  tar xJf Python-3.6.0.tar.xz
  cd Python-3.6.0

  ./configure
  make -j 5
  make install -j 5
  export PATH=/usr/local/bin/:$PATH
  cd ..
  rm Python-3.6.0.tar.xz
  rm -rf Python-3.6.0
}

dev_install () {
  yum -y update
  yum -y upgrade
  yum install -y \
  wget \
  curl \
  gcc \
  gcc-c++ \
  findutils \
  zlib-devel \
  zip \
  xz \
  tar \
  make \
  openssl-devel \
  unzip \
  atlas atlas-devel lapack-devel blas-devel
#  curl https://intoli.com/install-google-chrome.sh | bash
#  mv /usr/bin/google-chrome /usr/bin/google-chrome-stable
  wget https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip
  unzip chromedriver_linux64.zip
  curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
  unzip headless-chromium.zip -d bin/
}

install_packages () {
  cd /home/
  rm -rf env
  pip3 install virtualenv
  python3 -m virtualenv env --python=python3
  source env/bin/activate
  # datetime and math ship with the Python standard library, so they are not installed via pip
  pip install requests
  pip install lxml
  pip install selenium
  pip install beautifulsoup4
  deactivate
}


gather_pack () {
  # packing
  cd /home/
  source env/bin/activate

  rm -rf lambdapack5
  mkdir lambdapack5
  cd lambdapack5

  cp -R /home/env/lib/python3.6/site-packages/* .
  cp -R /home/env/lib64/python3.6/site-packages/* .
  cp /outputs/linkedinScraper.py /home/lambdapack5/
  cp /chromedriver /home/lambdapack5/chromedriver
  cp /bin/headless-chromium /home/lambdapack5/headless-chromium
  echo "original size $(du -sh /home/lambdapack5 | cut -f1)"

  # cleaning libs
  rm -rf external
  #    find . -type d -name "tests" -exec rm -rf {} +

  # cleaning
  find -name "*.so" ! -name "_imaging.cpython-36m-x86_64-linux-gnu.so" | xargs strip
  #    find -name "*.so.*" | xargs strip
  find . -name test -type d -print0 | xargs -0 rm -rf --
  rm -r pip
  rm -r pip-*
  rm -r wheel
  rm -r wheel-*
  rm easy_install.py
  find . -name \*.pyc -delete
  # find . -name \*.txt -delete
  echo "stripped size $(du -sh /home/lambdapack5 | cut -f1)"

  # compressing
  zip -FS -r9 /outputs/linkedinApi.zip * > /dev/null
  echo "compressed size $(du -sh /outputs/linkedinApi.zip | cut -f1)"
}

main () {
  dev_install
  python_install
  install_packages
  gather_pack
}

main

I am using the following versions:

chromedriver: 2.35
serverless-chromium: 1.0.0-37
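
Since the build script above downloads chromedriver 2.37 while this list names 2.35, it may be worth confirming which versions actually ended up in the package. Below is a minimal sketch, assuming both binaries accept a `--version` flag; `binary_version` is a hypothetical helper name and not part of Selenium.

```python
import subprocess


def binary_version(path):
    """Run `<path> --version` and return its combined output as text."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their version on stderr
        universal_newlines=True,   # decode bytes to str (also works on Python 3.6)
        timeout=30,
    )
    return result.stdout.strip()
```

Inside the build container this could be called as `binary_version("/home/lambdapack5/chromedriver")` and `binary_version("/home/lambdapack5/headless-chromium")`, assuming the paths used by the script.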

The error I get after uploading the zip file to Lambda is:

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally

Going through other posts, I found that the versions mentioned above are reported to work well together. I also saw Xvfb mentioned, but is it really required if I am using a headless browser?

This is a part of the Selenium code:

    import os

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument("--no-sandbox")
    options.add_argument('--disable-gpu')
    options.add_argument('--window-size=1280x1696')
    options.add_argument('--user-data-dir=/tmp/')
    options.add_argument('--hide-scrollbars')
    options.add_argument('--enable-logging')
    options.add_argument('--log-level=0')
    options.add_argument('--v=99')
    options.add_argument('--single-process')
    options.add_argument('--data-path=/tmp/')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--homedir=/tmp/')
    options.add_argument('--disk-cache-dir=/tmp/')
    options.add_argument('user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')
    options.binary_location = "headless-chromium"
    driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), options=options)
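
One thing that can be checked before constructing the driver is whether both binaries are present and executable inside the Lambda environment, since file modes can be lost depending on how the zip is built. The sketch below is not a confirmed fix for the error above. `stage_executable` is a hypothetical helper name, and copying to `/tmp` assumes that directory is writable, as the `--user-data-dir=/tmp/` option in the snippet already does.

```python
import os
import shutil
import stat


def stage_executable(src, dest_dir="/tmp"):
    """Copy `src` into `dest_dir`, mark the copy executable, return its path."""
    if not os.path.isfile(src):
        raise FileNotFoundError("binary not found: " + src)
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    # Add execute bits for user, group and others on top of the current mode.
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```

The returned paths could then be used for `options.binary_location` and `executable_path` in place of the relative names. A `FileNotFoundError` here would indicate a packaging problem rather than a Chrome problem.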

Any help is greatly appreciated!

Source: https://stackoverflow.com/questions/58510438/running-selenium-on-aws-lambda
