Selenium works on AWS EC2 but not on AWS Lambda

问题

I've looked at and tried nearly every other post on this topic with no luck.

EC2

I'm using python 3.6 so I'm using the following AMI amzn-ami-hvm-2018.03.0.20181129-x86_64-gp2 (see here). Once I SSH into my EC2, I download Chrome with:

sudo curl https://intoli.com/install-google-chrome.sh | bash
cp -r /opt/google/chrome/ /home/ec2-user/
google-chrome-stable --version
# Google Chrome 86.0.4240.198

And download and unzip the matching Chromedriver:

sudo wget https://chromedriver.storage.googleapis.com/86.0.4240.22/chromedriver_linux64.zip
sudo unzip chromedriver_linux64.zip

I install python36 and selenium with:

sudo yum install python36 -y
sudo /usr/bin/pip-3.6 install selenium

Then run the script:

import os
import selenium
from selenium import webdriver

CURR_PATH = os.getcwd()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--window-size=1280x1696')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--hide-scrollbars')
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.add_argument('--v=99')
chrome_options.add_argument('--single-process')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--remote-debugging-port=9222')
chrome_options.binary_location = f"{CURR_PATH}/chrome/google-chrome"
driver = webdriver.Chrome(
    executable_path = f"{CURR_PATH}/chromedriver",
    chrome_options=chrome_options
)
driver.get("https://www.google.com/")
html = driver.page_source
print(html)

This works

Lambda

I then zip my chromedriver and Chrome files:

mkdir tmp
mv chromedriver tmp
mv chrome tmp
cd tmp
zip -r9 ../chrome.zip chromedriver chrome

And copy the zipped file to an S3 bucket

This is my lambda function:

import os
import boto3
from botocore.exceptions import ClientError
import zipfile
import selenium
from selenium import webdriver

s3 = boto3.resource('s3')

def handler(event, context):
    chrome_bucket = os.environ.get('CHROME_S3_BUCKET')
    chrome_key = os.environ.get('CHROME_S3_KEY')
    # DOWNLOAD HEADLESS CHROME FROM S3
    try:    
        # with open('/tmp/headless_chrome.zip', 'wb') as data:
        s3.meta.client.download_file(chrome_bucket, chrome_key, '/tmp/chrome.zip')
        print(os.listdir('/tmp'))
    except ClientError as e:
        raise e
    # UNZIP HEADLESS CHROME
    try:
        with zipfile.ZipFile('/tmp/chrome.zip', 'r') as zip_ref:
            zip_ref.extractall('/tmp')
        # FREE UP SPACE
        os.remove('/tmp/chrome.zip')
        print(os.listdir('/tmp'))
    except:
        raise ValueError('Problem with unzipping Chrome executable')
    # CHANGE PERMISSION OF CHROME
    try:
        os.chmod('/tmp/chromedriver', 0o775)
        os.chmod('/tmp/chrome/chrome', 0o775)
        os.chmod('/tmp/chrome/google-chrome', 0o775)
    except:
        raise ValueError('Problem with changing permissions to Chrome executable')
    # GET LINKS
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--remote-debugging-port=9222')
    chrome_options.binary_location = "/tmp/chrome/google-chrome"
    driver = webdriver.Chrome(
        executable_path = "/tmp/chromedriver",
        chrome_options=chrome_options
    )
    driver.get("https://www.google.com/")
    html = driver.page_source
    print(html)

I'm able to see my unzipped files in the /tmp path.

And my error:

{
  "errorMessage": "Message: unknown error: unable to discover open pages\n",
  "errorType": "WebDriverException",
  "stackTrace": [
    [
      "/var/task/lib/observer.py",
      69,
      "handler",
      "chrome_options=chrome_options"
    ],
    [
      "/var/task/selenium/webdriver/chrome/webdriver.py",
      81,
      "__init__",
      "desired_capabilities=desired_capabilities)"
    ],
    [
      "/var/task/selenium/webdriver/remote/webdriver.py",
      157,
      "__init__",
      "self.start_session(capabilities, browser_profile)"
    ],
    [
      "/var/task/selenium/webdriver/remote/webdriver.py",
      252,
      "start_session",
      "response = self.execute(Command.NEW_SESSION, parameters)"
    ],
    [
      "/var/task/selenium/webdriver/remote/webdriver.py",
      321,
      "execute",
      "self.error_handler.check_response(response)"
    ],
    [
      "/var/task/selenium/webdriver/remote/errorhandler.py",
      242,
      "check_response",
      "raise exception_class(message, screen, stacktrace)"
    ]
  ]
}

EDIT: I am willing to try out anything at this point. Different versions of Chrome or Chromium, Chromedriver, Python or Selenium.

回答1:

This error message...

"errorMessage": "Message: unknown error: unable to discover open pages\n",
"errorType": "WebDriverException"

...implies that the ChromeDriver was unable to initiate/spawn a new Browsing Context i.e. Chrome Browser session.

It seems the issue is with ChromeDriver,s security feature of Sandboxing.

Thumb rule

A common cause for Chrome to crash during startup is running Chrome as root user (administrator) on Linux. While it is possible to work around this issue by passing --no-sandbox flag when creating your WebDriver session, such a configuration is unsupported and highly discouraged. You need to configure your environment to run Chrome as a regular user instead.

Details

A bit of more details about your usecase would have helped us to analyze the usage of the arguments which you have used and the root cause of the error in a better way. However, a few thoughts:

What is the sandbox?: The sandbox is a C++ library that allows the creation of sandboxed processes — processes that execute within a very restrictive environment. The only resources sandboxed processes can freely use are CPU cycles and memory. For example, sandboxes processes cannot write to disk or display their own windows. What exactly they can do is controlled by an explicit policy. Chromium renderers are sandboxed processes.
What does and doesn't it protect against?: The sandbox limits the severity of bugs in code running inside the sandbox. Such bugs cannot install persistent malware in the user‘s account (because writing to the filesystem is banned). Such bugs also cannot read and steal arbitrary files from the user’s machine. (In Chromium, the renderer processes are sandboxed and have this protection. After the NPAPI removal, all remaining plugins are also sandboxed. Also note that Chromium renderer processes are isolated from the system, but not yet from the web. Therefore, domain-based data isolation is not yet provided.). The sandbox cannot provide any protection against bugs in system components such as the kernel it is running on.
So how can a sandboxed process such as a renderer accomplish anything?: Certain communication channels are explicitly open for the sandboxed processes; the processes can write and read from these channels. A more privileged process can use these channels to do certain actions on behalf of the sandboxed process. In Chromium, the privileged process is usually the browser process.

So you may need to drop the --no-sandbox option. Here is the link to the Sandbox story.

Additional Considerations

Some more considerations:

While using --headless option you won't be able to use --window-size=1280x1696 due to certain constraints.

You can find a couple of relevant detailed discussion in:

Fullscreen in Headless Chrome using Selenium

Not able to maximize Chrome Window in headless mode

The argument --disable-gpu was to enable google-chrome-headless on windows platform. It was needed as SwiftShader fails an assert on Windows in headless mode earlier. This issue was resolved through Headless: make --disable-gpu flag unnecessary

You can find a relevant detailed discussion in ERROR:gpu_process_transport_factory.cc(1007)-Lost UI shared context : while initializing Chrome browser through ChromeDriver in Headless mode

Further you haven't mentioned any specific requirement of using --disable-dev-shm-usage, --hide-scrollbars, --enable-logging, --log-level=0, --v=99, --single-process and --remote-debugging-port=9222 arguments which you opt to drop for the time being and add them back as per your Test Specification.

References

You can find a couple of relevant detailed discussion in:

selenium.common.exceptions.WebDriverException: Message: unknown error: unable to discover open pages using ChromeDriver through Selenium
WebDriverException: unknown error: unable to discover open pages error with ChromeDriver 80.0.3987.106 and Chrome 80.0.3987.122

来源：https://stackoverflow.com/questions/64856328/selenium-works-on-aws-ec2-but-not-on-aws-lambda

标签

python

amazon-web-services

selenium

selenium-webdriver