Question
I've looked at and tried nearly every other post on this topic with no luck.
EC2
I'm using Python 3.6, so I'm using the following AMI: amzn-ami-hvm-2018.03.0.20181129-x86_64-gp2 (see here). Once I SSH into my EC2 instance, I download Chrome with:
sudo curl https://intoli.com/install-google-chrome.sh | bash
cp -r /opt/google/chrome/ /home/ec2-user/
google-chrome-stable --version
# Google Chrome 86.0.4240.198
And download and unzip the matching Chromedriver:
sudo wget https://chromedriver.storage.googleapis.com/86.0.4240.22/chromedriver_linux64.zip
sudo unzip chromedriver_linux64.zip
I install python36 and Selenium with:
sudo yum install python36 -y
sudo /usr/bin/pip-3.6 install selenium
Then run the script:
import os
import selenium
from selenium import webdriver
CURR_PATH = os.getcwd()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--window-size=1280x1696')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--hide-scrollbars')
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.add_argument('--v=99')
chrome_options.add_argument('--single-process')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--remote-debugging-port=9222')
chrome_options.binary_location = f"{CURR_PATH}/chrome/google-chrome"
driver = webdriver.Chrome(
    executable_path=f"{CURR_PATH}/chromedriver",
    chrome_options=chrome_options
)
driver.get("https://www.google.com/")
html = driver.page_source
print(html)
This works.
Lambda
I then zip my chromedriver and Chrome files:
mkdir tmp
mv chromedriver tmp
mv chrome tmp
cd tmp
zip -r9 ../chrome.zip chromedriver chrome
And copy the zipped file to an S3 bucket.
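(For reference, a minimal sketch of that upload step using boto3; the bucket name and key below are placeholders, not values from the post.)
import boto3

# Placeholder bucket/key -- substitute the values you later expose to the
# Lambda function as CHROME_S3_BUCKET / CHROME_S3_KEY.
s3 = boto3.client('s3')
s3.upload_file('chrome.zip', 'my-chrome-bucket', 'chrome.zip')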
This is my lambda function:
import os
import boto3
from botocore.exceptions import ClientError
import zipfile
import selenium
from selenium import webdriver
s3 = boto3.resource('s3')
def handler(event, context):
    chrome_bucket = os.environ.get('CHROME_S3_BUCKET')
    chrome_key = os.environ.get('CHROME_S3_KEY')

    # DOWNLOAD HEADLESS CHROME FROM S3
    try:
        # with open('/tmp/headless_chrome.zip', 'wb') as data:
        s3.meta.client.download_file(chrome_bucket, chrome_key, '/tmp/chrome.zip')
        print(os.listdir('/tmp'))
    except ClientError as e:
        raise e

    # UNZIP HEADLESS CHROME
    try:
        with zipfile.ZipFile('/tmp/chrome.zip', 'r') as zip_ref:
            zip_ref.extractall('/tmp')
        # FREE UP SPACE
        os.remove('/tmp/chrome.zip')
        print(os.listdir('/tmp'))
    except:
        raise ValueError('Problem with unzipping Chrome executable')

    # CHANGE PERMISSION OF CHROME
    try:
        os.chmod('/tmp/chromedriver', 0o775)
        os.chmod('/tmp/chrome/chrome', 0o775)
        os.chmod('/tmp/chrome/google-chrome', 0o775)
    except:
        raise ValueError('Problem with changing permissions to Chrome executable')

    # GET LINKS
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--remote-debugging-port=9222')
    chrome_options.binary_location = "/tmp/chrome/google-chrome"

    driver = webdriver.Chrome(
        executable_path="/tmp/chromedriver",
        chrome_options=chrome_options
    )
    driver.get("https://www.google.com/")
    html = driver.page_source
    print(html)
I'm able to see my unzipped files in the /tmp path.
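(One extra sanity check, not from the original post: before handing the binary to Selenium, you could confirm the unpacked wrapper script is actually executable inside Lambda with a quick version call.)
import subprocess

# Diagnostic sketch only: try running the unpacked Chrome wrapper directly.
result = subprocess.run(['/tmp/chrome/google-chrome', '--version'],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
print(result.returncode, result.stdout)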
And my error:
{
  "errorMessage": "Message: unknown error: unable to discover open pages\n",
  "errorType": "WebDriverException",
  "stackTrace": [
    [
      "/var/task/lib/observer.py",
      69,
      "handler",
      "chrome_options=chrome_options"
    ],
    [
      "/var/task/selenium/webdriver/chrome/webdriver.py",
      81,
      "__init__",
      "desired_capabilities=desired_capabilities)"
    ],
    [
      "/var/task/selenium/webdriver/remote/webdriver.py",
      157,
      "__init__",
      "self.start_session(capabilities, browser_profile)"
    ],
    [
      "/var/task/selenium/webdriver/remote/webdriver.py",
      252,
      "start_session",
      "response = self.execute(Command.NEW_SESSION, parameters)"
    ],
    [
      "/var/task/selenium/webdriver/remote/webdriver.py",
      321,
      "execute",
      "self.error_handler.check_response(response)"
    ],
    [
      "/var/task/selenium/webdriver/remote/errorhandler.py",
      242,
      "check_response",
      "raise exception_class(message, screen, stacktrace)"
    ]
  ]
}
EDIT: I am willing to try out anything at this point. Different versions of Chrome or Chromium, Chromedriver, Python or Selenium.
Answer 1:
This error message...
"errorMessage": "Message: unknown error: unable to discover open pages\n",
"errorType": "WebDriverException"
...implies that ChromeDriver was unable to initiate/spawn a new browsing context, i.e. a Chrome browser session.
It seems the issue is with ChromeDriver's sandboxing security feature.
Rule of thumb
A common cause for Chrome to crash during startup is running Chrome as the root user (administrator) on Linux. While it is possible to work around this issue by passing the --no-sandbox flag when creating your WebDriver session, such a configuration is unsupported and highly discouraged. You need to configure your environment to run Chrome as a regular user instead.
Details
A bit more detail about your use case would have helped us analyze the arguments you used and the root cause of the error. However, a few thoughts:
- What is the sandbox?: The sandbox is a C++ library that allows the creation of sandboxed processes — processes that execute within a very restrictive environment. The only resources sandboxed processes can freely use are CPU cycles and memory. For example, sandboxed processes cannot write to disk or display their own windows. What exactly they can do is controlled by an explicit policy. Chromium renderers are sandboxed processes.
- What does and doesn't it protect against?: The sandbox limits the severity of bugs in code running inside the sandbox. Such bugs cannot install persistent malware in the user's account (because writing to the filesystem is banned). Such bugs also cannot read and steal arbitrary files from the user's machine. (In Chromium, the renderer processes are sandboxed and have this protection. After the NPAPI removal, all remaining plugins are also sandboxed. Also note that Chromium renderer processes are isolated from the system, but not yet from the web. Therefore, domain-based data isolation is not yet provided.). The sandbox cannot provide any protection against bugs in system components such as the kernel it is running on.
- So how can a sandboxed process such as a renderer accomplish anything?: Certain communication channels are explicitly open for the sandboxed processes; the processes can write and read from these channels. A more privileged process can use these channels to do certain actions on behalf of the sandboxed process. In Chromium, the privileged process is usually the browser process.
So you may need to drop the --no-sandbox option. Here is the link to the Sandbox story.
Additional Considerations
Some more considerations:
- While using the --headless option you won't be able to use --window-size=1280x1696 due to certain constraints (one alternative is sketched after these considerations). You can find a couple of relevant detailed discussions in:
  - Fullscreen in Headless Chrome using Selenium
  - Not able to maximize Chrome Window in headless mode
- The --disable-gpu argument was added to enable google-chrome headless on the Windows platform; it was needed because SwiftShader used to fail an assert on Windows in headless mode. This issue was resolved through "Headless: make --disable-gpu flag unnecessary". You can find a relevant detailed discussion in ERROR:gpu_process_transport_factory.cc(1007)-Lost UI shared context : while initializing Chrome browser through ChromeDriver in Headless mode
- Further, you haven't mentioned any specific requirement for the --disable-dev-shm-usage, --hide-scrollbars, --enable-logging, --log-level=0, --v=99, --single-process and --remote-debugging-port=9222 arguments, which you can drop for the time being and add back as per your test specification.
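Putting these suggestions together, a pared-down version of the driver setup in the Lambda handler might look like the sketch below (a sketch only, not a guaranteed fix; the /tmp paths are the ones used in the question, and the window is sized after the session starts instead of via --window-size):
from selenium import webdriver

# Sketch: keep only --headless and add the other flags back as needed.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.binary_location = "/tmp/chrome/google-chrome"

driver = webdriver.Chrome(
    executable_path="/tmp/chromedriver",
    chrome_options=chrome_options
)
driver.set_window_size(1280, 1696)  # instead of --window-size=1280x1696
driver.get("https://www.google.com/")
print(driver.page_source)
driver.quit()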
References
You can find a couple of relevant detailed discussion in:
- selenium.common.exceptions.WebDriverException: Message: unknown error: unable to discover open pages using ChromeDriver through Selenium
- WebDriverException: unknown error: unable to discover open pages error with ChromeDriver 80.0.3987.106 and Chrome 80.0.3987.122
Source: https://stackoverflow.com/questions/64856328/selenium-works-on-aws-ec2-but-not-on-aws-lambda