Question
I need to scrape some data with Python. Because of some Java code in the target page, I could not get it working with Python's twill and mechanize modules, so I need to run the Selenium module on my OpenShift server. How can I set up a Selenium driver (Firefox, Chrome, ...) on an OpenShift server via SSH? I installed Selenium with:
$ pip install selenium
but how can I run it?
When I run this:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.yahoo.com")
I get this error:
>>> browser = webdriver.Firefox()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/lib/openshift/53fa54a8e0b8cd1d3c000611/app-root/runtime/srv/python/lib/python2.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 49, in __init__
    self.binary = FirefoxBinary()
  File "/var/lib/openshift/53fa54a8e0b8cd1d3c000611/app-root/runtime/srv/python/lib/python2.7/site-packages/selenium/webdriver/firefox/firefox_binary.py", line 43, in __init__
    self._start_cmd = self._get_firefox_start_cmd()
  File "/var/lib/openshift/53fa54a8e0b8cd1d3c000611/app-root/runtime/srv/python/lib/python2.7/site-packages/selenium/webdriver/firefox/firefox_binary.py", line 162, in _get_firefox_start_cmd
    " Please specify the firefox binary location or install firefox")
RuntimeError: Could not find firefox in your system PATH. Please specify the firefox binary location or install firefox
>>> browser.get("http://www.yahoo.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'browser' is not defined
>>>
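The RuntimeError means Selenium's FirefoxBinary found no firefox executable on PATH. A minimal Python 3 sketch of locating the binary and handing it to Selenium explicitly, assuming Firefox has been unpacked under the OpenShift home directory at app-root/runtime/srv/firefox/rtf/firefox/firefox (a hypothetical location matching the script in this question); the helper names are also hypothetical. The firefox_binary argument with FirefoxBinary is the Selenium 2.x/3.x API seen in the traceback; newer Selenium releases configure the binary differently.

```python
import os
import shutil


def find_firefox_binary(home=None, search_path=None):
    """Return a path to a Firefox executable, or None if none is found.

    Checks PATH first (or `search_path`, if given), then a hypothetical
    per-user install location under the OpenShift home directory.
    """
    on_path = shutil.which("firefox", path=search_path)
    if on_path:
        return on_path
    if home is None:
        home = os.environ.get("OPENSHIFT_HOMEDIR", "")
    if not home:
        return None
    candidate = os.path.join(
        home, "app-root", "runtime", "srv", "firefox", "rtf", "firefox", "firefox"
    )
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


def make_firefox_driver(binary_path):
    """Start Firefox through Selenium using an explicit binary path.

    Selenium is imported lazily so the lookup helper works without it.
    """
    from selenium import webdriver
    from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

    return webdriver.Firefox(firefox_binary=FirefoxBinary(binary_path))
```

For example, call make_firefox_driver(path) only after find_firefox_binary() returns a non-None path. Finding the binary fixes only this error: older Firefox releases such as 15.0.1 have no headless mode and still need an X display to start.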
So I think I must install Firefox on my server. How do I do this?
sudo apt-get install firefox xvfb does not work on OpenShift servers, so I adapted the installation instructions from http://joelinoff.com/blog/?p=853 and wrote this script:
#!/bin/bash
# Uses pushd/popd, which are bash built-ins, so run it with bash rather than sh.
# Change this to the last working libs (you may need some trial and error).
if [ ! -z "$OPENSHIFT_DIY_LOG_DIR" ]; then
echo "$OPENSHIFT_LOG_DIR" > "$OPENSHIFT_HOMEDIR/.env/OPENSHIFT_DIY_LOG_DIR"
# Plain assignment: "nohup VAR=value" would try to run "VAR=value" as a command.
OPENSHIFT_DIY_LOG_DIR2="${OPENSHIFT_LOG_DIR}"
echo "$OPENSHIFT_DIY_LOG_DIR2"
fi
# ========================================================
# Step 1. Download the archives.
# 1. firefox 15.0.1
# 2. java jre-7u7
# 3. flash 11.2
# ========================================================
firefox_dir="$OPENSHIFT_HOMEDIR/app-root/runtime/srv/firefox"
# -p creates parent directories and does not fail if they already exist.
mkdir -p "$firefox_dir" "$OPENSHIFT_HOMEDIR/app-root/runtime/tmp"
# Install only once: skip the whole block if Firefox is already unpacked.
if [ ! -d "$firefox_dir/rtf/firefox" ]; then
cd "$firefox_dir" || exit 1
mkdir -p repo
pushd repo
wget http://releases.mozilla.org/pub/mozilla.org/firefox/releases/15.0.1/linux-x86_64/en-US/firefox-15.0.1.tar.bz2
# Quote the URL: "?" is a shell glob character.
wget "http://javadl.sun.com/webapps/download/AutoDL?BundleId=68236" -O jre-7u7-linux-x64.tar.gz
#wget http://fpdownload.macromedia.com/get/flashplayer/pdc/11.2.202.238/install_flash_player_11_linux.x86_64.tar.gz
wget ftp://priede.bf.lu.lv/pub/MultiVide/MacroMedia/x64/install_flash_player_11_linux.x86_64.tar.gz
popd
# ========================================================
# Step 2. Install in the rtf (release-to-field) directory.
# ========================================================
mkdir -p rtf
pushd rtf
tar jxf ../repo/firefox-15.0.1.tar.bz2
tar zxf ../repo/jre-7u7-linux-x64.tar.gz
mkdir -p firefox/plugins
pushd firefox/plugins
tar zxf "${firefox_dir}/repo/install_flash_player_11_linux.x86_64.tar.gz"
# This installs the java plugin.
ln -s "${firefox_dir}/rtf/jre1.7.0_07/lib/amd64/libnpjp2.so" .
popd
popd
# ========================================================
# Step 3. Create a run script.
# ${firefox_dir} is expanded now; the escaped \$ variables
# are expanded later, when run.sh itself runs.
# ========================================================
cat > rtf/run.sh <<EOF
#!/bin/bash
export PATH="${firefox_dir}/rtf/firefox:${firefox_dir}/rtf/jre1.7.0_07/bin:\${PATH}"
export CLASSPATH="${firefox_dir}/rtf/jre1.7.0_07/lib:\${CLASSPATH}"
exec firefox "\$@"
EOF
chmod a+x rtf/run.sh
# ========================================================
# Now you can run it as shown below.
# I added flash and java test URLs to make sure that it
# was working.
# ========================================================
fi
echo "*****************************"
echo "*** USAGE ***"
echo "${firefox_dir}/rtf/run.sh http://www.adobe.com/software/flash/about/ http://javatester.org/"
"${firefox_dir}/rtf/run.sh" http://www.adobe.com/software/flash/about/ http://javatester.org/
echo "*****************************"
echo "*****************************"
echo "*** F I N I S H E D !! ***"
echo "*****************************"
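The run.sh generated in Step 3 only prepends the unpacked Firefox and JRE directories to PATH and CLASSPATH before calling firefox. A minimal Python 3 sketch of the same idea, assuming the same rtf layout and JRE directory name (jre1.7.0_07); the helper names firefox_env and launch_firefox are hypothetical. This lets a Python script start the unpacked Firefox without going through run.sh.

```python
import os
import subprocess


def firefox_env(firefox_dir, base_env=None):
    """Build an environment like the one run.sh sets up.

    Prepends <firefox_dir>/rtf/firefox and the bundled JRE's bin directory
    to PATH, and the JRE's lib directory to CLASSPATH.
    """
    env = dict(os.environ if base_env is None else base_env)
    rtf = os.path.join(firefox_dir, "rtf")
    jre = os.path.join(rtf, "jre1.7.0_07")

    path_parts = [os.path.join(rtf, "firefox"), os.path.join(jre, "bin")]
    if env.get("PATH"):
        path_parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(path_parts)

    cp_parts = [os.path.join(jre, "lib")]
    if env.get("CLASSPATH"):
        cp_parts.append(env["CLASSPATH"])
    env["CLASSPATH"] = os.pathsep.join(cp_parts)
    return env


def launch_firefox(firefox_dir, urls):
    """Start the unpacked Firefox on the given URLs (still needs a DISPLAY)."""
    binary = os.path.join(firefox_dir, "rtf", "firefox", "firefox")
    return subprocess.Popen([binary] + list(urls), env=firefox_env(firefox_dir))
```

Launching this way fails with the same "no display" error as run.sh unless an X server is reachable through DISPLAY.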
But when I run it with:
${firefox_dir}/rtf/run.sh http://www.adobe.com/software/flash/about/ http://javatester.org/
I get this error:
Error: no display specified
So what should I do?
Thanks a lot.
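"Error: no display specified" means Firefox tried to open an X11 window while the DISPLAY environment variable was unset, and Firefox 15 has no headless mode. A common workaround is a virtual framebuffer X server (Xvfb). A minimal Python 3 sketch, assuming an Xvfb binary can somehow be obtained on the OpenShift gear (apt-get is unavailable there, so this is a real assumption) and that display :99 and the 1280x1024x24 screen size are acceptable (both hypothetical choices); the helper names are also hypothetical.

```python
import os
import subprocess
import time


def xvfb_command(display=99, screen="1280x1024x24"):
    """Command line for a virtual X server on :<display> with one screen."""
    return ["Xvfb", ":%d" % display, "-screen", "0", screen]


def env_with_display(display=99, base_env=None):
    """Copy of the environment with DISPLAY pointing at the virtual server."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = ":%d" % display
    return env


def start_xvfb(display=99, screen="1280x1024x24", settle_seconds=1.0):
    """Start Xvfb in the background and set DISPLAY for this process.

    Processes started afterwards (including the Firefox that Selenium
    launches) inherit DISPLAY from os.environ.
    """
    proc = subprocess.Popen(
        xvfb_command(display, screen),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    time.sleep(settle_seconds)  # crude wait for the X server to come up
    os.environ["DISPLAY"] = ":%d" % display
    return proc
```

Typical use would be to call start_xvfb() before webdriver.Firefox(), then call browser.quit() and terminate() on the returned process when done. For run.sh, passing env_with_display() as the env of a subprocess call achieves the same thing.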
Answer 1:
I have not used Selenium, but I have successfully used PhantomJS and CasperJS on my OpenShift app for web scraping. PhantomJS is a true headless browser, and using CasperJS on top of it makes scraping easy. Both have good documentation.
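To illustrate this suggestion, a minimal Python 3 sketch that drives CasperJS from Python: it writes a small CasperJS script that opens a page and prints its title, then runs it with the casperjs command. It assumes phantomjs and casperjs are installed and on PATH; the helper names and the title-printing task are hypothetical.

```python
import json
import os
import subprocess
import tempfile

CASPER_TEMPLATE = """var casper = require('casper').create();
casper.start(%s, function() {
    this.echo(this.getTitle());
});
casper.run();
"""


def casper_script(url):
    """Return CasperJS source that opens `url` and prints the page title."""
    # json.dumps yields a double-quoted, escaped string usable as a JS literal
    return CASPER_TEMPLATE % json.dumps(url)


def casper_page_title(url, casperjs="casperjs", timeout=60):
    """Write the script to a temp file, run it with CasperJS, return stdout."""
    fd, path = tempfile.mkstemp(suffix=".js")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(casper_script(url))
        result = subprocess.run(
            [casperjs, path],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return result.stdout.strip()
    finally:
        os.remove(path)
```

Because PhantomJS renders pages without an X server, this avoids both the missing-Firefox and the missing-display problems from the question.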
Source: https://stackoverflow.com/questions/25910106/python-selenium-in-openshift-server