Question
I am new to Scrapy and going through the tutorials. I ran this command and got an error.
C:\Users\Sandra\Anaconda>scrapy shell 'http://scrapy.org'
In particular, what does this URLError mean: <urlopen error [Errno 10051] A socket operation was attempted to an unreachable network>?
Full error message:
2015-08-20 23:35:08 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2015-08-20 23:35:08 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-08-20 23:35:08 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2015-08-20 23:35:10 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2015-08-20 23:35:10 [boto] DEBUG: Retrieving credentials from metadata server.
2015-08-20 23:35:10 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
File "C:\Users\Sandra\Anaconda\lib\site-packages\boto\utils.py", line 210, in retry_url
r = opener.open(req, timeout=timeout)
File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 431, in open
response = self._open(req, data)
File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 449, in _open
'_open', req)
File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 1227, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 1197, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 10051] A socket operation was attempted to an unreachable network>
2015-08-20 23:35:10 [boto] ERROR: Unable to read instance data, giving up
2015-08-20 23:35:10 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-08-20 23:35:10 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-08-20 23:35:10 [scrapy] INFO: Enabled item pipelines:
2015-08-20 23:35:10 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
Traceback (most recent call last):
File "C:\Users\Sandra\Anaconda\Scripts\scrapy-script.py", line 5, in <module>
sys.exit(execute())
File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\cmdline.py", line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\cmdline.py", line 89, in _run_print_help
func(*a, **kw)
File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\commands\shell.py", line 63, in run
shell.start(url=url)
File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\shell.py", line 44, in start
self.fetch(url, spider)
File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\shell.py", line 81, in fetch
url = any_to_uri(request_or_url)
File "C:\Users\Sandra\Anaconda\lib\site-packages\w3lib\url.py", line 232, in any_to_uri
return uri_or_path if u.scheme else path_to_file_uri(uri_or_path)
File "C:\Users\Sandra\Anaconda\lib\site-packages\w3lib\url.py", line 213, in path_to_file_uri
x = moves.urllib.request.pathname2url(os.path.abspath(path))
File "C:\Users\Sandra\Anaconda\lib\nturl2path.py", line 58, in pathname2url
raise IOError, error
Error: Bad path: C:\Users\Sandra\Anaconda\'http:\scrapy.org'
Here is the list of installed packages:
# packages in environment at C:\Users\Sandra\Anaconda:
#
_license 1.1 py27_0
alabaster 0.7.3 py27_0
anaconda 2.3.0 np19py27_0
argcomplete 0.8.9 py27_0
astropy 1.0.3 np19py27_0
babel 1.3 py27_0
backports.ssl-match-hostname 3.4.0.2
bcolz 0.9.0 np19py27_0
beautiful-soup 4.3.2 py27_1
beautifulsoup4 4.3.2
binstar 0.11.0 py27_0
bitarray 0.8.1 py27_1
blaze 0.8.0
blaze-core 0.8.0 np19py27_0
blz 0.6.2 np19py27_1
bokeh 0.9.0 np19py27_0
boto 2.38.0 py27_0
bottleneck 1.0.0 np19py27_0
cdecimal 2.3 py27_1
certifi 14.05.14 py27_0
cffi 1.1.2 py27_0
characteristic 14.3.0
clyent 0.3.4 py27_0
colorama 0.3.3 py27_0
conda 3.16.0 py27_0
conda-build 1.14.0 py27_0
conda-env 2.4.2 py27_0
configobj 5.0.6 py27_0
crcmod 1.7
cryptography 0.9.3 py27_0
cssselect 0.9.1 py27_0
cython 0.22.1 py27_0
cytoolz 0.7.3 py27_0
datashape 0.4.5 np19py27_0
decorator 3.4.2 py27_0
docopt 0.6.2
docutils 0.12 py27_1
dynd-python 0.6.5 np19py27_0
enum34 1.0.4 py27_0
fastcache 1.0.2 py27_0
filechunkio 1.6
flask 0.10.1 py27_1
funcsigs 0.4 py27_0
futures 3.0.2 py27_0
gcs-oauth2-boto-plugin 1.9
gevent 1.0.1 py27_0
gevent-websocket 0.9.3 py27_0
google-api-python-client 1.4.0
google-apitools 0.4.3
greenlet 0.4.7 py27_0
grin 1.2.1 py27_2
gsutil 4.12
h5py 2.5.0 np19py27_1
hdf5 1.8.15.1 2
httplib2 0.9.1
idna 2.0 py27_0
ipaddress 1.0.7 py27_0
ipython 3.2.0 py27_0
ipython-notebook 3.2.0 py27_0
ipython-qtconsole 3.2.0 py27_0
itsdangerous 0.24 py27_0
jdcal 1.0 py27_0
jedi 0.8.1 py27_0
jinja2 2.7.3 py27_2
jsonschema 2.4.0 py27_0
launcher 1.0.0 1
llvmlite 0.5.0 py27_0
lxml 3.4.4 py27_0
markupsafe 0.23 py27_0
matplotlib 1.4.3 np19py27_1
menuinst 1.0.4 py27_0
mistune 0.5.1 py27_1
mock 1.0.1 py27_0
mrjob 0.4.4
multipledispatch 0.4.7 py27_0
networkx 1.9.1 py27_0
nltk 3.0.3 np19py27_0
node-webkit 0.10.1 0
nose 1.3.7 py27_0
numba 0.19.1 np19py27_0
numexpr 2.4.3 np19py27_0
numpy 1.9.2 py27_0
oauth2client 1.4.7
odo 0.3.2 np19py27_0
openpyxl 1.8.5 py27_0
pandas 0.16.2 np19py27_0
patsy 0.3.0 np19py27_0
pattern 2.6
pbs 0.110
pep8 1.6.2 py27_0
pillow 2.8.2 py27_0
pip 7.1.0 py27_1
ply 3.6 py27_0
protorpc 0.10.0
psutil 2.2.1 py27_0
py 1.4.27 py27_0
pyasn1 0.1.7 py27_0
pyasn1-modules 0.0.5
pycosat 0.6.1 py27_0
pycparser 2.14 py27_0
pycrypto 2.6.1 py27_3
pyflakes 0.9.2 py27_0
pygments 2.0.2 py27_0
pyopenssl 0.15.1 py27_1
pyparsing 2.0.3 py27_0
pyqt 4.10.4 py27_1
pyreadline 2.0 py27_0
pytables 3.2.0 np19py27_0
pytest 2.7.1 py27_0
python 2.7.9 1
python-dateutil 2.4.2 py27_0
python-gflags 2.0
pytz 2015.4 py27_0
pywin32 219 py27_0
pyyaml 3.11 py27_1
pyzmq 14.7.0 py27_0
queuelib 1.2.2 py27_0
requests 2.7.0 py27_0
retry-decorator 1.0.0
rodeo 0.2.3
rope 0.9.4 py27_1
rsa 3.1.4
runipy 0.1.3 py27_0
scikit-image 0.11.3 np19py27_0
scikit-learn 0.16.1 np19py27_0
scipy 0.15.1 np19py27_0
scrapy 1.0.3
seaborn 0.5.1 np19py27_0
service-identity 14.0.0
setuptools 18.1 py27_0
simplejson 3.6.5
six 1.9.0 py27_0
snowballstemmer 1.2.0 py27_0
sockjs-tornado 1.0.1 py27_0
socksipy-branch 1.1
sphinx 1.3.1 py27_0
sphinx-rtd-theme 0.1.7
sphinx_rtd_theme 0.1.7 py27_0
spyder 2.3.5.2 py27_0
spyder-app 2.3.5.2 py27_0
sqlalchemy 1.0.5 py27_0
ssl_match_hostname 3.4.0.2 py27_0
statsmodels 0.6.1 np19py27_0
sympy 0.7.6 py27_0
tables 3.2.0
toolz 0.7.2 py27_0
tornado 4.2 py27_0
twisted 15.3.0 py27_0
ujson 1.33 py27_0
unicodecsv 0.9.4 py27_0
uritemplate 0.6
w3lib 1.12.0 py27_0
werkzeug 0.10.4 py27_0
wheel 0.24.0 py27_0
xlrd 0.9.3 py27_0
xlsxwriter 0.7.3 py27_0
xlwings 0.3.5 py27_0
xlwt 1.0.0 py27_0
zlib 1.2.8 0
zope.interface 4.1.2 py27_1
Answer 1:
That particular error message is generated by boto (boto 2.38.0 py27_0), which is used to connect to Amazon S3. Scrapy doesn't have this enabled by default.
If you're just going through the tutorial and haven't done anything beyond what you were instructed to do, then it could be a configuration problem. Launching Scrapy with the shell argument from the command line will still use the configuration and the associated settings file. By default, Scrapy will look in:
/etc/scrapy.cfg or c:\scrapy\scrapy.cfg (system-wide), ~/.config/scrapy.cfg ($XDG_CONFIG_HOME) and ~/.scrapy.cfg ($HOME) for global (user-wide) settings, and scrapy.cfg inside a Scrapy project's root (see next section).
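For reference, the project-level scrapy.cfg generated by scrapy startproject is only a few lines; the project name below follows the official tutorial and is otherwise illustrative:
[settings]
default = tutorial.settings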
EDIT:
In reply to the comments: this appears to be a bug in Scrapy when boto is present (bug here).
In response to "how do I disable the download handler?", add the following to your settings.py file:
DOWNLOAD_HANDLERS = {
    's3': None,
}
Your settings.py file should be in the root of your Scrapy project folder (one level deeper than your scrapy.cfg file).
If you've already got DOWNLOAD_HANDLERS in your settings.py file, just add a new entry for 's3' with a None value.
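Putting it together, a minimal settings.py with the S3 handler disabled could look like the sketch below. The project name 'tutorial' is assumed (matching the official tutorial); the first three settings are the ones scrapy startproject generates.
# settings.py -- sketch for a hypothetical project named 'tutorial'
BOT_NAME = 'tutorial'
SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'

# Disable the built-in S3 download handler so boto is never used
DOWNLOAD_HANDLERS = {
    's3': None,
}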
EDIT 2:
I'd highly recommend setting up virtual environments for your projects. Look into virtualenv and its usage. I'd make this recommendation regardless of the packages used for this project, but doubly so given the large number of packages you have installed.
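For example, on Windows a throwaway environment for the tutorial could be created like this (the environment name scrapy_env is just an illustration):
C:\Users\Sandra>pip install virtualenv
C:\Users\Sandra>virtualenv scrapy_env
C:\Users\Sandra>scrapy_env\Scripts\activate
(scrapy_env) C:\Users\Sandra>pip install scrapy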
Answer 2:
Maybe you should use double quotes (") instead of single quotes (').
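That is, on the Windows command prompt the original command from the question would become:
C:\Users\Sandra\Anaconda>scrapy shell "http://scrapy.org"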
My Python version is 2.7.10 on win32. Scrapy version is 1.0.3.
Source: https://stackoverflow.com/questions/32132482/scrapy-shell-error