Question:

I'm struggling to integrate BeautifulSoup into my current Python project. To keep this plain and simple, I've reduced the complexity of my current script.

Script without BeautifulSoup:

    import urllib2

    def check(self, name, proxy):
        urllib2.install_opener(
            urllib2.build_opener(
                urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                urllib2.HTTPHandler()
            )
        )

        req = urllib2.Request('http://example.com', "param=1")
        try:
            resp = urllib2.urlopen(req)
        except:
            self.insert()
        try:
            if 'example text' in resp.read():
                print 'success'
        # (the matching except clause is left out of this sketch)

Of course, this is only a sketch of what I have going on. In simple terms, I'm sending a POST request to example.com, and if the response contains "example text" (checked with resp.read()), I print success.

But what I actually want is to check

    if 'example' in resp.read()

and then output the text inside a right-aligned td from the example.com response, using

    soup.find_all('td', {'align':'right'})[4]

The way I'm currently using BeautifulSoup isn't working. Here is an example:

    import urllib2
    from bs4 import BeautifulSoup as soup

    main_div = soup.find_all('td', {'align':'right'})[4]

    def check(self, name, proxy):
        urllib2.install_opener(
            urllib2.build_opener(
                urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                urllib2.HTTPHandler()
            )
        )

        req = urllib2.Request('http://example.com', "param=1")
        try:
            resp = urllib2.urlopen(req)
            web_soup = soup(urllib2.urlopen(req), 'html.parser')
        except:
            self.insert()
        try:
            if 'example text' in resp.read():
                print 'success' + main_div
        # (the matching except clause is left out of this sketch)

As you can see, I made four additions. Three are new lines:

    from bs4 import BeautifulSoup as soup
    web_soup = soup(urllib2.urlopen(url), 'html.parser')
    main_div = soup.find_all('td', {'align':'right'})[4]

The fourth is the "+ main_div" appended to the print statement.

However, it just doesn't seem to work. While adjusting the code I've hit a few errors, including "local variable referenced before assignment" and "unbound method find_all must be called with BeautifulSoup instance as first argument".

Answer 1:

Regarding your last code snippet:

    from bs4 import BeautifulSoup as soup

    web_soup = soup(urllib2.urlopen(url), 'html.parser')
    main_div = soup.find_all('td', {'align':'right'})[4]

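find_all is an instance method, so you should call it on the web_soup instance (the parsed document) rather than on soup, which here is just an alias for the BeautifulSoup class. Also be sure to define the url variable before you use it: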

    import urllib2
    from bs4 import BeautifulSoup as soup

    url = "url to be opened"
    web_soup = soup(urllib2.urlopen(url), 'html.parser')
    main_div = web_soup.find_all('td', {'align':'right'})[4]
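To tie this back to the check in the question, here is a minimal sketch of how the pieces might fit together. It is written in Python 3 syntax (where urllib2 became urllib.request) and parses an inline HTML string instead of making a request, so it runs without a network or proxy. SAMPLE_HTML, fifth_right_cell and the marker parameter are hypothetical names made up for illustration; in the real script the html argument would be the body you already obtained from resp.read().

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the body returned by resp.read().
SAMPLE_HTML = """
<html><body>example text
<table><tr>
<td align="right">a</td><td align="right">b</td><td align="right">c</td>
<td align="left">skipped</td>
<td align="right">d</td><td align="right">e</td>
</tr></table>
</body></html>
"""


def fifth_right_cell(html, marker="example text"):
    """Return the text of the fifth <td align="right">, or None."""
    if marker not in html:
        return None
    # Build the soup from the body that was already read; find_all is
    # called on this instance, not on the BeautifulSoup class.
    web_soup = BeautifulSoup(html, "html.parser")
    cells = web_soup.find_all("td", {"align": "right"})
    # Guard the index so a page with fewer cells does not raise IndexError.
    if len(cells) < 5:
        return None
    # get_text() returns a plain string that can be concatenated for printing.
    return cells[4].get_text(strip=True)


result = fifth_right_cell(SAMPLE_HTML)
print("success " + result if result is not None else "no match")
```

Two details in the question's version are worth noting. Calling urlopen(req) a second time to build the soup sends a second request, so it is usually simpler to read the response once and pass that string to both the substring check and the parser. And if urlopen raises, resp is never assigned, so the later resp.read() is a likely source of the "local variable referenced before assignment" message; returning early from the except branch avoids that.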

Source: Python urllib2 + Beautifulsoup
