Python urllib2 + BeautifulSoup

Anonymous (unverified), submitted 2019-12-03 01:40:02

Question:

I'm struggling to work BeautifulSoup into my current Python project. To keep this plain and simple, I've reduced my script to the essentials.

Script without BeautifulSoup -

    import urllib2

        def check(self, name, proxy):
            urllib2.install_opener(
                urllib2.build_opener(
                    urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                    urllib2.HTTPHandler()
                    )
                )

            req = urllib2.Request('http://example.com' ,"param=1")
            try:
                resp = urllib2.urlopen(req)

            except:
                self.insert()
            try:
                if 'example text' in resp.read()
                   print 'success'

Of course the indentation is wrong; this is just a sketch of what I have going on. In simple terms, I'm sending a POST request to "example.com", and if the response contains "example text" in resp.read(), the script prints success.

But what I actually want is to check

if ' example ' in resp.read() 

then output the text inside the right-aligned td from the example.com response, using

soup.find_all('td', {'align':'right'})[4] 

The way I'm implementing BeautifulSoup isn't working. An example of this -

    import urllib2
    from bs4 import BeautifulSoup as soup

    main_div = soup.find_all('td', {'align':'right'})[4]

        def check(self, name, proxy):
            urllib2.install_opener(
                urllib2.build_opener(
                    urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                    urllib2.HTTPHandler()
                    )
                )

            req = urllib2.Request('http://example.com' ,"param=1")
            try:
                resp = urllib2.urlopen(req)

                web_soup = soup(urllib2.urlopen(req), 'html.parser')
            except:
                self.insert()
            try:
                if 'example text' in resp.read()
                   print 'success' + main_div

As you can see, I added 4 new lines/adjustments:

    from bs4 import BeautifulSoup as soup

    web_soup = soup(urllib2.urlopen(url), 'html.parser')

    main_div = soup.find_all('td', {'align':'right'})[4]

as well as "+ main_div" on the print line.

However, it just doesn't seem to work. I've hit a few errors while adjusting it, some of which said "Local variable referenced before assignment" and "unbound method find_all must be called with beautifulsoup instance as first argument".

Answer 1:

Regarding your last code snippet:

    from bs4 import BeautifulSoup as soup

    web_soup = soup(urllib2.urlopen(url), 'html.parser')
    main_div = soup.find_all('td', {'align':'right'})[4]

You should call find_all on the web_soup instance. Also be sure to define the url variable before you use it:

    import urllib2
    from bs4 import BeautifulSoup as soup

    url = "url to be opened"
    web_soup = soup(urllib2.urlopen(url), 'html.parser')
    # find_all is an instance method: call it on web_soup, not on the soup class
    main_div = web_soup.find_all('td', {'align':'right'})[4]
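To tie this back to the two errors in the question, here is a minimal sketch of how the pieces might fit together. It is not your actual script. The "referenced before assignment" error most likely appears because resp is only assigned inside the try block, so when the request fails the later code uses a name that was never bound. The "unbound method" error comes from calling find_all on the class rather than on a parsed instance. The names check, open_url and SAMPLE_HTML are made up for illustration, and the opener is passed in as a parameter so the sketch runs without a network. In your Python 2 code, the opener would be urllib2.urlopen.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page that example.com would return.
SAMPLE_HTML = """
<html><body>
<p>example text</p>
<table><tr>
<td align="right">a</td>
<td align="right">b</td>
<td align="left">skipped</td>
<td align="right">c</td>
<td align="right">d</td>
<td align="right">e</td>
</tr></table>
</body></html>
"""


def check(open_url, req, marker='example text', index=4):
    """Return the text of the index-th right-aligned <td>, or None.

    open_url is an assumed callable that takes a request and returns an
    object with a read() method (the role urllib2.urlopen plays in the
    question).
    """
    try:
        resp = open_url(req)
    except Exception:
        # Stop here: resp was never bound, so nothing below may use it.
        return None

    # Read the body once and reuse the string; a second read() on the
    # same response would return nothing.
    html = resp.read()
    if marker not in html:
        return None

    # Parse into an instance, then call find_all on that instance.
    web_soup = BeautifulSoup(html, 'html.parser')
    cells = web_soup.find_all('td', {'align': 'right'})
    if len(cells) <= index:
        return None
    return cells[index].get_text(strip=True)


def fake_open(req):
    # Pretends to be a successful request by serving the sample page.
    return io.StringIO(SAMPLE_HTML)


def failing_open(req):
    # Pretends the proxy is unreachable.
    raise IOError('proxy down')


print(check(fake_open, None))     # prints: e
print(check(failing_open, None))  # prints: None
```

Reading the body into html once also avoids the second urlopen call in your version, which would send the POST request twice.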

