Using SoupStrainer to parse selectively

被刻印的时光 ゝ 提交于 2019-12-04 09:38:59

Oh boy am i silly, i was searching for tags with atribute id = products, but it should have been product_list

heres the finaly code if anyone comes searching.

from BeautifulSoup import BeautifulSoup, SoupStrainer
import urllib
import re


start = time.clock()
url = "http://someplace.com"
html = urllib.urlopen(url).read()
product = SoupStrainer('div',{'id': 'products_list'})
soup = BeautifulSoup(html,parseOnlyThese=product)
for a in soup.findAll('a',{'title':re.compile('.+') }):
      print a.string

Try searching first for the product list div and then for the a tags with title:

product = soup.find('div',{'id': 'products'})
for a in product.findAll('a',{'title': re.compile('.+') }):
   print a.string
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!