Can't scrape webpage with Python Requests library

一向 2020-12-11 10:20

I am trying to get some info from a webpage (link below) using Requests in Python; however, the HTML data that I see in my browser doesn't seem to exist when I connect via

2 answers

死守一世寂寞 (original poster)

2020-12-11 10:54

Here is the code I use to scrape a table from one site. That site doesn't define an id or class on the table, so the XPath needs no attribute filter. If the table does have an id or class, use html.xpath('//table[@id="id_val"]/tr') instead of html.xpath('//table/tr'), with id_val replaced by the actual id.

from urllib.request import urlopen
from lxml import etree

# Download the page and parse the HTML into an element tree.
with urlopen("http://www.yourpage.com/") as web:
    html = etree.HTML(web.read())
tr_nodes = html.xpath('//table/tr')
# Keep rows whose third cell is 'Chennai' or 'Across India',
# or lists 'Chennai' among '/'-separated values.
td_content = []
for tr in tr_nodes:
    tds = tr.xpath('td')
    if len(tds) < 7:  # skip header rows and rows too short to index below
        continue
    cell2 = tds[2].text or ''
    if cell2 in ('Chennai', 'Across India') or 'Chennai' in cell2.split('/'):
        td_content.append(tds)
# Of those, keep rows whose sixth cell lists 'Freshers' or contains a standalone '0'.
main_list = []
for tds in td_content:
    cell5 = tds[5].text or ''
    if 'Freshers' in cell5.split('/') or '0' in cell5.split(' '):
        sub_list = [td.text for td in tds]
        sub_list.insert(6, 'http://yourpage.com/%s' % tds[6].xpath('a')[0].get('href'))
        main_list.append(sub_list)
print('main_list', main_list)
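
To make the id-based selection concrete, here is a minimal sketch that runs without a network connection. It is an illustration under assumptions, not the code above: the sample markup, the id "jobs", and the helper rows_by_table_id are all hypothetical, and it uses the standard library's xml.etree.ElementTree on a well-formed snippet as a stand-in for lxml. The same [@id='...'] predicate can be passed to html.xpath(...) when using lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real page).
SAMPLE = """
<html><body>
  <table id="other"><tr><td>skip</td></tr></table>
  <table id="jobs">
    <tr><td>Dev</td><td>Chennai</td></tr>
    <tr><td>QA</td><td>Delhi/Chennai</td></tr>
  </table>
</body></html>
"""


def rows_by_table_id(markup, table_id):
    """Return the cell texts of each row in the table with the given id."""
    root = ET.fromstring(markup)
    # The attribute predicate restricts the match to one table.
    rows = root.findall(".//table[@id='%s']/tr" % table_id)
    return [[td.text for td in tr.findall("td")] for tr in rows]


# Without a predicate, rows from every table are returned.
all_rows = ET.fromstring(SAMPLE).findall(".//table/tr")
job_rows = rows_by_table_id(SAMPLE, "jobs")
print(len(all_rows), job_rows)
```

Real pages are often not well-formed XML, which is why the answer's code parses them with lxml's etree.HTML rather than a strict XML parser.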
