Can't Scrape webpage with Python Requests Library

Backend · Unresolved · 2 answers · 1492 views
一向 2020-12-11 10:20

I am trying to get some info from a webpage (link below) using Requests in Python; however, the HTML data that I see in my browser doesn't seem to exist when I connect via
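Requests downloads only the HTML that the server returns and does not execute JavaScript, so anything the page adds in the browser after loading will be missing from the response. A minimal sketch for checking this, assuming `requests` and `lxml` are installed; the helper names `fetch_html` and `visible_text`, the placeholder URL, and the search string "Chennai" are hypothetical:

```python
import requests
from lxml import etree


def fetch_html(url, timeout=10):
    """Download the raw HTML exactly as the server sends it (no JavaScript is run)."""
    # Some sites serve different markup to clients without a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return resp.content


def visible_text(raw_html):
    """Join the text nodes of the static HTML, ignoring <script> and <style> contents."""
    tree = etree.HTML(raw_html)
    parts = tree.xpath('//text()[not(ancestor::script) and not(ancestor::style)]')
    return " ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    raw = fetch_html("http://www.yourpage.com/")  # placeholder URL
    print("Chennai" in visible_text(raw))
```

If a string is visible in the browser but absent from `visible_text(raw)`, the page is probably filling it in with JavaScript, and the browser's developer tools (Network tab) can show which request actually returns that data.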

2 Answers
  •  死守一世寂寞
    2020-12-11 10:54

    Here is the code I use to scrape a table from one site. That site doesn't set an id or class on the table, so the XPath needs no attribute filter. If the table does have an id or class, select it explicitly, e.g. html.xpath('//table[@id="id_val"]/tr') or html.xpath('//table[@class="class_val"]/tr') instead of html.xpath('//table/tr').

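    from urllib.request import urlopen  # Python 3 home of Python 2's urllib.urlopen
    from lxml import etree

    def cell_text(td):
        # td.text is None for empty cells, so fall back to "" before splitting
        return (td.text or "").strip()

    with urlopen("http://www.yourpage.com/") as web:
        html = etree.HTML(web.read())
    main_list = []
    for tr in html.xpath('//table/tr'):
        tds = tr.xpath('td')
        if len(tds) < 7:
            continue  # skip header rows (<th> cells only) and rows with too few cells
        location, experience = cell_text(tds[2]), cell_text(tds[5])
        # keep rows in Chennai (alone or in a "City/City" list) or "Across India"
        if location == 'Across India' or 'Chennai' in location.split('/'):
            # keep rows open to freshers, or whose experience column contains "0"
            if 'Freshers' in experience.split('/') or '0' in experience.split(' '):
                sub_list = [cell_text(td) for td in tds]
                href = tds[6].xpath('a/@href')  # link in the 7th cell, if any
                sub_list.insert(6, 'http://yourpage.com/%s' % href[0] if href else '')
                main_list.append(sub_list)
    print('main_list', main_list)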
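The id/class variant of the XPath can be tried on a small inline page before pointing it at a real site. A minimal sketch, assuming `lxml` is installed; the sample HTML, the table ids `ads`/`jobs`, and the class name `listing` are hypothetical:

```python
from lxml import etree

# Hypothetical page with two tables; only the one with id="jobs" is wanted.
SAMPLE = b"""
<html><body>
  <table id="ads"><tr><td>Ad</td></tr></table>
  <table id="jobs" class="listing striped">
    <tr><th>Title</th><th>City</th></tr>
    <tr><td>Developer</td><td>Chennai</td></tr>
    <tr><td>Tester</td><td>Across India</td></tr>
  </table>
</body></html>
"""

html = etree.HTML(SAMPLE)

# Attribute values in an XPath predicate must be quoted string literals.
by_id = html.xpath('//table[@id="jobs"]/tr')

# class often holds several space-separated names, so an exact @class="listing"
# test would miss this table; match one name inside the list instead.
by_class = html.xpath(
    '//table[contains(concat(" ", normalize-space(@class), " "), " listing ")]/tr'
)

# Skip the header row (it has <th>, not <td>) and collect the cell text.
rows = [[td.text for td in tr.xpath('td')] for tr in by_id if tr.xpath('td')]
print(rows)
```

Note that lxml's HTML parser does not insert the `<tbody>` element that browsers add, which is why `//table/tr` matches here even though a browser's inspector would show `table > tbody > tr`.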
    
