Python Selenium Page Source

Submitted anonymously (unverified) on 2019-12-03 01:06:02

Question:

I want to get all the proxy IP addresses from: https://free-proxy-list.net/

I decided it would be faster to get them from the page source.

The problem is that I can see everything when I press CTRL+U, but when I use "page_source" I only see a few IPs instead of all of them.

Thanks for the help. For DebanjanB, here is my code. It turned out I didn't have to use Selenium.

Here is the code:

import csv  # needed for csv.writer below (missing from the original snippet)
import requests
import lxml.html

r = requests.get("https://free-proxy-list.net/")
html = lxml.html.fromstring(r.content)
ip_list = html.xpath("//tr/td[1]/text()")
port_list = html.xpath("//tr/td[2]/text()")

with open(r"E:\proxy_lista.csv", 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|',
                            quoting=csv.QUOTE_MINIMAL)
    for i in range(len(ip_list)):
        spamwriter.writerow(ip_list[i].split())
    # no explicit close() needed; the with-block closes the file

Answer 1:

This is because only 20 table rows are currently displayed on the page.

If you just need to scrape those IP numbers, you can use python-requests + lxml.html instead of selenium:

import requests
import lxml.html

r = requests.get("https://free-proxy-list.net/")
html = lxml.html.fromstring(r.content)
ip_list = html.xpath("//tr/td[1]/text()")
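As a follow-up to that snippet, since each row keeps the IP in td[1] and the port in td[2], the two lists can be zipped into "ip:port" strings. This is just a sketch building on the code above; the XPath may also match rows from other tables on the page, so some filtering might still be needed:

import requests
import lxml.html

r = requests.get("https://free-proxy-list.net/")
html = lxml.html.fromstring(r.content)
ip_list = html.xpath("//tr/td[1]/text()")
port_list = html.xpath("//tr/td[2]/text()")

# pair each IP with the port from the same row
proxies = [f"{ip}:{port}" for ip, port in zip(ip_list, port_list)]
print(proxies[:5])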

If it's mandatory for you to use selenium, you should create an empty list, append() the required values, and click() the "Next" button. Do this in a while loop for as long as the "Next" button is enabled.
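A minimal sketch of that Selenium loop, assuming the table uses DataTables-style pagination where the "Next" control gets a "disabled" class on the last page (the selectors and the table XPath below are assumptions and may need adjusting to the actual page markup):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://free-proxy-list.net/")

proxies = []  # empty list to collect "ip:port" strings
while True:
    # append the rows currently shown on the page (assumed table structure)
    for row in driver.find_elements(By.XPATH, "//table//tbody/tr"):
        cells = row.find_elements(By.TAG_NAME, "td")
        if len(cells) >= 2:
            proxies.append(cells[0].text + ":" + cells[1].text)

    # the "Next" control is assumed to carry a "disabled" class on the last page
    next_button = driver.find_element(By.CSS_SELECTOR, "li.next")
    if "disabled" in (next_button.get_attribute("class") or ""):
        break
    next_button.click()

driver.quit()
print(len(proxies), "proxies collected")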


