Scraping in Python - Preventing IP ban


长情又很酷 2020-12-22 22:18

I am using Python to scrape pages. Until now I didn't have any complicated issues.

The site that I'm trying to scrape uses a lot of security checks an

3 Answers

野趣味 (OP)

2020-12-22 22:51

I had this problem too. I used urllib with Tor in Python 3.

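Download and install Tor Browser.
Note: Tor Browser's built-in tor accepts SOCKS connections on port 9150, while the standalone tor service uses port 9050 (the port used below). Change the port if you only run Tor Browser.

Testing Tor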

Open a terminal and run:

curl --socks5-hostname localhost:9050 https://check.torproject.org

If you get a page back, Tor is working.
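If curl isn't available, a minimal stdlib sketch like this one at least tells you whether anything is listening on the SOCKS port. It only proves that a TCP listener exists, not that it is Tor; the helper name socks_port_open is hypothetical, and 127.0.0.1:9050 is assumed to be where tor listens.

```python
import socket


def socks_port_open(host: str = "127.0.0.1", port: int = 9050, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    A True result only shows that a listener exists; it does not prove
    that the listener is a Tor SOCKS proxy.
    """
    try:
        # create_connection opens a TCP connection; closing it right away is enough.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, etc.
        return False
```

For example, socks_port_open() checks the standalone tor port, and socks_port_open(port=9150) checks Tor Browser's port.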

Now test it in Python by running this code (it needs the PySocks and beautifulsoup4 packages):

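# Requires: pip install pysocks beautifulsoup4
import socket
from urllib.request import Request, urlopen

import socks  # provided by the PySocks package
from bs4 import BeautifulSoup

# Set Tor's SOCKS5 proxy as the default and replace socket.socket,
# so every socket this process creates afterwards goes through Tor.
socks.set_default_proxy(socks.SOCKS5, "localhost", 9050)
socket.socket = socks.socksocket

req = Request("https://check.torproject.org", headers={"User-Agent": "Mozilla/5.0"})
with urlopen(req) as resp:
    html = resp.read()

soup = BeautifulSoup(html, "html.parser")
print(soup.title.get_text(strip=True))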

If you see

Congratulations. This browser is configured to use Tor.

then it works in Python too, and your scraping traffic is going through Tor.
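If you prefer the requests library to urllib, a sketch like the following avoids replacing socket.socket for the whole process and instead passes Tor as a per-request SOCKS proxy. This is a swapped-in alternative, not what the answer above used. It assumes tor is listening on 127.0.0.1:9050 and that requests is installed with its SOCKS extra (pip install "requests[socks]"); tor_proxies and fetch_via_tor are hypothetical helper names.

```python
def tor_proxies(host: str = "127.0.0.1", port: int = 9050) -> dict:
    """Build a requests-style proxies mapping that routes traffic through Tor.

    The socks5h scheme (note the "h") makes the proxy resolve hostnames,
    so DNS lookups go through Tor instead of the local resolver.
    """
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def fetch_via_tor(url: str, timeout: float = 30.0) -> str:
    """Fetch url through Tor and return the response body as text."""
    # Imported here so tor_proxies() stays usable without requests installed.
    import requests

    resp = requests.get(
        url,
        proxies=tor_proxies(),
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=timeout,
    )
    resp.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return resp.text
```

For example, fetch_via_tor("https://check.torproject.org") should return the same "Congratulations" page as the urllib version when Tor is running. Because the proxy is scoped to each call, other network code in the same program is not forced through Tor.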
