I am using Python to scrape pages. Until now I didn't have any complicated issues.
The site that I'm trying to scrape uses a lot of security checks an
I had this problem too. I used urllib with Tor in Python 3.
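First check that Tor itself is reachable. Open a terminal and run curl through Tor's SOCKS5 proxy, adding the URL you want to fetch at the end of the command (for example the http://check.torproject.org address used in the Python code):
curl --socks5-hostname localhost:9050
If you get a response back, it worked.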
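If curl returns nothing, it can help to confirm that something is actually listening on the SOCKS port before looking elsewhere. The sketch below is only a rough check, not part of the original answer: `port_is_open` is a hypothetical helper name, and it assumes Tor uses the default `localhost:9050` from the command above. It only shows that the port accepts TCP connections, not that the listener is really Tor.

```python
import socket


def port_is_open(host="localhost", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: 9050 is assumed to be Tor's SOCKS port.
    Call this before replacing socket.socket with a SOCKS-aware
    class, otherwise the probe itself would go through the proxy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unresolved host, etc.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:9050")
    else:
        print("Nothing is listening on localhost:9050; is Tor running?")
```

A `False` result usually means the Tor service is not started or is configured with a different SOCKS port.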
import socks
import socket
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
# Set a default SOCKS5 proxy so that every new socket goes through Tor
socks.set_default_proxy(socks.SOCKS5, "localhost", 9050)
socket.socket = socks.socksocket
req = Request('http://check.torproject.org', headers={'User-Agent': 'Mozilla/5.0', })
html = urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')
print(soup('title')[0].get_text())
If the output is
Congratulations. This browser is configured to use Tor.
then it worked in Python too, which means your scraping requests are going through Tor.
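For a scraper it may be more convenient to turn that check into a function that returns True or False instead of reading the printed title by eye. The following is a minimal sketch under a few assumptions: the names `page_title`, `looks_like_tor` and `check_tor` are made up for illustration, the title is parsed with the standard library instead of BeautifulSoup so the helpers have no extra dependency, and success is assumed to mean a title starting with "Congratulations" as in the message above.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

CHECK_URL = "http://check.torproject.org"
SUCCESS_PREFIX = "Congratulations"  # assumed marker, taken from the title above


class _TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the page title with surrounding whitespace collapsed."""
    parser = _TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split())


def looks_like_tor(html):
    """True if the check page's title reports a Tor connection."""
    return page_title(html).startswith(SUCCESS_PREFIX)


def check_tor(timeout=30):
    """Fetch the check page through Tor and report the result.

    Imports PySocks lazily so the helpers above work without it.
    Like the code above, this patches socket.socket globally.
    """
    import socket
    import socks  # provided by the PySocks package

    socks.set_default_proxy(socks.SOCKS5, "localhost", 9050)
    socket.socket = socks.socksocket
    req = Request(CHECK_URL, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return looks_like_tor(html)
```

Calling `check_tor()` once at startup and aborting when it returns False is one way to avoid scraping from your real IP address by accident.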