Urllib2 & BeautifulSoup: Nice couple but too slow - urllib3 & threads?

醉酒成梦 2021-01-31 06:38

I was looking for a way to optimize my code when I heard some good things about threads and urllib3. Apparently, people disagree about which solution is best.

The pro

3 Answers
  •  天命终不由人
    2021-01-31 07:27

    I don't think urllib or BeautifulSoup is slow. I ran your code on my local machine with a modified version (I removed the Excel stuff). It took around 100 ms to open the connection, download the content, parse it, and print it to the console for one country.

    About 10 ms of that is the total time BeautifulSoup spent parsing the content and printing it to the console per country. That is fast enough.
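
    If you want to check the fetch/parse split yourself, here is a minimal timing sketch. The URL is a placeholder (your original script isn't shown here), and it uses Python 3's urllib.request, the successor of urllib2:

    ```python
    # Minimal sketch: time the network fetch separately from the parse.
    # The URL is hypothetical; substitute one of your own country pages.
    import time
    import urllib.request

    from bs4 import BeautifulSoup

    url = "https://example.com/country/france"  # placeholder

    t0 = time.perf_counter()
    html = urllib.request.urlopen(url).read()   # DNS, handshake, download
    t1 = time.perf_counter()
    soup = BeautifulSoup(html, "html.parser")   # parsing only
    t2 = time.perf_counter()

    print(f"fetch: {(t1 - t0) * 1000:.0f} ms, parse: {(t2 - t1) * 1000:.0f} ms")
    ```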

    Nor do I believe that using Scrapy or threading will solve the problem, because the real problem is the expectation that it should be fast.

    Welcome to the world of HTTP. Sometimes it will be slow, sometimes it will be very fast. A few reasons a connection can be slow:

    • the server handling your request (it may even return a 404 sometimes),
    • DNS resolution,
    • the HTTP handshake,
    • your ISP's connection stability,
    • your bandwidth,
    • your packet loss rate,

    etc.

    Don't forget, you are making 121 consecutive HTTP requests to a server, and you don't know what kind of servers they are running. They might also ban your IP address because of the repeated calls.
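
    Since the question asks about threads: the work is I/O-bound, so a small thread pool can overlap the waiting, and keeping the worker count low reduces the risk of a ban. A minimal sketch with concurrent.futures; the URL pattern and country list are placeholders:

    ```python
    # Hypothetical sketch: fetch pages with a small, polite thread pool.
    from concurrent.futures import ThreadPoolExecutor

    import requests

    countries = ["france", "germany", "spain"]  # stand-in for the 121 countries
    urls = [f"https://example.com/country/{c}" for c in countries]

    def fetch(url):
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return url, resp.text

    # max_workers=5 keeps the load on the server modest.
    with ThreadPoolExecutor(max_workers=5) as pool:
        for url, html in pool.map(fetch, urls):
            print(url, len(html))
    ```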

    Take a look at the Requests library and read its documentation. If you're doing this to learn more Python, don't jump straight into Scrapy.
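
    A minimal sketch of the Requests-plus-BeautifulSoup combination: a Session reuses the underlying connection (keep-alive), so the DNS lookup and handshake from the list above are paid once rather than 121 times. The URL pattern and extraction are placeholders:

    ```python
    # Hypothetical sketch: one Session reuses the TCP connection across requests.
    import requests
    from bs4 import BeautifulSoup

    countries = ["france", "germany", "spain"]  # stand-in for the 121 countries

    with requests.Session() as session:
        for country in countries:
            resp = session.get(f"https://example.com/country/{country}", timeout=10)
            resp.raise_for_status()
            soup = BeautifulSoup(resp.text, "html.parser")
            # Real extraction depends on the page; the title is just a demo.
            print(country, soup.title.string if soup.title else "no title")
    ```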
