Python Web Crawlers and “getting” html source code


不知归路 2020-12-24 13:53

So my brother wanted me to write a web crawler in Python (self-taught), and I know C++, Java, and a bit of HTML. I'm using version 2.7 and reading the Python library, but I

4 Answers

独厮守ぢ (original poster)

2020-12-24 14:28
~~Use Python 2.7, it has more 3rd party libs at the moment.~~ (Edit: see below.)

I recommend using the stdlib module urllib2; it lets you fetch web resources comfortably. Example:
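```
import urllib2

# Python 2 only; in Python 3 this module became urllib.request / urllib.error.
response = urllib2.urlopen("http://google.de")
page_source = response.read()  # the raw HTML of the page, as a byte string
```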
For parsing the HTML, have a look at BeautifulSoup.
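If you would rather stay in the standard library, Python 3's `html.parser.HTMLParser` can handle simple extraction too. Below is a minimal sketch that swaps in `HTMLParser` in place of BeautifulSoup; the `LinkCollector` class and `extract_urls` helper are hypothetical names. It collects link and image URLs and resolves them against the page's own address with `urljoin`, which is what a crawler needs to decide where to go next:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href> and <img src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_urls(page_source, base_url):
    """Return (links, images) found in page_source, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(page_source)
    collector.close()
    return collector.links, collector.images
```

With BeautifulSoup the link part is roughly `[a["href"] for a in BeautifulSoup(page_source, "html.parser").find_all("a", href=True)]`, and it copes better with badly broken markup.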

BTW, what exactly do you want to do?

> Just for background, I need to download a page and replace any `img` with ones I have.
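For that specific task, here is a minimal sketch assuming Python 3 and a hypothetical `replacements` dict that maps each original `src` value to the path of your own image (`replace_images` and `_ImgLocator` are likewise illustrative names). `HTMLParser` only locates each `<img>` start tag via `getpos()` and `get_starttag_text()`; just those tags are spliced back in, so the rest of the page stays byte-for-byte unchanged:

```python
import html
from html.parser import HTMLParser


class _ImgLocator(HTMLParser):
    """Records the position and original text of every <img> start tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # (lineno, column, original_tag_text, attrs)

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            lineno, col = self.getpos()  # position where the tag starts
            self.found.append((lineno, col, self.get_starttag_text(), attrs))


def replace_images(page_html, replacements):
    """Return page_html with <img src> values swapped per `replacements`.

    Only <img> tags whose src is a key in `replacements` are rewritten.
    """
    locator = _ImgLocator()
    locator.feed(page_html)
    locator.close()

    # HTMLParser reports (line, column); convert that to an absolute index.
    line_starts = [0] + [i + 1 for i, ch in enumerate(page_html) if ch == "\n"]

    pieces, cursor = [], 0
    for lineno, col, tag_text, attrs in locator.found:
        src = dict(attrs).get("src")
        if src not in replacements:
            continue
        start = line_starts[lineno - 1] + col
        new_attrs = [(k, replacements[src] if k == "src" else v) for k, v in attrs]
        # Re-serialize the tag; attribute values are re-escaped for safety.
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{html.escape(v, quote=True)}"'
            for k, v in new_attrs
        )
        closer = " />" if tag_text.rstrip().endswith("/>") else ">"
        pieces.append(page_html[cursor:start])
        pieces.append(f"<img{rendered}{closer}")
        cursor = start + len(tag_text)
    pieces.append(page_html[cursor:])
    return "".join(pieces)
```

Regenerating the whole document from parser events would normalize entities and whitespace; splicing only the matched tags avoids that. Tags inside `<script>` blocks or HTML comments are not reported by the parser, so they are left alone.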

Edit: It's 2014 now, most of the important libraries have been ported, and you should definitely use Python 3 if you can. python-requests is a very nice high-level library which is easier to use than urllib2.
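A Python 3 version of the urllib2 fetch, as a sketch using only the standard library (`urllib.request` took over from `urllib2`). The `fetch_html` name and the User-Agent string are hypothetical. With python-requests the core is simply `requests.get(url, timeout=10).text`.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download url and return its body decoded to str (stdlib only)."""
    # The User-Agent value is an arbitrary placeholder; some sites reject
    # requests that carry no User-Agent at all.
    request = urllib.request.Request(url, headers={"User-Agent": "simple-crawler/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `fetch_html("http://google.de")` returns the page as a `str`, ready to hand to a parser.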