Python Web Crawlers and “getting” html source code

不知归路 2020-12-24 13:53

So my brother wanted me to write a web crawler in Python (self-taught), and I know C++, Java, and a bit of HTML. I'm using version 2.7 and reading the Python library, but I

4 Answers
  •  独厮守ぢ
    2020-12-24 14:28

    Use Python 2.7; it has more third-party libraries at the moment. (Edit: see below.)

    I recommend using the stdlib module urllib2; it lets you fetch web resources with very little code. Example:

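    import urllib2  # Python 2 only; in Python 3 this is urllib.request

    # Fetch the page and read the raw HTML as a byte string.
    response = urllib2.urlopen("http://google.de")
    page_source = response.read()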

    For parsing the HTML you get back, have a look at BeautifulSoup.

    BTW, what exactly do you want to do? You mention:

    "Just for background, I need to download a page and replace any img with ones I have."
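
The img-replacement task quoted above could be sketched roughly as follows. This is only an illustration under several assumptions: it is written for Python 3, it uses the stdlib html.parser module instead of BeautifulSoup so that it runs without installing anything, and it points every img at one single replacement URL. The names ImgSrcRewriter and replace_img_sources are hypothetical, not part of any library.

```python
from html import escape
from html.parser import HTMLParser


class ImgSrcRewriter(HTMLParser):
    """Re-emit the HTML it is fed, pointing every <img src=...> at new_src."""

    def __init__(self, new_src):
        # convert_charrefs=False passes entities such as &amp; through untouched.
        super().__init__(convert_charrefs=False)
        self.new_src = new_src  # assumption: one replacement for all images
        self.parts = []

    def _emit_start(self, tag, attrs, self_closing):
        if tag != "img":
            # Not an image: copy the tag exactly as it appeared in the source.
            self.parts.append(self.get_starttag_text())
            return
        rendered = []
        for name, value in attrs:
            if name == "src":
                value = self.new_src
            if value is None:
                rendered.append(" " + name)  # attribute without a value
            else:
                rendered.append(' {}="{}"'.format(name, escape(value, quote=True)))
        closing = " />" if self_closing else ">"
        self.parts.append("<img" + "".join(rendered) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        self.parts.append("&#{};".format(name))

    def handle_comment(self, data):
        self.parts.append("<!--{}-->".format(data))

    def handle_decl(self, decl):
        self.parts.append("<!{}>".format(decl))


def replace_img_sources(page_source, new_src):
    """Return page_source (a str) with every img src replaced by new_src."""
    rewriter = ImgSrcRewriter(new_src)
    rewriter.feed(page_source)
    rewriter.close()
    return "".join(rewriter.parts)
```

Calling replace_img_sources(page_source, "my_image.png") returns the rewritten HTML as a string. An img tag that has no src attribute is left without one, and choosing a different replacement per image would need a mapping in place of the single new_src value.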

    Edit: It's 2014 now. Most of the important libraries have been ported to Python 3, and you should definitely use Python 3 if you can. python-requests is a very nice high-level library that is easier to use than urllib2.
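
For reference, a minimal Python 3 sketch of the same download step, using only the stdlib (urllib.request is where urllib2 ended up in Python 3). The helper names fetch_page_source and decode_body, the 10-second timeout, and the UTF-8 fallback are assumptions made for this illustration.

```python
import urllib.request


def decode_body(raw_bytes, charset=None):
    """Decode a response body, falling back to UTF-8 when no charset is given."""
    # errors="replace" substitutes invalid bytes instead of raising an exception.
    return raw_bytes.decode(charset or "utf-8", errors="replace")


def fetch_page_source(url, timeout=10):
    """Download url and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw_bytes = response.read()
        # The charset comes from the Content-Type header and may be absent.
        charset = response.headers.get_content_charset()
    return decode_body(raw_bytes, charset)
```

fetch_page_source("http://google.de") mirrors the urllib2 example, except that it returns text rather than bytes. With python-requests installed, requests.get(url).text plays a similar role.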
