How can I get href links from HTML using Python?

自闭症患者 2020-11-27 03:25
import urllib2

website = "WEBSITE"  # placeholder: replace with the URL to fetch
openwebsite = urllib2.urlopen(website)
html = openwebsite.read()  # read from the handle opened above

print html

So far so good.

But I wa

10 Answers
  •  谎友^ (original poster)
    2020-11-27 04:02

    My answer probably doesn't compare to what the real gurus out there would write, but using some simple arithmetic, string slicing, find and urllib, this little script builds a list of the link targets. I tested it against Google and the output looks right. Hope it helps!

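    import urllib

    # Python 2: download the page as one big string
    test = urllib.urlopen("http://www.google.com").read()
    sane = 0
    needlestack = []
    while sane == 0:
      curpos = test.find("href")
      if curpos >= 0:
        # drop everything before this href
        test = test[curpos:]
        # skip past the opening double quote of the value
        curpos = test.find('"')
        if curpos < 0:
          break  # no quoted value left, so stop instead of looping forever
        test = test[curpos+1:]
        # the link runs up to the closing double quote
        curpos = test.find('"')
        needle = test[0:curpos]
        # startswith() takes a tuple of prefixes; "http" or "www" is just "http"
        if needle.startswith(("http", "www")):
          needlestack.append(needle)
      else:
        sane = 1
    for item in needlestack:
      print item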
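
For comparison, here is a minimal sketch of the same idea built on the standard library's HTML parser instead of manual string slicing. It is written for Python 3, which is an assumption since the thread uses Python 2, and the names `HrefCollector`, `extract_links` and `sample` are made up for illustration. It collects the `href` of every `<a>` tag, including relative and single-quoted ones that the slicing approach skips.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all href values found in <a> tags, in document order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Small inline sample so the sketch runs without network access.
sample = (
    '<p><a href="http://example.com/a">A</a> '
    "<a class='x' href='/b'>B</a> "
    '<a name="top">no link</a></p>'
)
print(extract_links(sample))  # ['http://example.com/a', '/b']
```

To run it against a live page in Python 3, the HTML string could come from `urllib.request.urlopen(url).read()` decoded to text; which encoding to decode with depends on the page.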
    
