BeautifulSoup - modifying all links in a piece of HTML?

春和景丽 2020-12-01 12:03

I need to modify every link in an HTML document. I know I need to use SoupStrainer, but I'm not 100% sure how to implement it.

3 Answers
  • 2020-12-01 12:58

    I tried this and it worked. It is easier than using a regexp to match each 'href':

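    from bs4 import BeautifulSoup as bs

    # htmltext is assumed to hold the HTML document as a string
    soup = bs(htmltext, "html.parser")
    # href=True skips <a> tags that have no href attribute
    for a in soup.find_all('a', href=True):
        a['href'] = "mysite"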
    

    See the bs4 docs for details.
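
    The loop above can be wrapped in a small function and tried on an inline document. This is only a sketch, assuming Python 3 with bs4 installed and the built-in "html.parser"; the `rewrite_links` helper and the sample HTML are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup


def rewrite_links(html, new_href):
    """Return html with every <a href=...> pointing at new_href (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only anchors that already carry an href,
    # so named anchors such as <a name="top"> are left untouched.
    for a in soup.find_all("a", href=True):
        a["href"] = new_href
    return str(soup)


# Sample input: one real link and one anchor without an href.
sample = '<p><a href="http://google.com">Google</a> <a name="top">anchor</a></p>'
result = rewrite_links(sample, "mysite")
print(result)
```

    Filtering on `href=True` matters because assigning to `a['href']` would otherwise add an href to anchors that never had one.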

  • 2020-12-01 13:02

    Maybe something like this would work? (I don't have a Python interpreter in front of me, unfortunately)

    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<p>Blah blah blah <a href="http://google.com">Google</a></p>', "html.parser")
    for a in soup.find_all('a'):
        a['href'] = a['href'].replace("google", "mysite")

    result = str(soup)
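
    Note that `str.replace` rewrites "google" wherever it appears in the URL, including the path and query string. If only the host should change, one possible variation is sketched here, assuming Python 3, bs4, and the standard `urllib.parse` module; the `swap_host` helper and the "mysite.com" target are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

from bs4 import BeautifulSoup


def swap_host(url, old_host, new_host):
    """Replace the host of url only when it equals old_host (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.netloc == old_host:
        parts = parts._replace(netloc=new_host)
    return urlunsplit(parts)


# The query string also contains "google", which must stay unchanged.
html = '<p><a href="http://google.com/search?q=google">Google</a></p>'
soup = BeautifulSoup(html, "html.parser")
for a in soup.find_all("a", href=True):
    a["href"] = swap_host(a["href"], "google.com", "mysite.com")

result = str(soup)
print(result)
```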
    
  • 2020-12-01 13:03
    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<p>Blah blah blah <a href="http://google.com">Google</a></p>', "html.parser")
    for a in soup.find_all('a'):
        a['href'] = a['href'].replace("google", "mysite")
    print(str(soup))
    

    This is Lusid's solution. Since he didn't have a Python interpreter in front of him, he couldn't test it, and it had a few errors. I just wanted to post the working version. Thanks, Lusid!
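
    On the SoupStrainer part of the question: none of the answers here use it. As far as I know, passing a `SoupStrainer` as `parse_only` makes bs4 keep only the matching tags, so the rest of the document is dropped. That is useful for extracting links, but not for editing them in place. A small comparison sketch, assuming Python 3, bs4, and "html.parser":

```python
from bs4 import BeautifulSoup, SoupStrainer

html = '<p>Blah blah blah <a href="http://google.com">Google</a></p>'

# Parse only the <a> tags; everything else is discarded during parsing.
only_links = SoupStrainer("a")
links_soup = BeautifulSoup(html, "html.parser", parse_only=only_links)

# Parse the whole document, as the answers above do.
full_soup = BeautifulSoup(html, "html.parser")

strained = str(links_soup)
full = str(full_soup)
print(strained)  # the surrounding <p> is gone
print(full)      # the full markup is preserved
```

    To modify links and get the whole document back, parse it in full and loop over `find_all('a')` as shown above.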
