Replace SRC of all IMG elements using Parser

伪装坚强ぢ 2020-12-06 12:41

I am looking for a way to replace the SRC attribute in all IMG tags without using regular expressions. (I would like to use any out-of-the-box HTML parser included with default Py

2 answers
  •  爱一瞬间的悲伤
    2020-12-06 13:24

    Here is a pyparsing approach to your problem. You'll need to write your own code to transform the src attribute value.

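    from urllib.request import urlopen  # Python 3 replacement for urllib2
    
    from pyparsing import makeHTMLTags
    
    # makeHTMLTags returns (opening tag, closing tag) expressions; <img> only needs the first
    imgtag = makeHTMLTags("img")[0]
    
    page = urlopen("http://www.yahoo.com")
    html = page.read().decode("utf-8", errors="replace")
    page.close()
    
    # print(html)
    
    def modifySrcRef(tokens):
        # Rebuild the <img> tag from its parsed attributes, changing only src
        ret = "<img"
        for k, v in tokens.items():
            if k in ("tag", "empty") or k.startswith("start"):
                continue  # pyparsing bookkeeping names, not HTML attributes
            if k == "src":
                v = v.upper()  # placeholder: substitute your own src transformation
            ret += ' %s="%s"' % (k, v)
        return ret + " />"
    
    imgtag.setParseAction(modifySrcRef)
    
    print(imgtag.transformString(html))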
    

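    The img tags in the page are rewritten with the transformed src. In the sample output, the two converted tags are the ones with the alt text "Yahoo!" and "All Yahoo! Services".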
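
Since the question asks for a parser that ships with Python, here is a rough sketch of the same idea using only the standard library's html.parser, with no regular expressions in user code. It is not part of the pyparsing approach above. The names ImgSrcRewriter, replace_img_src and the transform callable are made up for this illustration, and the sample markup and the http-to-https rewrite are placeholders. The sketch re-emits everything it parses and only rebuilds img tags, so treat it as a starting point rather than a faithful round-trip for every page.

```python
from html import escape
from html.parser import HTMLParser


class ImgSrcRewriter(HTMLParser):
    """Re-emit the document as parsed, changing only the src of <img> tags."""

    def __init__(self, transform):
        # convert_charrefs=False keeps entities such as &amp; as separate events
        super().__init__(convert_charrefs=False)
        self.transform = transform  # hypothetical callable: old src -> new src
        self.out = []

    def _start(self, tag, attrs, closing):
        if tag != "img":
            # Not an image: copy the tag's original source text verbatim
            self.out.append(self.get_starttag_text())
            return
        parts = ["<img"]
        for name, value in attrs:
            if value is None:  # valueless attribute such as ismap
                parts.append(" " + name)
                continue
            if name == "src":
                value = self.transform(value)
            # attribute values arrive unescaped, so escape them again
            parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        parts.append(closing)
        self.out.append("".join(parts))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def replace_img_src(html, transform):
    """Return html with transform applied to the src of every <img> tag."""
    parser = ImgSrcRewriter(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Placeholder input and transformation, for illustration only
sample = '<p>Logo: <img src="http://example.com/a.gif" alt="A &amp; B"> <a href="/x">link</a></p>'
result = replace_img_src(sample, lambda src: src.replace("http://", "https://"))
print(result)
```

Running this prints the sample markup with the image URL switched to https and everything else left as it was. Passing the transformation in as a callable keeps the parsing code separate from whatever rewriting rule you actually need.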
    
