I am looking for a way to replace the SRC attribute in all IMG tags not using Regular expressions. (Would like to use any out-of-the box HTML parser included with default Py
Here is a pyparsing approach to your problem. You'll need to do your own code to transform the http src attribute.
from pyparsing import *
import urllib2
imgtag = makeHTMLTags("img")[0]
page = urllib2.urlopen("http://www.yahoo.com")
html = page.read()
page.close()
# print html
def modifySrcRef(tokens):
ret = "
"
imgtag.setParseAction(modifySrcRef)
print imgtag.transformString(html)
The tags convert to: