I'm trying to extract URLs from a sitemap like this: https://www.bestbuy.com/sitemap_c_0.xml.gz
I've unzipped and saved the .xml.gz file as an .xml file. The struc
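For anyone landing here, this is a minimal sketch of one way to do it with only Python's standard library. It assumes the file follows the sitemaps.org protocol, meaning it uses the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace and lists each URL in a `<loc>` element. The structure of the Best Buy file isn't shown above, so treat that as an assumption. `extract_loc_urls` and `read_sitemap` are hypothetical helper names. `read_sitemap` also reads the `.xml.gz` directly, so you don't need to unzip it first.

```python
import gzip
import xml.etree.ElementTree as ET

# Assumed namespace: the standard sitemaps.org one. If the file declares a
# different namespace (or none), change this prefix accordingly.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_loc_urls(xml_bytes: bytes) -> list[str]:
    """Return the stripped text of every namespaced <loc> element.

    Works for both <urlset> sitemaps and <sitemapindex> files, since both
    keep their URLs in <loc> elements under the same namespace.
    """
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def read_sitemap(path: str) -> bytes:
    """Read a sitemap from disk, decompressing it first if it ends in .gz."""
    if path.endswith(".gz"):
        with gzip.open(path, "rb") as f:
            return f.read()
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""
    print(extract_loc_urls(sample))  # ['https://example.com/a', 'https://example.com/b']
```

With the downloaded file it would be something like `urls = extract_loc_urls(read_sitemap("sitemap_c_0.xml.gz"))`. This loads the whole document into memory. For very large sitemaps, `ET.iterparse` would let you stream the `<loc>` elements instead.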
I know this is a bit of a zombie reply, but I just posted a tool on GitHub that does exactly what you're looking for, and it's in Python! Feel free to take what you need from the source code, or use it as-is. I'm posting it here so that other people who come across this thread can find it.
Here it is: https://github.com/tcaldron/xmlscrape