I'm looking for a way to pseudo-spider a website. The key is that I don't actually want the content, but rather a simple list of URIs. I can get reasonably close to this idea.
I've used a tool called xidel:
xidel http://server -e '//a/@href' \
  | grep -v "http" | sort -u \
  | xargs -L1 -I {} xidel http://server/{} -e '//a/@href' \
  | grep -v "http" | sort -u
A little hackish, but it gets you closer! This only goes one level deep; imagine packing it up into a self-recursive script!
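Here's a minimal sketch of what that recursion could look like, assuming bash 4+ (for associative arrays), that http://server is a placeholder for your real base URL, and that internal links are relative paths as in the one-liner above. The MAX_DEPTH cap and the SEEN set are my additions to keep the crawl finite and avoid refetching pages:

#!/usr/bin/env bash
# Recursive pseudo-spider sketch built on the xidel one-liner above.
# Assumes bash 4+ and that internal links are relative paths.

BASE="http://server"   # placeholder base URL
MAX_DEPTH=3            # arbitrary cap so the recursion terminates
declare -A SEEN        # visited paths (keys prefixed with "x" so "" is valid)

crawl() {
    local path="$1" depth="$2"
    [[ -n "${SEEN["x$path"]}" || "$depth" -gt "$MAX_DEPTH" ]] && return
    SEEN["x$path"]=1
    echo "$BASE/$path"
    # Same extraction as the one-liner: grab hrefs, keep the relative ones.
    # Process substitution (not a pipe) keeps SEEN updates in this shell.
    while read -r link; do
        [[ -n "$link" ]] && crawl "$link" $((depth + 1))
    done < <(xidel --silent "$BASE/$path" -e '//a/@href' 2>/dev/null \
                | grep -v "http" | sort -u)
}

crawl "" 0

Save it as crawl.sh and run bash crawl.sh; it prints every URL it reaches, one per line, depth-first, without fetching any page twice.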