Spider a Website and Return URLs Only

遥遥无期 2020-11-29 16:31

I'm looking for a way to pseudo-spider a website. The key is that I don't actually want the content, but rather a simple list of URIs. I can get reasonably close to this i…
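(For illustration only, and not necessarily what the question goes on to try: GNU wget's spider mode can get close to this, since it walks the site without saving pages and logs every URL it checks. `example.com` and `spider.log` below are placeholders.)

    # Hedged sketch: crawl without downloading, then pull URLs out of the log.
    # --spider checks links instead of saving files; --recursive walks the whole site.
    wget --spider --recursive --no-verbose --output-file=spider.log http://example.com

    # The log lines contain the checked URLs; extract and de-duplicate them.
    grep -oE 'https?://[^ ]+' spider.log | sort -u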

3 Answers
  •  渐次进展
    2020-11-29 17:10

    I've used a tool called xidel:

    xidel http://server -e '//a/@href' | 
    grep -v "http" | 
    sort -u | 
    xargs -L1 -I {}  xidel http://server/{} -e '//a/@href' | 
    grep -v "http" | sort -u
    

    A little hackish, but it gets you closer! This only covers the first level of links. Imagine packing this up into a recursive script (a sketch follows below)!
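    As a rough sketch of that recursive idea (an illustration, not the answerer's script): it repeats the same xidel extraction breadth-first up to a fixed depth. It assumes bash 4+ and xidel on the PATH; `BASE` and `MAX_DEPTH` are made-up names, not anything from the answer above.

    #!/usr/bin/env bash
    # Hypothetical recursive version of the one-liner above (names are placeholders).
    BASE="http://server"
    MAX_DEPTH=3

    declare -A seen            # URLs already visited
    queue=("$BASE")

    for ((depth = 0; depth < MAX_DEPTH; depth++)); do
        next=()
        for url in "${queue[@]}"; do
            [[ -n "${seen[$url]}" ]] && continue
            seen[$url]=1
            echo "$url"
            # Extract every href and keep only relative (same-site) links,
            # mirroring the grep -v "http" filter in the one-liner.
            while IFS= read -r href; do
                [[ -z "$href" || "$href" == http* ]] && continue
                next+=("${BASE%/}/${href#/}")
            done < <(xidel "$url" -e '//a/@href' 2>/dev/null | sort -u)
        done
        queue=("${next[@]}")
    done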
