Can I use WGET to generate a sitemap of a website given its URL?

Submitted by 强颜欢笑 on 2019-12-03 12:55:51

Question


I need a script that can spider a website and return the list of all crawled pages in plain text or a similar format, which I will submit to search engines as a sitemap. Can I use WGET to generate a sitemap of a website? Or is there a PHP script that can do the same?


Answer 1:


# Crawl the site without saving pages, logging every URL wget visits
wget --spider --recursive --no-verbose --output-file=wgetlog.txt http://somewebsite.com
# Extract the visited URLs from the log and escape "&" as "&amp;" so the list is XML-safe
sed -n "s@.\+ URL:\([^ ]\+\) .\+@\1@p" wgetlog.txt | sed "s@&@\&amp;@g" > sedlog.txt

This creates a file called sedlog.txt that contains all links found on the specified website. You can then use PHP or a shell script to convert that plain-text list into an XML sitemap, as sketched below. Tweak the wget parameters (--accept / --reject, --include-directories / --exclude-directories) to restrict the crawl to just the links you need.
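A minimal sketch of the conversion step in shell, assuming sedlog.txt holds one URL per line; the sitemap.xml filename is arbitrary, and the urlset namespace is the standard one from the sitemap protocol:

{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  # "&" in a sed replacement stands for the whole matched line
  sed 's@.*@  <url><loc>&</loc></url>@' sedlog.txt
  echo '</urlset>'
} > sitemap.xml

Submit the resulting sitemap.xml to search engines directly, or reference it from a Sitemap: line in robots.txt.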




Answer 2:


You can use this Perl script to do the trick: http://code.google.com/p/perlsitemapgenerator/



Source: https://stackoverflow.com/questions/3948947/can-i-use-wget-to-generate-a-sitemap-of-a-website-given-its-url
