Web crawling and robots.txt
Question: I used wget to download a site with wget -r http://www.xyz.com

i) It returns a .css file, a .js file, index.php, and one image, img1.jpg.
ii) However, more images exist under xyz.com: typing www.xyz.com/Img2.jpg directly returns an image.
iii) But index.php references only a single image, img1.jpg.
iv) The site is accompanied by a robots.txt file that contains a Disallow: directive.

What should I change in the command line so that it returns everything under xyz.com that is static, even files not referenced from index.php?
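One likely culprit is the robots.txt file: GNU Wget honours it during recursive retrieval, so Disallow-ed paths are skipped. A minimal sketch of the adjusted command (the URL is the example host from the question; whether it helps depends on what robots.txt actually blocks):

```shell
set -eu

# -r             recursive retrieval
# -p             also fetch page requisites (CSS, JS, images a page uses)
# -e robots=off  tell wget to ignore robots.txt
# --no-parent    do not ascend above the starting directory
cmd="wget -r -p -e robots=off --no-parent http://www.xyz.com/"

# Printed rather than executed here, since the host is an example.
echo "$cmd"
```

Note a fundamental limit: wget discovers files only by following links in the pages it downloads (or via server-generated directory listings, if enabled). An image such as Img2.jpg that no page references cannot be found by any wget flag; you would have to guess or enumerate its URL yourself.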