How to crawl images in Nutch?

Submitted by 魔方 西西 on 2019-12-11 00:05:17

Question


How do I crawl images in Nutch? Or is there any other open search engine that returns results with images?


Answer 1:


Edit regex-urlfilter.txt in the conf directory and find the rule that excludes URLs by file extension:

-\.(ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|exe|EXE|js|JS|gif|GIF|png|PNG|jpg|JPG|jpeg|JPEG|bmp|BMP|mpg|MPG|mov|MOV)$

From that list, delete the extensions of the image types you want to crawl (jpeg, jpg, gif and so on), in both their lowercase and uppercase forms.
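To make the edit concrete, here is a minimal Python sketch that removes chosen extensions from a rule of that shape. The helper name allow_extensions is hypothetical and not part of Nutch; the sketch assumes the rule is a single line of the form -\.(a|b|c)$ and only rewrites the string, so you would still paste the result into regex-urlfilter.txt yourself.

```python
from typing import Iterable

# Exclusion rule of the shape found in conf/regex-urlfilter.txt.
RULE = (r"-\.(ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|xls|XLS"
        r"|gz|GZ|rpm|RPM|tgz|TGZ|exe|EXE|js|JS|gif|GIF|png|PNG|jpg|JPG"
        r"|jpeg|JPEG|bmp|BMP|mpg|MPG|mov|MOV)$")


def allow_extensions(rule: str, allowed: Iterable[str]) -> str:
    """Return `rule` with the given extensions removed from its (a|b|c) list.

    Hypothetical helper for illustration; matching is case-insensitive, so
    passing "jpg" removes both "jpg" and "JPG".
    """
    prefix, open_paren, rest = rule.partition("(")
    body, close_paren, suffix = rest.rpartition(")")
    if not open_paren or not close_paren:
        raise ValueError("rule does not contain a (...) extension list")
    wanted = {ext.lower() for ext in allowed}
    # Drop the allowed extensions, and any empty alternative left by a typo.
    kept = [ext for ext in body.split("|") if ext and ext.lower() not in wanted]
    return f"{prefix}({'|'.join(kept)}){suffix}"


if __name__ == "__main__":
    print(allow_extensions(RULE, ["gif", "png", "jpg", "jpeg", "bmp"]))
```

The leading - marks the line as an exclusion rule, so any extension left in the list is still skipped by the crawler.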

Then edit suffix-urlfilter.txt in conf:

Comment out the entries for jpeg, gif or png (whichever image types you want to crawl) by adding # at the start of each line.
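The same step can be sketched in Python. This assumes suffix-urlfilter.txt lists one suffix per line (for example .gif), which is an assumption about the file layout rather than something stated in the answer; the helper name comment_out_suffixes is hypothetical and it works on the file's text, not on the file itself.

```python
from typing import Iterable


def comment_out_suffixes(text: str, suffixes: Iterable[str]) -> str:
    """Prefix matching suffix entries with '#' so they no longer apply.

    Hypothetical helper; assumes one suffix per line, with or without a
    leading dot. Lines that are already comments are left unchanged.
    """
    wanted = {s.lower().lstrip(".") for s in suffixes}
    out = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry.startswith("#") and entry.lower().lstrip(".") in wanted:
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = ".gif\n.jpg\n.css\n#.png"  # made-up sample, not the real file
    print(comment_out_suffixes(sample, ["gif", "png"]))
```

Check the result against your own copy of the file before replacing it, since other lines in it may carry settings rather than suffixes.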

That worked for me!



Source: https://stackoverflow.com/questions/3247589/how-to-crawl-images-in-nutch
