Nutch regex-urlfilter syntax

Posted by 浪尽此生 on 2019-11-29 14:52:50
xhudik

According to http://wiki.apache.org/nutch/FAQ#What_happens_if_I_inject_urls_several_times.3F you can't have duplicate URLs (they will be ignored). How about using only this rule:

+^http://www.example.com/foo.cfm/(.+)*$

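This is meant to cover your first line, +^http://www.example.com/foo.cfm$, as well. Note, however, that as written it requires a / after foo.cfm, so the bare URL without a trailing slash does not match. If there are problems with the /, try: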

+^http://www.example.com/foo.cfm//?(.+)*$

Here //? matches a single / optionally followed by a second one, that is, either / or //.
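
To check what these rules actually accept, here is a minimal sketch in Python. It rests on some assumptions: Python's re module stands in for the Java regex engine that Nutch uses (the constructs in these patterns behave the same in both), the filter is assumed to apply each pattern with search/find semantics, and the leading + (which only marks a rule as "accept") is dropped. RULE_TAIL_OPTIONAL, the accepts helper and the sample URLs are hypothetical and not part of the original answer.

```python
import re

# The two rules from the answer, without the leading '+'.
RULE_SLASH = r"^http://www.example.com/foo.cfm/(.+)*$"
RULE_OPTIONAL_SECOND_SLASH = r"^http://www.example.com/foo.cfm//?(.+)*$"

# Hypothetical alternative (not from the answer): the dots are escaped and
# the whole "/..." tail is optional, so the bare URL is accepted as well.
RULE_TAIL_OPTIONAL = r"^http://www\.example\.com/foo\.cfm(/.*)?$"


def accepts(rule: str, url: str) -> bool:
    """Return True if the rule's regex is found in the URL."""
    return re.search(rule, url) is not None


# Hypothetical sample URLs used only for this illustration.
SAMPLE_URLS = [
    "http://www.example.com/foo.cfm",
    "http://www.example.com/foo.cfm/",
    "http://www.example.com/foo.cfm/bar",
    "http://www.example.com/foo.cfm//bar",
]

for url in SAMPLE_URLS:
    print(
        url,
        accepts(RULE_SLASH, url),
        accepts(RULE_OPTIONAL_SECOND_SLASH, url),
        accepts(RULE_TAIL_OPTIONAL, url),
    )
```

On these samples the two rules from the answer accept the same URLs and both reject the bare foo.cfm URL; only the hypothetical third rule accepts all four. If the bare URL must be crawled, either keep the separate +^http://www.example.com/foo.cfm$ line or make the trailing part optional along the lines of the third rule.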
