Nutch 1.2 - Why won't Nutch crawl URLs with query strings?
Question

I'm new to Nutch and not really sure what is going on here. I run Nutch and it crawls my website, but it seems to ignore URLs that contain query strings. I've commented out the filters in the crawl-urlfilter.txt file, so it looks like this now:

    # skip urls with these characters
    #-[]
    #skip urls with slash delimited segment that repeats 3+ times
    #-.*(/[^/]+)/[^/]+\1/[^/]+\1/

So I think I've effectively removed every filter, which should tell Nutch to accept all URLs it finds on my website. Does anyone