Question
I am trying to match a URL with six or more levels (sub-paths), for example:
http://www.domain.com/level1/level2/level3/level4/level5/level6/level7/level8/level9/level10/level11/level12.html
I came up with this expression:
^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b)
...which matches level1.
However, when I extend it to match six or more levels, it doesn't seem to work:
^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b){6,}
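To see why the second expression fails, here is a minimal Java sketch using java.util.regex (the class and method names are hypothetical, made up for illustration). It runs both expressions against the example URL: the first captures level1 in group 2, while the second finds nothing. Each repetition of (\b\w+\b) has to start with a word character, but nothing inside the group consumes the / between levels, so the group can never repeat past the first slash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: runs the question's two patterns against the example URL.
final class QuestionPatterns {
    static final String URL =
        "http://www.domain.com/level1/level2/level3/level4/level5/level6"
        + "/level7/level8/level9/level10/level11/level12.html";

    // The question's first pattern, with backslashes doubled for a Java string literal.
    static final Pattern ONE_LEVEL =
        Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)\\W(\\b\\w+\\b)");

    // The question's second pattern: the same word group repeated six or more times.
    static final Pattern SIX_LEVELS =
        Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)\\W(\\b\\w+\\b){6,}");

    // Returns group 2 of the first pattern, or null if there is no match.
    static String firstLevel(String url) {
        Matcher m = ONE_LEVEL.matcher(url);
        return m.find() ? m.group(2) : null;
    }

    // True if the repeated-group pattern matches at the start of the URL.
    static boolean repeatedGroupMatches(String url) {
        return SIX_LEVELS.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(firstLevel(URL));           // level1
        System.out.println(repeatedGroupMatches(URL)); // false
    }
}
```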
Answer 1:
I think this is what you were trying for:
^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$
This matches six or more levels, which is what you said you wanted in the question. However, in the question's title you phrased it as "more than six". If that's what you really want, change the quantifier from {6,} to {7,}.
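A minimal Java sketch of this pattern and its {7,} variant, assuming java.util.regex (the class name and the test URLs are hypothetical). Matcher.matches() requires the whole URL to match, so the ^ and $ anchors are redundant here but harmless.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for Answer 1's pattern.
final class AnswerOnePattern {
    // Six or more slash-terminated segments after the host; .* takes the rest.
    static final Pattern SIX_OR_MORE =
        Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$");

    // Variant for "more than six": seven or more slash-terminated segments.
    static final Pattern MORE_THAN_SIX =
        Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){7,}.*$");

    static boolean sixOrMore(String url) {
        return SIX_OR_MORE.matcher(url).matches();
    }

    static boolean moreThanSix(String url) {
        return MORE_THAN_SIX.matcher(url).matches();
    }

    public static void main(String[] args) {
        String deep = "http://www.domain.com/level1/level2/level3/level4/level5/level6"
            + "/level7/level8/level9/level10/level11/level12.html";
        String sixDirs = "http://www.domain.com/a/b/c/d/e/f/page.html";
        String fiveDirs = "http://www.domain.com/a/b/c/d/e/page.html";

        System.out.println(sixOrMore(deep) + " " + moreThanSix(deep));         // true true
        System.out.println(sixOrMore(sixDirs) + " " + moreThanSix(sixDirs));   // true false
        System.out.println(sixOrMore(fiveDirs) + " " + moreThanSix(fiveDirs)); // false false
    }
}
```

Note that this pattern counts only segments that are followed by a slash: the final segment (level12.html or page.html above) is consumed by .* and is not counted as a level.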
On a side note, the forward slash (/) has no special meaning in regexes and doesn't need to be escaped. Rubular forces you to escape the slash because it uses the slash as the regex delimiter. Nutch uses Java's built-in regexes, so you should use a tester that uses the same flavor.
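A quick Java check of the point about slashes (the class name is hypothetical). A Java pattern is an ordinary string with no delimiter, so a plain / works; java.util.regex also accepts a backslash before any non-alphabetic character, so the Rubular-style \/ is harmless. The only escaping Java itself needs is doubling backslashes inside string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo: in Java the pattern is a plain string, so '/' needs no escaping.
final class SlashEscaping {
    // Unescaped slashes, as in Answer 1.
    static final Pattern PLAIN = Pattern.compile("^http://[a-zA-Z.-]+/");

    // Escaped slashes, as Rubular requires; accepted by java.util.regex and equivalent.
    static final Pattern ESCAPED = Pattern.compile("^http:\\/\\/[a-zA-Z.-]+\\/");

    // True if both patterns give the same find() result for the input.
    static boolean agree(String s) {
        return PLAIN.matcher(s).find() == ESCAPED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(PLAIN.matcher("http://www.domain.com/level1").find()); // true
        System.out.println(agree("http://www.domain.com/level1"));               // true
        System.out.println(agree("ftp://www.domain.com/level1"));                // true
    }
}
```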
Answer 2:
Try the following:
^http:\/\/([a-zA-Z\.-]*)(\/[\w\.]+){6,}
http://rubular.com/r/QZlidUqheq
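A minimal Java sketch of this pattern (the class name and test URLs are hypothetical). Because the pattern has no $ anchor, it is used with Matcher.find(), which succeeds as soon as six /segment groups follow the host.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for Answer 2's pattern (slashes escaped, as Rubular requires).
final class AnswerTwoPattern {
    // Six or more "/segment" groups, where a segment is word characters and dots.
    // There is no end anchor, so the rest of the URL after six groups is ignored.
    static final Pattern SIX_OR_MORE =
        Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)(\\/[\\w\\.]+){6,}");

    static boolean sixOrMore(String url) {
        return SIX_OR_MORE.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(sixOrMore("http://www.domain.com/a/b/c/d/e/f"));   // true
        System.out.println(sixOrMore("http://www.domain.com/a/b/c/d/e"));     // false
        System.out.println(sixOrMore("http://www.domain.com/a-b/c/d/e/f/g")); // false
    }
}
```

Unlike Answer 1's pattern, this one counts the last segment too, since each group starts with the slash. On the other hand, a segment containing a character outside [\w.], such as a hyphen, ends the run of counted segments, which is why the third URL above is rejected even though it has six levels.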
Source: https://stackoverflow.com/questions/15505406/regular-expression-to-match-a-url-with-6-or-more-levels