Regular expression to match a URL with 6 or more levels

Submitted by 不想你离开。 on 2019-12-12 05:25:53

Question


I am trying to match a URL with six or more levels (sub-paths), such as:

http://www.domain.com/level1/level2/level3/level4/level5/level6/level7/level8/level9/level10/level11/level12.html

I came up with this expression:

^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b) 

...which matches level1.

However, when I try to match a URL with six or more levels, it doesn't work:

^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b){6,}

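To see why the second expression fails, here is a minimal Java sketch (the class and method names are made up for illustration) that runs both of the asker's patterns against the example URL. Each repetition of `(\b\w+\b)` has to start exactly where the previous one stopped, but the first repetition stops in front of a `/`, which nothing in the group can consume. So the group can never match twice in a row, let alone six times; the slash has to be part of the repeated group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionRegexDemo {
    static final String URL =
        "http://www.domain.com/level1/level2/level3/level4/level5/level6/"
        + "level7/level8/level9/level10/level11/level12.html";

    // The asker's first expression: one word after the first non-word character.
    static final Pattern ONE_LEVEL =
        Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)\\W(\\b\\w+\\b)");

    // The asker's second expression: the same word group required six or more times.
    static final Pattern SIX_LEVELS =
        Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)\\W(\\b\\w+\\b){6,}");

    // Returns the text captured by group 2, or null when there is no match.
    static String firstLevel(String url) {
        Matcher m = ONE_LEVEL.matcher(url);
        return m.find() ? m.group(2) : null;
    }

    // The repeated group cannot step over '/', so this never succeeds.
    static boolean sixLevels(String url) {
        return SIX_LEVELS.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(firstLevel(URL)); // level1
        System.out.println(sixLevels(URL));  // false
    }
}
```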


Answer 1:


I think this is what you were trying for:

^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$

This matches six or more levels, which is what you said you wanted in the question. However, in the question's title you phrased it as "more than six". If that's what you really want, change the quantifier from {6,} to {7,}.
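A quick way to check this pattern is to run it in Java, the flavor the answer says Nutch uses. The helper names `depthPattern` and `hasDepth` below are hypothetical; `depthPattern(6)` produces exactly the pattern above. Note what the quantifier counts: `(?:[^/]+/)` only matches segments that end in a slash, so the final file name is consumed by `.*` and is not counted. With `{6,}`, `/a/b/c/d/e/f/g.html` passes but `/a/b/c/d/e/f.html` does not, and switching to `{7,}` rejects the first one as well.

```java
import java.util.regex.Pattern;

class AnswerOneDemo {
    // Answer 1's pattern with the minimum repeat count as a parameter
    // (minLevels = 6 gives exactly the pattern from the answer).
    static Pattern depthPattern(int minLevels) {
        return Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){" + minLevels + ",}.*$");
    }

    // matches() requires the whole URL to fit the pattern.
    static boolean hasDepth(String url, int minLevels) {
        return depthPattern(minLevels).matcher(url).matches();
    }

    public static void main(String[] args) {
        String question = "http://www.domain.com/level1/level2/level3/level4/level5/level6/"
            + "level7/level8/level9/level10/level11/level12.html";
        // Six slash-terminated segments (a through f) before the file name g.html.
        String six = "http://www.domain.com/a/b/c/d/e/f/g.html";
        // Six path segments in total, but only five end in a slash.
        String five = "http://www.domain.com/a/b/c/d/e/f.html";

        System.out.println(hasDepth(question, 6)); // true
        System.out.println(hasDepth(six, 6));      // true
        System.out.println(hasDepth(six, 7));      // false
        System.out.println(hasDepth(five, 6));     // false
    }
}
```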

On a side note, the forward slash (/) has no special meaning in regexes and doesn't need to be escaped. Rubular forces you to escape the slash because that's what it uses as the regex delimiter. Nutch uses Java's built-in regexes, so you should use a tester that uses the same flavor.
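To illustrate the point about slashes, this sketch (hypothetical class name) compiles Answer 1's pattern twice in Java, once with plain `/` and once with every `/` escaped as `\/` (written `\\/` inside a Java string literal). `java.util.regex` accepts a backslash before any non-alphabetic character and treats it as that literal character, so both versions behave identically; the escapes only matter in tools like Rubular, where `/` delimits the regex.

```java
import java.util.regex.Pattern;

class SlashEscapeDemo {
    // Java patterns are plain strings with no delimiter, so '/' is an ordinary character.
    static final Pattern UNESCAPED =
        Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$");

    // Rubular-style form: "\\/" in Java source is the regex escape \/ ,
    // which java.util.regex treats as a literal '/'.
    static final Pattern ESCAPED =
        Pattern.compile("^http:\\/\\/([a-zA-Z.-]+)\\/(?:[^\\/]+\\/){6,}.*$");

    // True when both spellings of the pattern agree on the given URL.
    static boolean sameResult(String url) {
        return UNESCAPED.matcher(url).matches() == ESCAPED.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.domain.com/a/b/c/d/e/f/g.html",
            "http://www.domain.com/a/b.html",
        };
        for (String url : urls) {
            System.out.println(UNESCAPED.matcher(url).matches() + " "
                + ESCAPED.matcher(url).matches() + "  " + url);
        }
    }
}
```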




Answer 2:


Try the following:

^http:\/\/([a-zA-Z\.-]*)(\/[\w\.]+){6,}

http://rubular.com/r/QZlidUqheq
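Answer 2's pattern can be exercised the same way in Java. Unlike Answer 1, it counts every `/segment` including the final file name, so `/a/b/c/d/e/f.html` counts as six levels. It also has no trailing `$`, so the sketch below (hypothetical names) uses `find()`, which succeeds as soon as six segments have been matched. Because each segment is `[\w.]+`, a character such as `-` ends the repetition, so a URL like `/a/b-c/d/e/f/g` is counted as only two segments.

```java
import java.util.regex.Pattern;

class AnswerTwoDemo {
    // Answer 2's pattern; each repetition consumes a '/' plus word characters or dots.
    static final Pattern SIX_SEGMENTS =
        Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)(\\/[\\w\\.]+){6,}");

    // The pattern has no trailing $, so find() succeeds once six segments are seen,
    // regardless of what follows them.
    static boolean hasSixSegments(String url) {
        return SIX_SEGMENTS.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSixSegments("http://www.domain.com/a/b/c/d/e/f.html")); // true
        System.out.println(hasSixSegments("http://www.domain.com/a/b/c/d/e.html"));   // false
        System.out.println(hasSixSegments("http://www.domain.com/a/b-c/d/e/f/g"));    // false
    }
}
```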



Source: https://stackoverflow.com/questions/15505406/regular-expression-to-match-a-url-with-6-or-more-levels
