Regular Expressions | 易学教程

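Basics

Three letters to remember

\d	matches a digit
\w	matches a letter, digit, or underscore
\s	matches a whitespace character (space, newline, tab, and so on)

The uppercase form matches the opposite: \D matches any non-digit character, \S matches any non-whitespace character, and \W matches any character that is not a letter, digit, or underscore.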
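The three shorthand classes and their uppercase opposites can be tried out with a minimal Java sketch like the one below. The class name ShorthandDemo and the helper fullMatch are made up for illustration; only Pattern.matches and String.replaceAll come from the standard library.

```java
import java.util.regex.Pattern;

public class ShorthandDemo {
    // True only if the regex matches the entire input (hypothetical helper).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d", "7"));   // true: a digit
        System.out.println(fullMatch("\\w", "_"));   // true: underscore counts as a word character
        System.out.println(fullMatch("\\s", "\t"));  // true: a tab is whitespace
        System.out.println(fullMatch("\\D", "7"));   // false: \D is the opposite of \d
        System.out.println(fullMatch("\\S", "x"));   // true: x is not whitespace
        // Remove every non-digit character, leaving only the digits.
        System.out.println("a1 b2".replaceAll("\\D", "")); // 12
    }
}
```

Note that in Java source each backslash of the regex has to be doubled inside the string literal.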

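Common matching patterns

[0-9]	matches any digit in the range; same as \d
[a-zA-Z]	matches any letter in the ranges; several ranges can be combined
[x.|9\\]	matches any one of the characters inside the brackets; metacharacters such as . and | are ordinary characters here, but the backslash must be escaped (^ at the start, - between two characters, and ] also keep a special meaning)
[^a8]	matches any character other than those in the brackets, the same idea as the uppercase forms of the three letters above
hello	matches the literal string
^hello	matches hello at the start of a line (or of the whole input when multiline mode is off)
hello$	matches hello at the end of a line (or of the whole input when multiline mode is off)
hello|world	matches hello or world (regular expressions have no AND operator)
.	matches any character except line terminators (\n, \r)
[\u4e00-\u9fa5]	matches Chinese characters
[\s\S]	matches any character, including line breaks (any two complementary classes together cover every character)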
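A few of the patterns in this table, checked with a small Java sketch. The class name CommonPatternsDemo and the helper found are assumptions made for this example; found reports whether the regex matches anywhere in the input.

```java
import java.util.regex.Pattern;

public class CommonPatternsDemo {
    // True if the regex matches somewhere in the input (hypothetical helper).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^hello", "hello world"));   // true: hello is at the start
        System.out.println(found("^hello", "say hello"));     // false
        System.out.println(found("hello$", "say hello"));     // true: hello is at the end
        System.out.println(found("hello|world", "big world")); // true: alternation
        System.out.println(found("[^a8]", "a8"));             // false: only excluded characters
        System.out.println(found("[x.|9\\\\]", "a|b"));       // true: | is literal inside brackets
        System.out.println(found("a.b", "a\nb"));             // false: . does not match \n
        System.out.println(found("a[\\s\\S]b", "a\nb"));      // true: [\s\S] matches any character
        System.out.println(found("[\\u4e00-\\u9fa5]", "abc\u4e2d")); // true: U+4E2D is in the range
    }
}
```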

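Quantifiers (how many times the preceding subexpression may match)

*	zero or more times
+	one or more times
?	zero or one time
{n}	exactly n times
{n,m}	between n and m times, inclusive
{n,}	at least n times

By default a quantifier applies to the single preceding character. To apply it to a multi-character subexpression, wrap the subexpression in parentheses.

For example: (pattern)* or th(is|at). Parentheses have a further use, covered below.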
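The quantifiers, and the difference parentheses make, can be checked with a sketch along these lines. QuantifierDemo and fullMatch are hypothetical names chosen for this example; the sample strings are also made up.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True only if the regex matches the entire input (hypothetical helper).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*", "a"));          // true: * allows zero b's
        System.out.println(fullMatch("ab+", "a"));          // false: + needs at least one b
        System.out.println(fullMatch("colou?r", "color"));  // true: ? makes the u optional
        System.out.println(fullMatch("\\d{3}", "123"));     // true: exactly three digits
        System.out.println(fullMatch("\\d{2,4}", "12345")); // false: five digits is too many
        System.out.println(fullMatch("\\d{2,}", "12345"));  // true: at least two digits
        System.out.println(fullMatch("ab+", "ababab"));     // false: + applies to b only
        System.out.println(fullMatch("(ab)+", "ababab"));   // true: + applies to the group ab
    }
}
```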

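Advanced

Capturing and non-capturing groups

(pattern)

Wrapping a subexpression in parentheses does not change what the overall expression matches.

The main purpose of the parentheses is to capture the text matched by that subexpression.

This is a capturing group; how to use the captured text is covered later.

(?:pattern)

Use this form when the parentheses are only there to group a subexpression and the matched text does not need to be captured.

The example above is better written as th(?:is|at), which is equivalent to this|that.

Non-capturing match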
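To see the difference between (pattern) and (?:pattern) in practice, here is a minimal Java sketch. CaptureDemo and its helper groupCount are names invented for this example; Matcher.groupCount() itself is the standard method that reports how many capturing groups a pattern declares.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureDemo {
    // Number of capturing groups declared in the pattern (hypothetical helper).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("th(is|at)").matcher("that");
        if (m.matches()) {
            System.out.println(m.group());  // that (the whole match)
            System.out.println(m.group(1)); // at (text captured by the parentheses)
        }
        System.out.println(groupCount("th(is|at)"));   // 1
        System.out.println(groupCount("th(?:is|at)")); // 0: grouping only, nothing captured
        System.out.println(Pattern.matches("th(?:is|at)", "that")); // true: same strings still match
    }
}
```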

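hello(?=world)

Positive lookahead. If a string contains several occurrences of hello, this matches only those followed by world. The match result does not include world.

For example, "Windows(?=95|2000)" matches the "Windows" in "Windows2000" but not the "Windows" in "Windows98".

Non-capturing match

hello(?!world)

Negative lookahead. Likewise, this matches only those occurrences of hello that are not followed by world. The text after hello is not part of the match result.

Non-capturing match

(?<=hello)world

Positive lookbehind. If a string contains several occurrences of world, this matches only those preceded by hello. The match result does not include hello.

For example, "(?<=95|2000)Windows" matches the "Windows" in "2000Windows" but not the "Windows" in "98Windows".

Non-capturing match

(?<!hello)world

Negative lookbehind. Likewise, this matches only those occurrences of world that are not preceded by hello. The text before world is not part of the match result.

Non-capturing match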
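A small Java sketch of both lookahead forms, using the Windows example. Replacing each match with "Win" makes it visible that the version number is checked but not consumed. LookaheadDemo, shorten, and the sample string are assumptions made for illustration.

```java
public class LookaheadDemo {
    // Replace every match of regex with "Win" (hypothetical helper).
    static String shorten(String regex, String input) {
        return input.replaceAll(regex, "Win");
    }

    public static void main(String[] args) {
        String s = "Windows98 Windows2000";
        // Only the Windows followed by 95 or 2000 is replaced; "2000" itself stays.
        System.out.println(shorten("Windows(?=95|2000)", s)); // Windows98 Win2000
        // Only the Windows NOT followed by 95 or 2000 is replaced.
        System.out.println(shorten("Windows(?!95|2000)", s)); // Win98 Windows2000
    }
}
```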
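The two lookbehind forms can be sketched the same way. This example records where each match starts, which shows that the match begins at "Windows" and not at the digits before it. LookbehindDemo, matchStarts, and the sample string are hypothetical. Java accepts alternatives of different lengths inside a lookbehind; some other engines require a fixed width.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Start index of every match of regex in input (hypothetical helper).
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String s = "2000Windows 98Windows";
        // Windows preceded by 95 or 2000: the one starting at index 4.
        System.out.println(matchStarts("(?<=95|2000)Windows", s)); // [4]
        // Windows not preceded by 95 or 2000: the one starting at index 14.
        System.out.println(matchStarts("(?<!95|2000)Windows", s)); // [14]
    }
}
```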

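Non-greedy matching

Example: in "hello world and hi world" we want to match "hello world" and "hi world".

We might try h.*world, but running it yields "hello world and hi world". That result is not wrong; it is exactly what the regular expression asks for.
To get what we want, use h.*?world, which matches "hello world" and "hi world".

Greedy matching: match as long a string as possible.

Non-greedy matching: match as short a string as possible.

My own understanding of non-greedy matching: scanning from left to right, start at the first position where a match can begin, take the shortest text from there that satisfies the pattern, return it, then continue looking for the next match.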
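The greedy and non-greedy results described here can be reproduced with a short Java sketch. GreedyDemo and findAll are names made up for this example; findAll simply collects every match found by repeated calls to Matcher.find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // All successive matches of regex in input (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "hello world and hi world";
        // Greedy: .* runs to the last "world", so there is a single long match.
        System.out.println(findAll("h.*world", s));  // [hello world and hi world]
        // Non-greedy: .*? stops at the first "world" after each h.
        System.out.println(findAll("h.*?world", s)); // [hello world, hi world]
    }
}
```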

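Extension (exercise)

Match the input tag whose type is password in: <input type=radio name=xxx><input type=password name=yyy><input type=txt name=zzz>

<input.*password.*> matches <input type=radio name=xxx><input type=password name=yyy><input type=txt name=zzz>

<input.*password.*?> matches <input type=radio name=xxx><input type=password name=yyy>

<input((?!input).)*password.*?> matches <input type=password name=yyy>

Step 1 writes the outline of the match. Step 2 uses non-greedy matching to cut off the trailing part. Step 3 excludes "input" from the text between input and password, which cuts off the leading part.

My own understanding of ((?!input).)*:

(1) (?!input)i matches an 'i' at any position except the 'i' that starts input; that is, an i followed by nput never matches.
(2) (?!input). matches any character (the job of the dot) except the 'i' that starts input.
(3) ((?!input).)* matches zero or more of the characters from (2). Since the 'i' that starts input is excluded, the matched text cannot contain the string input.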
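The three attempts can be run side by side with a sketch like this one. InputTagDemo and firstMatch are hypothetical names; firstMatch returns the first match of the regex, or null if there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InputTagDemo {
    static final String HTML =
        "<input type=radio name=xxx><input type=password name=yyy><input type=txt name=zzz>";

    // First match of regex in input, or null if nothing matches (hypothetical helper).
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Step 1: greedy on both sides, so all three tags are matched.
        System.out.println(firstMatch("<input.*password.*>", HTML));
        // Step 2: non-greedy tail stops at the first > after password; the radio tag is still included.
        System.out.println(firstMatch("<input.*password.*?>", HTML));
        // Step 3: no "input" allowed between <input and password, so only the password tag matches.
        System.out.println(firstMatch("<input((?!input).)*password.*?>", HTML));
    }
}
```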

Applications

Demonstrating group in Java

Capturing groups

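import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    public static void main(String[] args) {
        String str = "Her name is John, Her age is 18";
        str += "Her name is Mike, Her age is 15";
        Pattern pattern = Pattern.compile("Her name is (\\w+), Her age is (\\d+)");
        Matcher matcher = pattern.matcher(str);
        matcher.groupCount(); // number of capturing groups in the pattern (2 here), not the number of matches
        while (matcher.find()) {
            System.out.println(matcher.group()); // text matched by the whole regular expression
            System.out.println(matcher.group(1) + " " + matcher.group(2)); // text matched by the parenthesized subexpressions
        }
    }
}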

Output:

Her name is John, Her age is 18
John 18
Her name is Mike, Her age is 15
Mike 15

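Give a group a name and fetch its match by that name (this does not change what the regular expression matches)

Pattern pattern = Pattern.compile("Her name is (?<name>\\w+), Her age is (\\d+)");

matcher.group("name"); // the group name is passed as a string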
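A complete, runnable version of the named-group snippet might look like the sketch below. NamedGroupDemo and nameOf are names invented for this example; the pattern is the one shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    static final Pattern PATTERN =
        Pattern.compile("Her name is (?<name>\\w+), Her age is (\\d+)");

    // Text captured by the group called "name", or null if there is no match (hypothetical helper).
    static String nameOf(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group("name") : null;
    }

    public static void main(String[] args) {
        Matcher matcher = PATTERN.matcher("Her name is John, Her age is 18");
        if (matcher.find()) {
            System.out.println(matcher.group("name")); // John
            System.out.println(matcher.group(1));      // John: a named group keeps its number
            System.out.println(matcher.group(2));      // 18
        }
    }
}
```

A named group is still counted as a numbered group, so group("name") and group(1) return the same text here.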

Using regular expressions with the Linux grep command

Using regular expressions in Python

Practice

http://tool.chinaz.com/regex/

Source: https://www.cnblogs.com/zhangzongjian/p/12657506.html
