Python 正则表达式（Regular Expressions）

正则表达式（Regex）是一种强大的字符串匹配和处理工具，Python 提供了 re 模块来处理正则表达式，广泛用于数据清理、文本解析、模式匹配等任务。

1. `re` 模块

常用函数

函数	作用
`re.match(pattern, string)`	从字符串开头匹配
`re.search(pattern, string)`	搜索整个字符串，返回第一个匹配
`re.findall(pattern, string)`	返回所有匹配项的列表
`re.finditer(pattern, string)`	返回所有匹配项的迭代器
`re.sub(pattern, replacement, string)`	替换匹配的内容
`re.split(pattern, string)`	按模式拆分字符串
`re.compile(pattern)`	预编译正则，提高匹配效率

示例

1) `re.match()` - 从字符串开头匹配

import re

pattern = r"\d+"  # 匹配一个或多个数字
result = re.match(pattern, "123abc")
print(result.group())  # 输出: 123

result = re.match(pattern, "abc123")
print(result)  # 输出: None (因为匹配必须从字符串开头开始)

2) `re.search()` - 搜索整个字符串

import re

pattern = r"\d+"
result = re.search(pattern, "abc123xyz456")
print(result.group())  # 输出: 123

re.search() 只返回第一个匹配项，而 re.findall() 返回所有匹配项。

3) `re.findall()` - 查找所有匹配项

import re

pattern = r"\d+"
result = re.findall(pattern, "abc123xyz456")
print(result)  # 输出: ['123', '456']

4) `re.finditer()` - 迭代查找

import re

pattern = r"\d+"
matches = re.finditer(pattern, "abc123xyz456")

for match in matches:
    print(match.group())  # 输出: 123 456

5) `re.sub()` - 字符串替换

import re

pattern = r"\d+"
replacement = "XXX"
text = "Phone: 123-456-7890"
result = re.sub(pattern, replacement, text)
print(result)  # 输出: Phone: XXX-XXX-XXX

6) `re.split()` - 按模式拆分字符串

import re

pattern = r"\s+"  # 匹配空格
text = "Hello   world Python Regex"
result = re.split(pattern, text)
print(result)  # 输出: ['Hello', 'world', 'Python', 'Regex']

7) `re.compile()` - 预编译正则表达式

import re

pattern = re.compile(r"\d+")
print(pattern.findall("Order 42, price 100"))  # 输出: ['42', '100']

re.compile() 适用于多次使用相同正则表达式的情况，可以提高效率。

2. 常见正则匹配语法

1) 元字符

符号	描述	示例	匹配结果
`.`	任意字符（除换行符）	`c.t`	`cat`, `cut`
`^`	匹配字符串开头	`^Hello`	`Hello world`
`$`	匹配字符串结尾	`world$`	`Hello world`
`*`	匹配0 或更多次	`ab*`	`a`, `ab`, `abb`
`+`	匹配1 或更多次	`ab+`	`ab`, `abb`
`?`	匹配0 或 1 次	`ab?`	`a`, `ab`
`{n}`	匹配n 次	`a{3}`	`aaa`
`{n,}`	至少匹配 n 次	`a{2,}`	`aa`, `aaa`, `aaaa`
`{n,m}`	至少 n 次，至多 m 次	`a{2,4}`	`aa`, `aaa`, `aaaa`

2) 字符集

表达式	描述	示例	匹配
`[abc]`	匹配 `a`、`b` 或 `c`	`b`	`b`
`[^abc]`	不匹配 `a`、`b` 或 `c`	`d`	`d`
`[a-z]`	匹配小写字母	`e`	`e`
`[0-9]`	匹配数字	`5`	`5`

3) 预定义字符

表达式	等价于	匹配内容
`\d`	`[0-9]`	数字
`\D`	`[^0-9]`	非数字
`\s`	`[ \t\n\r\f\v]`	空白字符
`\S`	`[^ \t\n\r\f\v]`	非空白字符
`\w`	`[a-zA-Z0-9_]`	单词字符（字母、数字、下划线）
`\W`	`[^a-zA-Z0-9_]`	非单词字符

4) 分组和引用

表达式	描述	示例	匹配
`(abc)`	捕获组	`(ab)+`	`abab`
`\1`	反向引用	`(ab)\1`	`abab`
`(?:abc)`	非捕获组	`(?:ab)+`	`abab`

示例：分组匹配

import re

pattern = r"(\d{3})-(\d{3})-(\d{4})"
text = "Phone: 123-456-7890"
match = re.search(pattern, text)

if match:
    print(match.group())  # 输出: 123-456-7890
    print(match.group(1))  # 输出: 123
    print(match.group(2))  # 输出: 456
    print(match.group(3))  # 输出: 7890

总结

功能	常用方法	示例
匹配字符串	`re.match()`, `re.search()`	`re.search(r"\d+", "abc123")`
查找所有匹配项	`re.findall()`	`re.findall(r"\d+", "abc123xyz456")`
替换字符串	`re.sub()`	`re.sub(r"\d+", "X", "Phone: 123-456")`
拆分字符串	`re.split()`	`re.split(r"\s+", "Hello world")`
预编译	`re.compile()`	`pattern = re.compile(r"\d+")`

Python 的 re 模块提供了强大的正则表达式功能，可以帮助我们高效地处理文本数据。🚀

1. re 模块​

常用函数​

示例​

1) re.match() - 从字符串开头匹配​

2) re.search() - 搜索整个字符串​

3) re.findall() - 查找所有匹配项​

4) re.finditer() - 迭代查找​

5) re.sub() - 字符串替换​

6) re.split() - 按模式拆分字符串​

7) re.compile() - 预编译正则表达式​

2. 常见正则匹配语法​

1) 元字符​

2) 字符集​

3) 预定义字符​

4) 分组和引用​

示例：分组匹配​

总结​

1. `re` 模块

常用函数

示例

1) `re.match()` - 从字符串开头匹配

2) `re.search()` - 搜索整个字符串

3) `re.findall()` - 查找所有匹配项

4) `re.finditer()` - 迭代查找

5) `re.sub()` - 字符串替换

6) `re.split()` - 按模式拆分字符串

7) `re.compile()` - 预编译正则表达式

2. 常见正则匹配语法

1) 元字符

2) 字符集

3) 预定义字符

4) 分组和引用

示例：分组匹配

总结