
【362】Python Regular Expressions

Published: 2023/11/29 python 29 豆豆


Reference: Regular Expressions - 廖雪峰 (Liao Xuefeng)

Reference: Python3 Regular Expressions - 菜鳥教程

Reference: Regular Expressions - Tutorial

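re.match tries to match a pattern at the start of the string; if the pattern does not match there, match() returns None.

re.search scans the whole string and returns the first successful match.

span(): returns the (start, end) index range of the match
group(): returns the matched text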
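
The difference between re.match and re.search, and the span() and group() methods of the returned match object, can be seen in a small sketch. The sample string is made up for illustration.

```python
import re

text = "order 42 shipped"

# re.match only tries at index 0, so a pattern that first fits later fails.
at_start = re.match(r"\d+", text)

# re.search scans forward until it finds the first match.
found = re.search(r"\d+", text)

print(at_start)       # None
print(found.span())   # (6, 8)
print(found.group())  # 42
```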

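re.sub replaces the matches of a pattern in a string.

re.match only matches at the start of the string: if the start of the string does not fit the regular expression, the match fails and the function returns None. re.search, by contrast, looks through the whole string until it finds a match.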
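
A minimal sketch of re.sub, using an invented phone-number string. The replacement can be a plain string or a function that receives each match object.

```python
import re

phone = "2004-959-559 # a phone number"

# Remove the trailing comment: optional spaces, '#', and everything after it.
number = re.sub(r"\s*#.*$", "", phone)

# Remove every character that is not a digit.
digits = re.sub(r"\D", "", number)

# A function replacement is called once per match.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b20")

print(number)   # 2004-959-559
print(digits)   # 2004959559
print(doubled)  # a2 b40
```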

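compile compiles a regular expression into a pattern (Pattern) object, whose methods such as match() and search() can then be used.

findall finds all substrings matched by the regular expression and returns them as a list; if there is no match, it returns an empty list.

Note: match and search find a single match, whereas findall finds all of them.

finditer is similar to findall: it finds all substrings matched by the regular expression, but returns them as an iterator of match objects.

split splits the string at each substring that matches the pattern and returns the pieces as a list.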
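
The four functions above can be compared side by side. This is only a sketch on a made-up string; the variable names are illustrative.

```python
import re

# Compile once, then reuse the Pattern object's methods.
number = re.compile(r"\d+")
text = "a1, b22; c333"

all_numbers = number.findall(text)                 # list of matched strings
spans = [m.span() for m in number.finditer(text)]  # match objects, one by one
pieces = re.split(r"[,;]\s*", text)                # split at each separator
nothing = number.findall("no digits")              # no match gives an empty list

print(all_numbers)  # ['1', '22', '333']
print(spans)        # [(1, 2), (5, 7), (10, 13)]
print(pieces)       # ['a1', 'b22', 'c333']
print(nothing)      # []
```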

\d matches any digit, while \D matches any nondigit.

\w matches any character that can be part of a word (Python identifier), that is, a letter, the underscore or a digit, while \W matches any other character.

\s matches any whitespace character (space, tab, newline, carriage return and so on), while \S matches any non-whitespace character.

. matches any character except a newline.
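
As a quick illustration of these character classes, here is a sketch on an invented sample string.

```python
import re

sample = "id_7 = 3.5\tok"

digits = re.findall(r"\d", sample)        # single digits
words = re.findall(r"\w+", sample)        # runs of letters, digits, underscores
spaces = re.findall(r"\s", sample)        # spaces and the tab
non_digits = re.findall(r"\D+", "a1b22c") # runs of nondigits
dot_newline = re.search(r"a.c", "a\nc")   # '.' skips newlines by default

print(digits)       # ['7', '3', '5']
print(words)        # ['id_7', '3', '5', 'ok']
print(spaces)       # [' ', ' ', '\t']
print(non_digits)   # ['a', 'b', 'c']
print(dot_newline)  # None
```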

* matches zero or more repetitions (>= 0) of the single character before it, or of the group in parentheses before it.

>>> # The leftmost match wins, even though it is empty (0 characters)
>>> re.search(r'\d*', 'a123456b')
<re.Match object; span=(0, 0), match=''>
>>> # The longest match at that position
>>> re.search(r'\d\d\d*', 'a123456b')
<re.Match object; span=(1, 7), match='123456'>
>>> re.search(r'\d\d*', 'a123456b')
<re.Match object; span=(1, 7), match='123456'>
>>> # One digit followed by pairs of digits
>>> re.search(r'\d(\d\d)*', 'a123456b')
<re.Match object; span=(1, 6), match='12345'>

+ matches one or more repetitions (>= 1) of the single character before it, or of the group in parentheses before it.

>>> re.search(r'.\d+', 'a123456b')
<re.Match object; span=(0, 7), match='a123456'>
>>> re.search(r'(.\d)+', 'a123456b')
<re.Match object; span=(0, 6), match='a12345'>

? matches zero or one occurrence of the single character before it, or of the group in parentheses before it.

>>> re.search(r'\s(\d\d)?\s', 'a 12 b')
<re.Match object; span=(1, 5), match=' 12 '>
>>> re.search(r'\s(\d\d)?\s', 'a  b')
<re.Match object; span=(1, 3), match='  '>
>>> re.search(r'\s(\d\d)?\s', 'a 1 b')
>>> # Nothing is displayed: the match failed and None was returned

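[] matches any one character from the set inside the brackets. Characters that normally need escaping do not need it there; for example, [.] matches a literal dot.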

>>> re.search(r'[.]', 'abcabc.123456.defdef')
<re.Match object; span=(6, 7), match='.'>
>>> # Each repetition matches any one of the characters in the brackets
>>> re.search(r'[cba]+', 'abcabc.123456.defdef')
<re.Match object; span=(0, 6), match='abcabc'>
>>> re.search(r'.[\d]*', 'abcabc.123456.defdef')
<re.Match object; span=(0, 1), match='a'>
>>> re.search(r'\.[\d]*', 'abcabc.123456.defdef')
<re.Match object; span=(6, 13), match='.123456'>
>>> re.search(r'[.\d]+', 'abcabc.123456.defdef')
<re.Match object; span=(6, 14), match='.123456.'>

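{n} matches exactly n repetitions;

{n,m} matches between n and m repetitions;

[0-9a-zA-Z_] matches one digit, letter or underscore;

[0-9a-zA-Z_]+ matches a string of at least one digit, letter or underscore, such as 'a100', '0_Z', 'Py3000' and so on;

[a-zA-Z_][0-9a-zA-Z_]* matches a letter or underscore followed by any number of digits, letters or underscores, that is, a valid Python variable name;

[a-zA-Z_][0-9a-zA-Z_]{0,19} restricts the length of the variable name more precisely to 1-20 characters (1 leading character plus at most 19 more). There must be no space inside the braces: Python treats {0, 19} as literal text rather than as a quantifier.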
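
The identifier pattern can be tried out as follows. This is a sketch that assumes ASCII-only names and uses fullmatch so that the whole candidate must fit; the candidate strings are invented. The last line shows why the braces must not contain a space.

```python
import re

# ASCII-only identifier, 1 to 20 characters long.
identifier = re.compile(r"[a-zA-Z_][0-9a-zA-Z_]{0,19}")

candidates = ["a100", "_tmp", "0_Z", "x" * 21]
results = {c: bool(identifier.fullmatch(c)) for c in candidates}
for name, ok in results.items():
    print(name, ok)

# With a space inside the braces there is no quantifier at all:
# the pattern only matches the literal text 'a{0, 2}'.
literal_braces = re.fullmatch(r"a{0, 2}", "a{0, 2}") is not None
print(literal_braces)  # True
```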

Inside [], a hyphen (-) denotes a range; a hyphen right next to one of the brackets is treated as a literal hyphen.
Ranges of letters or digits can be provided within square brackets, letting a hyphen separate the first and last characters in the range. A hyphen placed after the opening square bracket or before the closing square bracket is interpreted as a literal character:

>>> re.search('[e-h]+', 'ahgfea')
<re.Match object; span=(1, 5), match='hgfe'>
>>> re.search('[B-D]+', 'ABCBDA')
<re.Match object; span=(1, 5), match='BCBD'>
>>> re.search('[4-7]+', '154465571')
<re.Match object; span=(1, 8), match='5446557'>
>>> re.search('[-e-gb]+', 'a--bg--fbe--z')
<re.Match object; span=(1, 12), match='--bg--fbe--'>
>>> re.search('[73-5-]+', '14-34-576')
<re.Match object; span=(1, 8), match='4-34-57'>

Inside [], a leading ^ means any character other than the ones that follow.

Within square brackets, a caret placed right after the opening square bracket excludes the characters that follow within the brackets:

>>> re.search('[^4-60]+', '0172853')
<re.Match object; span=(1, 5), match='1728'>
>>> re.search('[^-u-w]+', '-stv')
<re.Match object; span=(1, 3), match='st'>

A|B matches A or B, so (P|p)ython matches 'Python' or 'python'.

Whereas square brackets surround alternative characters, a vertical bar separates alternative patterns:

>>> re.search('two|three|four', 'one three two')
<re.Match object; span=(4, 9), match='three'>
>>> re.search('|two|three|four', 'one three two')
<re.Match object; span=(0, 0), match=''>
>>> re.search('[1-3]+|[4-6]+', '01234567')
<re.Match object; span=(1, 4), match='123'>
>>> re.search('([1-3]|[4-6])+', '01234567')
<re.Match object; span=(1, 7), match='123456'>
>>> re.search(r'_\d+|[a-z]+_', '_abc_def_234_')
<re.Match object; span=(1, 5), match='abc_'>
>>> re.search(r'_(\d+|[a-z]+)_', '_abc_def_234_')
<re.Match object; span=(0, 5), match='_abc_'>

^ marks the beginning of a line: ^\d means the text must start with a digit.

$ marks the end of a line: \d$ means the text must end with a digit.

A caret at the beginning of the pattern string matches the beginning of the data string; a dollar at the end of the pattern string matches the end of the data string:

>>> re.search(r'\d*', 'abc')
<re.Match object; span=(0, 0), match=''>
>>> re.search(r'^\d*', 'abc')
<re.Match object; span=(0, 0), match=''>
>>> re.search(r'\d*$', 'abc')
<re.Match object; span=(3, 3), match=''>
>>> re.search(r'^\d*$', 'abc')
>>> re.search(r'^\s*\d*\s*$', ' 345 ')
<re.Match object; span=(0, 5), match=' 345 '>
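
By default ^ and $ anchor to the whole string; they only apply to each line when the re.MULTILINE flag is passed. A small sketch on an invented three-line string:

```python
import re

text = "1 apple\nbanana 2\n3 cherries"

start_default = re.findall(r"^\d", text)              # start of the string only
start_lines = re.findall(r"^\d", text, re.MULTILINE)  # start of every line
end_lines = re.findall(r"\d$", text, re.MULTILINE)    # end of every line
end_default = re.findall(r"\d$", text)                # end of the string only

print(start_default)  # ['1']
print(start_lines)    # ['1', '3']
print(end_lines)      # ['2']
print(end_default)    # []
```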

To match ^ or $ as an ordinary character, put a backslash in front of it. Inside [], a caret is already an ordinary character unless it comes right after the opening bracket.

Escaping a dollar at the end of the pattern string, escaping a caret at the beginning of the pattern string or after the opening square bracket of a character class, makes dollar and caret lose the special meaning they have in those contexts and lets them be treated as literal characters:

>>> re.search(r'\$', '$*')
<re.Match object; span=(0, 1), match='$'>
>>> re.search(r'\^', '*^')
<re.Match object; span=(1, 2), match='^'>
>>> re.search(r'[\^]', '^*')
<re.Match object; span=(0, 1), match='^'>
>>> re.search('[^^]', '^*')
<re.Match object; span=(1, 2), match='*'>

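^(\d{3})-(\d{3,8})$ defines two groups, so the area code and the local number can be extracted directly from the matched string:

group(0): always the whole matched text;
group(1): the substring matched by the 1st group;
group(2): the substring matched by the 2nd group, and so on.

Group order: groups are numbered in the order of their opening parentheses.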
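
Applied to a made-up phone number, the two groups can be read back like this (a minimal sketch):

```python
import re

m = re.match(r"^(\d{3})-(\d{3,8})$", "010-12345")

print(m.group(0))  # 010-12345
print(m.group(1))  # 010
print(m.group(2))  # 12345
print(m.groups())  # ('010', '12345')
```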

Parentheses allow matched parts to be saved. The object returned by re.search() has a group() method that, without argument, returns the whole match and, with arguments, returns partial matches; it also has a groups() method that returns all partial matches:

>>> R = re.search(r'((\d+) ((\d+) \d+)) (\d+ (\d+))', '  1 23 456 78 9 0 ')
>>> R
<re.Match object; span=(2, 15), match='1 23 456 78 9'>
>>> R.group()
'1 23 456 78 9'
>>> R.groups()
('1 23 456', '1', '23 456', '23', '78 9', '9')
>>> [R.group(i) for i in range(len(R.groups()) + 1)]
['1 23 456 78 9', '1 23 456', '1', '23 456', '23', '78 9', '9']

(?:...) groups alternatives without the parentheses counting as a captured group:

>>> R = re.search(r'([+-]?(?:0|[1-9]\d*)).*([+-]?(?:0|[1-9]\d*))', ' a = -3014, b = 0 ')
>>> R
<re.Match object; span=(5, 17), match='-3014, b = 0'>
>>> R.groups()
('-3014', '0')

.* matches any run of zero or more characters other than the newline (\n).
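
A short sketch of where .* stops, on invented strings. In Python only \n stops the dot; a carriage return is matched like any other character, and re.DOTALL lifts the restriction entirely.

```python
import re

text = "first line\nsecond line"

first_only = re.search(r".*", text).group()        # stops at the newline
everything = re.search(r".*", text, re.DOTALL).group()
with_cr = re.search(r".*", "a\rb").group()         # '\r' does not stop '.'

print(first_only)          # first line
print(everything == text)  # True
print(with_cr == "a\rb")   # True
```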

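Pattern	Description

^	Matches the start of the string (with re.MULTILINE, also the start of each line).
$	Matches the end of the string or just before a trailing newline (with re.MULTILINE, also the end of each line).
.	Matches any character except a newline; when the re.DOTALL flag is specified, it matches any character including a newline.
[...]	A set of characters, listed individually: [amk] matches 'a', 'm' or 'k'.
[^...]	Any character not in the brackets: [^abc] matches any character other than a, b and c.
re*	Matches 0 or more repetitions of the expression.
re+	Matches 1 or more repetitions of the expression.
re?	Matches 0 or 1 occurrence of the preceding expression (greedy; the non-greedy forms are *?, +? and ??).
re{n}	Matches exactly n repetitions of the preceding expression. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
re{n,}	Matches n or more repetitions of the preceding expression. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
re{n,m}	Matches from n to m repetitions of the preceding expression, greedily.
a|b	Matches a or b.
(re)	Matches the expression inside the parentheses and also captures it as a group.
(?imx)	Turns on the optional flags i, m or x. In Python this applies to the whole pattern and must appear at the start of it.
(?-imx)	Turns off the optional flags i, m or x. Python's re module only supports this in the scoped form (?-imx: re).
(?: re)	Like (...), but does not create a group.
(?imx: re)	Uses the optional flags i, m or x inside the parentheses only.
(?-imx: re)	Does not use the optional flags i, m or x inside the parentheses.
(?#...)	Comment.
(?= re)	Positive lookahead. Succeeds if the contained expression matches at the current position and fails otherwise. It consumes no characters: the rest of the pattern is then tried from the same position.
(?! re)	Negative lookahead. The opposite of positive lookahead: succeeds when the contained expression does not match at the current position.
(?> re)	Atomic (independent) group that is matched without backtracking into it; available from Python 3.11.
\w	Matches a letter, digit or underscore.
\W	Matches any character that is not a letter, digit or underscore.
\s	Matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v].
\S	Matches any non-whitespace character.
\d	Matches any digit; for ASCII text, equivalent to [0-9].
\D	Matches any nondigit.
\A	Matches the start of the string.
\Z	Matches the end of the string. In Python it matches only at the very end, even if the string ends with a newline.
\z	Matches the end of the string in other regex flavors. Older Python versions do not accept it; use \Z instead.
\G	Matches the position where the last match ended. Python's re module does not support it.
\b	Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B	Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\n, \t, etc.	Match a newline, a tab, and so on.
\1...\9	Matches the contents of the nth group.
\10	Matches the contents of group 10. In Python, a numeric escape is read as an octal character code only if it starts with 0 or is three octal digits long.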
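
A few of the less obvious rows of the table can be checked with a short sketch. All sample strings are invented for illustration.

```python
import re

# \b and \B: word boundaries
never = re.search(r"er\b", "never").span()
verb_b = re.search(r"er\b", "verb")
verb_nb = re.search(r"er\B", "verb").span()

# (?=...) and (?!...): lookaheads check what follows without consuming it
py_names = re.findall(r"\w+(?=\.py)", "a.py b.txt c.py")
plain_numbers = re.findall(r"\d+(?!\d|%)", "10% 25 7%")

# \1: backreference to the first group (here, a repeated word)
repeated = re.search(r"\b(\w+) \1\b", "it is is fine").group()

# (?i: ...): case-insensitive matching inside the parentheses only
scoped = re.search(r"(?i:py)thon", "PYthon").group()

print(never, verb_b, verb_nb)  # (3, 5) None (1, 3)
print(py_names)                # ['a', 'c']
print(plain_numbers)           # ['25']
print(repeated)                # is is
print(scoped)                  # PYthon
```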

Examples:

\d{3}: matches 3 digits

\s+: at least one whitespace character

\d{3,8}: 3 to 8 digits

>>> mySent = 'This book is the best book on Python or M.L. I have ever laid eyes upon.'
>>> mySent.split(' ')
['This', 'book', 'is', 'the', 'best', 'book', 'on', 'Python', 'or', 'M.L.', 'I', 'have', 'ever', 'laid', 'eyes', 'upon.']
>>> import re
>>> # \W+ rather than \W*: since Python 3.7, split also splits on empty matches
>>> listOfTokens = re.split(r'\W+', mySent)
>>> listOfTokens
['This', 'book', 'is', 'the', 'best', 'book', 'on', 'Python', 'or', 'M', 'L', 'I', 'have', 'ever', 'laid', 'eyes', 'upon', '']
>>> [tok for tok in listOfTokens if len(tok) > 0]
['This', 'book', 'is', 'the', 'best', 'book', 'on', 'Python', 'or', 'M', 'L', 'I', 'have', 'ever', 'laid', 'eyes', 'upon']
>>> [tok.lower() for tok in listOfTokens if len(tok) > 0]
['this', 'book', 'is', 'the', 'best', 'book', 'on', 'python', 'or', 'm', 'l', 'i', 'have', 'ever', 'laid', 'eyes', 'upon']
>>> [tok.lower() for tok in listOfTokens if len(tok) > 2]
['this', 'book', 'the', 'best', 'book', 'python', 'have', 'ever', 'laid', 'eyes', 'upon']

Reference: Python crawler (5) -- Regular expressions - 小學森也要學編程 - 博客園 (cnblogs)

Deleting the content inside quotation marks; note that .* is used to match arbitrary text:

import re

a = 'Sir Nina said: \"I am a Knight,\" but I am not sure'
b = "Sir Nina said: \"I am a Knight,\" but I am not sure"
print(re.sub(r'"(.*)"', '', a), re.sub(r'"(.*)"', '', b), sep='\n')

Output:
Sir Nina said:  but I am not sure
Sir Nina said:  but I am not sure
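
One caveat with .* is that it is greedy: if the text contains several quoted parts, it runs from the first quotation mark to the last one. The sketch below, on an invented sentence, contrasts it with the non-greedy .*? form, which removes each quoted part separately.

```python
import re

s = 'He said "yes" and then "no" quietly'

# Greedy: one match from the first quote to the last quote.
greedy = re.sub(r'".*"', '', s)

# Non-greedy: each quoted part is matched on its own.
lazy = re.sub(r'".*?"', '', s)

print(repr(greedy))  # 'He said  quietly'
print(repr(lazy))    # 'He said  and then  quietly'
```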

Example from Eric Martin's learning materials of COMP9021

The following function checks that its argument is a string:

that from the beginning: ^

consists of possibly some spaces: " *" (a space followed by *)

followed by an opening parenthesis: \(

possibly followed by spaces: " *"

possibly followed by either + or -: [+-]?

followed by either 0, or a nonzero digit followed by any sequence of digits: 0|[1-9]\d*

possibly followed by spaces: " *"

followed by a comma: ,

followed by characters matching the pattern described by 1-7

followed by a closing parenthesis: \)

possibly followed by some spaces: " *"

all the way to the end: $

Pairs of parentheses surround both numbers to capture them. For point 5 (the alternation 0|[1-9]\d*), a surrounding pair of parentheses is needed; ?: makes it non-capturing:

>>> def validate_and_extract_payoffs(provided_input):
...     # Two captured integers inside parentheses, separated by a comma
...     pattern = r'^ *\( *([+-]?(?:0|[1-9]\d*)) *,'\
...               r' *([+-]?(?:0|[1-9]\d*)) *\) *$'
...     match = re.search(pattern, provided_input)
...     if match:
...         return match.groups()
...
>>> validate_and_extract_payoffs('(+0, -7 )')
('+0', '-7')
>>> validate_and_extract_payoffs(' (-3014,0) ')
('-3014', '0')

Reposted from: https://www.cnblogs.com/alex-bn-lee/p/10325559.html
