Python's re module: compile() and findall() explained
First, let's look at what the official documentation says about compile:
re.compile(pattern, flags=0)
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods, described below.

The expression's behaviour can be modified by specifying a flags value. Values can be any of the following variables, combined using bitwise OR (the | operator).

The sequence:

    prog = re.compile(pattern)
    result = prog.match(string)

is equivalent to

    result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

Note: The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn't worry about compiling regular expressions.

Below is the description of the DOTALL flag:
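To make the equivalence described above concrete, here is a minimal sketch. The pattern and sample text are invented for illustration and are not from the documentation; the second half shows flags combined with the | operator.

```python
import re

# Hypothetical sample data, used only for illustration.
pattern = r"\d+"
text = "42 apples"

# Compile once, then reuse the resulting pattern object.
prog = re.compile(pattern)
compiled_result = prog.match(text)

# Module-level call: produces the same match for the same inputs.
direct_result = re.match(pattern, text)

print(compiled_result.group())  # 42
print(direct_result.group())    # 42

# Flags are combined with bitwise OR.
flagged = re.compile(r"^apples$", re.IGNORECASE | re.MULTILINE)
print(flagged.search("42\nAPPLES") is not None)  # True
```

Holding on to the compiled object mostly pays off when the same pattern is applied many times, for example inside a loop over lines or pages.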
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
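A small sketch of what the flag changes, using a made-up two-line string:

```python
import re

# Hypothetical two-line sample text.
text = "first line\nsecond line"

# Without DOTALL, '.' stops at the newline.
without_flag = re.findall(r"first.*", text)

# With DOTALL, '.' also matches the newline, so the match runs to the end.
with_flag = re.findall(r"first.*", text, re.DOTALL)

print(without_flag)  # ['first line']
print(with_flag)     # ['first line\nsecond line']
```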
Next, the description of findall:
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
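As a rough illustration of "non-overlapping" and "left-to-right", here is a short sketch with invented input strings:

```python
import re

# After "aa" is matched at position 0, scanning resumes at position 2,
# so the overlapping "aa" at position 1 is never reported.
overlapping = re.findall(r"aa", "aaaa")
print(overlapping)  # ['aa', 'aa']

# Matches come back in the order they are found, left to right.
numbers = re.findall(r"\d+", "a1b22c333")
print(numbers)  # ['1', '22', '333']

# The method on a compiled pattern gives the same result.
same_numbers = re.compile(r"\d+").findall("a1b22c333")
print(same_numbers == numbers)  # True
```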
Here is an example:

>>> import re
>>> s = "adfad asdfasdf asdfas asdfawef asd adsfas "
>>> reObj1 = re.compile(r'((\w+)\s+\w+)')
>>> reObj1.findall(s)
[('adfad asdfasdf', 'adfad'), ('asdfas asdfawef', 'asdfas'), ('asd adsfas', 'asd')]
>>> reObj2 = re.compile(r'(\w+)\s+\w+')
>>> reObj2.findall(s)
['adfad', 'asdfas', 'asd']
>>> reObj3 = re.compile(r'\w+\s+\w+')
>>> reObj3.findall(s)
['adfad asdfasdf', 'asdfas asdfawef', 'asd adsfas']

The code can be understood as follows:
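findall always returns a list of every match of the regular expression in the string. What matters here is how each "result" is represented, that is, what each element of the returned list contains.
1. When the regular expression contains more than one pair of capturing parentheses, each list element is a tuple of strings. The tuple holds one string per pair of parentheses, each string is the text matched by the corresponding parenthesized sub-expression, and the strings are ordered by where each opening parenthesis appears.
2. When the regular expression contains exactly one pair of capturing parentheses, each list element is a string holding the text matched by the sub-expression inside the parentheses (not the text matched by the whole regular expression).
3. When the regular expression contains no parentheses, each list element is a string holding the text matched by the whole regular expression.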
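Two related tools are not covered in the original post but follow directly from the three rules above, so they are sketched here as an aside: a non-capturing group (?:...) groups without capturing, and re.finditer() yields match objects from which both the whole match and the groups can be read.

```python
import re

s = "adfad asdfasdf asdfas asdfawef asd adsfas "

# Rule 2: one capturing group, so only the group's text is returned.
one_group = re.findall(r"(\w+)\s+\w+", s)
print(one_group)
# ['adfad', 'asdfas', 'asd']

# (?:...) groups without capturing, so findall behaves as in rule 3
# and returns the text matched by the whole expression.
non_capturing = re.findall(r"(?:\w+)\s+\w+", s)
print(non_capturing)
# ['adfad asdfasdf', 'asdfas asdfawef', 'asd adsfas']

# finditer yields match objects: group(0) is the whole match,
# group(1) is the first capturing group.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"(\w+)\s+\w+", s)]
print(pairs)
# [('adfad asdfasdf', 'adfad'), ('asdfas asdfawef', 'asdfas'), ('asd adsfas', 'asd')]
```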
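The result of re.compile(...).findall(data) is a list, so individual items can be picked out by index, or the items can be combined into a single str with str.join() for easier processing (provided the items are strings rather than tuples). Here is the str.join() documentation from Python 3.5:

str.join(iterable)
Return a string which is the concatenation of the strings in the iterable iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.

With the above in hand, the regular expressions used in a crawler should be much easier to work with.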
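A minimal sketch of both approaches, reusing the sample string from the earlier example. The separators ", " and "-" are arbitrary choices made for illustration.

```python
import re

s = "adfad asdfasdf asdfas asdfawef asd adsfas "

# One capturing group: findall returns a list of str.
words = re.findall(r"(\w+)\s+\w+", s)

first = words[0]           # pick one item by index
joined = ", ".join(words)  # the separator is the string join() is called on
print(first)   # adfad
print(joined)  # adfad, asdfas, asd

# Two capturing groups: the items are tuples, so passing the list
# straight to join() would raise TypeError. Join each tuple instead.
tuples = re.findall(r"(\w+)\s+(\w+)", s)
lines = ["-".join(t) for t in tuples]
print(lines)  # ['adfad-asdfasdf', 'asdfas-asdfawef', 'asd-adsfas']
```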