Matching nested structures with regular expressions in Python
unutbu
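Edit: falsetru's nested parser, which I've slightly modified to accept arbitrary regex patterns for the delimiters and the item separator, is faster and simpler than my original re.Scanner solution: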
import re

def parse_nested(text, left=r'[(]', right=r'[)]', sep=r','):
    """ /sf/ask/17360801/ (falsetru) """
    pat = r'({}|{}|{})'.format(left, right, sep)
    tokens = re.split(pat, text)
    stack = [[]]
    for x in tokens:
        if not x or re.match(sep, x):
            continue
        if re.match(left, x):
            # Nest a new list inside the current list
            current = []
            stack[-1].append(current)
            stack.append(current)
        elif re.match(right, x):
            stack.pop()
            if not stack:
                raise ValueError('error: opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        print(stack)
        raise ValueError('error: closing bracket is missing')
    return stack.pop()

text = "a {{c1::group {{c2::containing::HINT}} a few}} {{c3::words}} or three"
print(parse_nested(text, r'\s*{{', r'}}\s*'))
yields
['a', ['c1::group', ['c2::containing::HINT'], 'a few'], ['c3::words'], 'or three']
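The call above overrides both delimiters. As a rough sketch of the other paths through the function, the snippet below uses the default parentheses and comma separator, and feeds it two unbalanced strings to trigger the two ValueError branches. The function body is repeated (without the debug print) only so the snippet runs on its own; the input strings are made up for illustration.

```python
import re

def parse_nested(text, left=r'[(]', right=r'[)]', sep=r','):
    # Same algorithm as above, repeated so this snippet is self-contained.
    pat = r'({}|{}|{})'.format(left, right, sep)
    stack = [[]]
    for x in re.split(pat, text):
        if not x or re.match(sep, x):
            continue  # skip empty strings and separators
        if re.match(left, x):
            current = []
            stack[-1].append(current)
            stack.append(current)
        elif re.match(right, x):
            stack.pop()
            if not stack:
                raise ValueError('error: opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        raise ValueError('error: closing bracket is missing')
    return stack.pop()

# Default delimiters: parentheses, with commas separating items.
print(parse_nested("a,(b,(c,d)),e"))
# ['a', ['b', ['c', 'd']], 'e']

# Unbalanced input (illustrative strings) raises ValueError.
for bad in ("(a,b", "a)"):
    try:
        parse_nested(bad)
    except ValueError as exc:
        print(repr(bad), '->', exc)
# '(a,b' -> error: closing bracket is missing
# 'a)' -> error: opening bracket is missing
```

Note that the separator is simply dropped, so empty items such as the gap in "a,,b" do not show up in the result.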
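Nested structures cannot be matched by Python regular expressions alone, but it is remarkably easy to build a basic parser (one that can handle nested structures) using re.Scanner: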
import re

class Node(list):
    def __init__(self, parent=None):
        self.parent = parent

class NestedParser(object):
    def __init__(self, left=r'\(', right=r'\)'):
        self.scanner = re.Scanner([
            (left, self.left),
            (right, self.right),
            (r"\s+", None),
            # Anything else, up to the next delimiter or the end of input
            (r".+?(?=(%s|%s|$))" % (right, left), self.other),
        ])
        self.result = Node()
        self.current = self.result

    def parse(self, content):
        self.scanner.scan(content)
        return self.result

    def left(self, scanner, token):
        # Opening delimiter: descend into a new child node
        new = Node(self.current)
        self.current.append(new)
        self.current = new

    def right(self, scanner, token):
        # Closing delimiter: climb back up to the parent node
        self.current = self.current.parent

    def other(self, scanner, token):
        self.current.append(token.strip())
It can be used like this:
p = NestedParser()
print(p.parse("((a+b)*(c-d))"))
# [[['a+b'], '*', ['c-d']]]
p = NestedParser()
print(p.parse("( (a ( ( c ) b ) ) ( d ) e )"))
# [[['a', [['c'], 'b']], ['d'], 'e']]
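By default, NestedParser matches nested parentheses. You can pass other regular expressions to match other nested patterns, such as square brackets []. For example,
p = NestedParser(r'\[', r'\]')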
result = (p.parse("Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet"))
# ['Lorem ipsum dolor sit amet', ['@a xxx yyy', ['@b xxx yyy', ['@c xxx yyy']]],
# 'lorem ipsum sit amet']
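# Delimiters may also be multi-character patterns such as tags (the tag name here is illustrative)
p = NestedParser('<foo>', '</foo>')
print(p.parse("<foo>BAR<foo>BAZ</foo></foo>"))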
# [['BAR', ['BAZ']]]
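For readers who have not met re.Scanner before (as far as I know it is not covered in the official re documentation), here is a minimal sketch of how its lexicon works, independent of NestedParser. The token names "NUM" and "OP" and the input string are made up for illustration.

```python
import re

# Each lexicon entry is (pattern, action). The action is called with the
# scanner and the matched text, and its return value is collected.
# An action of None means the match is silently discarded.
scanner = re.Scanner([
    (r"\d+", lambda s, tok: ("NUM", int(tok))),
    (r"[-+*/()]", lambda s, tok: ("OP", tok)),
    (r"\s+", None),
])

# scan() returns the collected results plus whatever input it could not match.
tokens, remainder = scanner.scan("12 + (3*4) ?")
print(tokens)
# [('NUM', 12), ('OP', '+'), ('OP', '('), ('NUM', 3), ('OP', '*'), ('NUM', 4), ('OP', ')')]
print(repr(remainder))
# '?'
```

NestedParser ignores both return values of scan() and builds its tree as a side effect of the actions, which is why parse() returns self.result instead.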
Of course, pyparsing can do a whole lot more than the code above. But for this single purpose, NestedParser is about 5x faster for small strings:
In [27]: import pyparsing as pp
In [28]: data = "( (a ( ( c ) b ) ) ( d ) e )"
In [32]: %timeit pp.nestedExpr().parseString(data).asList()
1000 loops, best of 3: 1.09 ms per loop
In [33]: %timeit NestedParser().parse(data)
1000 loops, best of 3: 234 us per loop
and about 28x faster for large strings:
In [44]: %timeit pp.nestedExpr().parseString('({})'.format(data*10000)).asList()
1 loops, best of 3: 8.27 s per loop
In [45]: %timeit NestedParser().parse('({})'.format(data*10000))
1 loops, best of 3: 297 ms per loop