The span function in Python: how to extract the text inside a span with no class name using BeautifulSoup
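You can use a recursive function to walk the outermost div with id='dictionary-neodict-es', which accounts for the presence of multiple nested elements with the class indent FyTYr:
from bs4 import BeautifulSoup as soup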
import requests, bs4

def has_class(d, c):
    # True if any tag in the list d, or any descendant of those tags, carries every class token in c.
    # BeautifulSoup stores class="indent FyTYr" as ['indent', 'FyTYr'], so compare sets of tokens.
    return any(set(c.split()) <= set(i.attrs.get('class', [])) or has_class(i.contents, c) for i in d if isinstance(i, bs4.Tag))

def get_sentences(d):
    # An innermost 'indent FyTYr' block carries the class itself and has no descendant that does.
    if set('indent FyTYr'.split()) <= set(d.attrs.get('class', [])) and not has_class(d.contents, 'indent FyTYr'):
        yield [d.div.span.text, d.div.em.text]  # text of the class-less span and of the em
    else:
        for i in filter(lambda x: isinstance(x, bs4.Tag), d.contents):
            yield from get_sentences(i)

page = requests.get('https://www.spanishdict.com/translate/rojo').text
root = soup(page, 'html.parser').find('div', {'id': 'dictionary-neodict-es'})
result = list(get_sentences(root))
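Because the live page can change at any time, it may help to try the same recursive idea on a small inline snippet first. The sketch below is only an illustration under stated assumptions: SAMPLE_HTML is made-up markup (not the real spanishdict.com page), and the names WANTED, has_wanted_class and innermost_pairs are hypothetical helpers introduced here. It descends until it reaches a tag that has the classes indent and FyTYr but no descendant with them, then reads the span and em text.

```python
import bs4
from bs4 import BeautifulSoup

# Made-up markup for illustration only; not the real spanishdict.com page.
SAMPLE_HTML = """
<div id="dictionary-neodict-es">
  <div class="indent FyTYr">
    <div class="indent FyTYr">
      <div><span>El cielo se puso rojo.</span><em>The sky turned red.</em></div>
    </div>
    <div class="indent FyTYr">
      <div><span>Me gusta el vestido rojo.</span><em>I like the red dress.</em></div>
    </div>
  </div>
</div>
"""

# Hypothetical name: the class tokens a block must carry.
WANTED = {"indent", "FyTYr"}


def has_wanted_class(tag):
    # bs4 stores class="indent FyTYr" as the list ['indent', 'FyTYr'],
    # so compare sets of tokens rather than the raw attribute string.
    return WANTED <= set(tag.get("class", []))


def innermost_pairs(tag):
    """Yield [span text, em text] for each innermost tag carrying WANTED."""
    # find_all(True) returns every descendant tag of `tag`.
    nested = any(has_wanted_class(d) for d in tag.find_all(True))
    if has_wanted_class(tag) and not nested:
        yield [tag.div.span.text, tag.div.em.text]
    else:
        # Recurse into child tags only, skipping text nodes.
        for child in tag.contents:
            if isinstance(child, bs4.Tag):
                yield from innermost_pairs(child)


root = BeautifulSoup(SAMPLE_HTML, "html.parser").find(
    "div", {"id": "dictionary-neodict-es"}
)
pairs = list(innermost_pairs(root))
print(pairs)
```

On this sample, pairs holds one [span text, em text] entry per innermost block, so a single string is reached by indexing twice, for example pairs[0][1].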
Now you can access all of the sentences:
^{pr2}$
To access the desired string:
^{3}$
Output: 'The sky turned red at sundown.'