Web scraping: HTML content search methods in the bs4 library

Published: 2025/7/14 HTML 12 豆豆

bs4 provides a find_all(name, attrs, recursive, string, **kwargs) method. It returns a list-like object that holds every match.

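name: a search string matched against tag names

attrs: a search string matched against tag attribute values. A bare string is matched against the class attribute, and a specific attribute can be named as a keyword argument (for example id='link1')

recursive: whether to search all descendants rather than only direct children; defaults to True

string: a search string matched against the text between <>...</>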
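The examples in this article use a soup object without showing how it was built. The sketch below is one way to set it up so the calls can be run. The HTML is a stand-in document (an assumption, shaped to match the outputs quoted in this article), and html.parser is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption): shaped to match the outputs quoted in the article.
html = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language.
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and
<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")                        # name: match on the tag name
courses = soup.find_all("p", "course")            # attrs: a bare string matches the class
top_level = soup.find_all("a", recursive=False)   # recursive: direct children only
texts = soup.find_all(string="Basic Python")      # string: match on text content
by_id = soup.find_all(id="link1")                 # **kwargs: match on a named attribute

# The result is a ResultSet, which is a subclass of list.
print(type(links).__name__, isinstance(links, list))
```

Because the result behaves like a list, it can be indexed, sliced, and passed to len() directly.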

Examples:

name

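soup.find_all('a')           # returns every <a> tag
soup.find_all(['a', 'b'])    # returns every <a> and <b> tag

for tag in soup.find_all(True):   # True matches every tag, so this prints all tag names
    print(tag.name)
'''
prints, one name per line:
html head title body p b p a a
'''

# With a regular expression:
import re   # re is the standard regular-expression library
# To get only the tags whose names start with b, anchor the pattern with ^.
# An unanchored 'b' would match any tag name that contains b.
for tag in soup.find_all(re.compile('^b')):
    print(tag.name)
# prints body and b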
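A compiled pattern passed as name is applied with a search, so it matches anywhere in the tag name unless it is anchored. The sketch below shows the difference on a small stand-in document (an assumption, not the article's own page).

```python
import re
from bs4 import BeautifulSoup

# Stand-in document (an assumption) with the same tag layout as the article's output.
html = """<html><head><title>This is a python demo page</title></head>
<body><p class="title"><b>Demo</b></p>
<p class="course"><a id="link1">Basic Python</a> <a id="link2">Advanced Python</a></p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# True matches every tag in document order.
all_names = [tag.name for tag in soup.find_all(True)]

# Unanchored: any tag whose name contains the letter t.
contains_t = [tag.name for tag in soup.find_all(re.compile("t"))]

# Anchored with ^: only tags whose name starts with t.
starts_with_t = [tag.name for tag in soup.find_all(re.compile("^t"))]

print(all_names)
print(contains_t)       # ['html', 'title']
print(starts_with_t)    # ['title']
```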

attrs:

import re

soup.find_all('p', 'course')   # finds the <p> tags whose class is 'course'
soup.find_all(id='link1')
''' returns
[<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>]
'''
soup.find_all('link')   # returns []: a bare string is a tag name, and the document has no <link> tag
                        # (id='link' also returns [], because a plain string must match the whole value)
soup.find_all(id=re.compile('link'))   # a regular expression finds the tags whose id contains 'link'
''' returns
[<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>,<a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>]
'''
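The attribute searches above can also be written with the class_ keyword or an attrs dictionary, both of which bs4 supports. The sketch below compares the forms on a stand-in fragment (an assumption) that reuses the two links quoted in the output above.

```python
import re
from bs4 import BeautifulSoup

# Stand-in fragment (an assumption) built around the two links quoted in the article.
html = """<p class="title"><b>Courses</b></p>
<p class="course">
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>
<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# Three ways to filter on attributes.
by_class_positional = soup.find_all("p", "course")         # bare string = class
by_class_keyword = soup.find_all("p", class_="course")     # class_ avoids the reserved word
by_attrs_dict = soup.find_all("a", attrs={"class": "py1", "id": "link1"})

# A plain string must match the whole attribute value; a regex can match part of it.
exact = soup.find_all(id="link")
partial_ids = [tag["id"] for tag in soup.find_all(id=re.compile("link"))]

print(len(by_class_positional), len(by_class_keyword))
print(exact)          # []
print(partial_ids)    # ['link1', 'link2']
```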

recursive:

soup.find_all('a', recursive=False)   # returns [], which shows that no direct child of soup is an <a> tag
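What recursive=False returns depends on which node the search starts from. The sketch below runs the same kind of search from three starting points on a stand-in document (an assumption).

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption) with links nested two levels below <body>.
html = """<html><head><title>Demo</title></head>
<body><p class="title"><b>Heading</b></p>
<p class="course"><a id="link1">Basic Python</a> and <a id="link2">Advanced Python</a></p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# The only direct child of soup is <html>, so no <a> is found at that level.
from_soup = soup.find_all("a", recursive=False)

# <body> has two <p> children, but <b> sits inside a <p>, not directly under <body>.
body_paragraphs = soup.body.find_all("p", recursive=False)
body_bold_direct = soup.body.find_all("b", recursive=False)
body_bold_any = soup.body.find_all("b")

# Starting from the course paragraph, both links are direct children.
course = soup.find("p", "course")
course_links = course.find_all("a", recursive=False)

print(from_soup, len(body_paragraphs), len(body_bold_direct), len(body_bold_any), len(course_links))
```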

string:

import re

soup.find_all(string='Basic Python')   # ['Basic Python']
soup.find_all(string=re.compile('python'))   # every string that contains 'python' (the match is case-sensitive)
'''
['This is a python demo page','The demo python introduces several python courses.']
'''
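The pattern 'python' above skips 'Basic Python' because regular expressions are case-sensitive by default. The sketch below, on a stand-in document (an assumption), adds a case-insensitive search and shows that string can be combined with a tag name.

```python
import re
from bs4 import BeautifulSoup

# Stand-in document (an assumption), shaped to reproduce the two-string output above.
html = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language.
<a id="link1">Basic Python</a> and
<a id="link2">Advanced Python</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

# Case-sensitive: only strings containing lowercase 'python'.
lower_only = soup.find_all(string=re.compile("python"))

# Case-insensitive: re.I also picks up 'Python'.
any_case = soup.find_all(string=re.compile("python", re.I))

# With a tag name, string filters tags by their text and returns the tags.
link_tags = soup.find_all("a", string=re.compile("Python"))

print(lower_only)
print(len(any_case))
print([tag["id"] for tag in link_tags])
```

With string alone the results are text nodes; adding a tag name returns the matching tags instead.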

In addition, two shorthand forms are available:

<tag>(..) is equivalent to <tag>.find_all(..)

soup(..) is equivalent to soup.find_all(..)
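The equivalence can be checked directly by comparing the two call styles. This is a minimal sketch on a stand-in fragment (an assumption).

```python
from bs4 import BeautifulSoup

# Stand-in fragment (an assumption).
html = """<div><p class="course"><a id="link1">Basic Python</a>
<a id="link2">Advanced Python</a></p></div>"""
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is shorthand for soup.find_all(...).
soup_short = soup("a")
soup_long = soup.find_all("a")

# The same shorthand works on any tag, with the same arguments.
tag_short = soup.div("p", "course")
tag_long = soup.div.find_all("p", "course")

print(soup_short == soup_long, tag_short == tag_long)
```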

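Extended find methods

Method	Description
<>.find()	Searches and returns only the first result (a single tag or string, or None if nothing matches); takes the same parameters as find_all()
<>.find_parents()	Searches the ancestor nodes and returns a list; takes the same parameters as find_all()
<>.find_parent()	Returns one result from the ancestor nodes; same as above
<>.find_next_siblings()	Searches the following sibling nodes and returns a list; same as above
<>.find_next_sibling()	Returns one result from the following sibling nodes; same as above
<>.find_previous_siblings()	Searches the preceding sibling nodes and returns a list; same as above
<>.find_previous_sibling()	Returns one result from the preceding sibling nodes; same as above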
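The methods in the table can be tried side by side. The sketch below uses a stand-in document (an assumption) and starts each search from one of the two links.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption).
html = """<html><body>
<p class="course">Courses: <a id="link1">Basic Python</a> and <a id="link2">Advanced Python</a>.</p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")          # first match only, returned as a single tag
missing = soup.find("table")    # None when nothing matches
second = soup.find(id="link2")

# Upward searches among ancestors.
parent_p = first.find_parent("p")
body_ancestors = first.find_parents("body")

# Sideways searches among siblings that share the same parent.
next_one = first.find_next_sibling("a")
next_all = first.find_next_siblings("a")
prev_one = second.find_previous_sibling("a")
prev_all = second.find_previous_siblings("a")
no_prev = first.find_previous_sibling("a")   # None: link1 has no earlier <a> sibling

print(first["id"], missing, parent_p["class"], next_one["id"], prev_one["id"], no_prev)
```

The singular methods return one element or None, so check for None before indexing the result.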

Reposted from: https://www.cnblogs.com/rayshaw/p/8577120.html
