Common Python Tricks for Processing Web Page Strings

Published: 2025/5/22 python 9 豆豆


First, some simple and commonly used Python string operations. I will add others as I run into them.

1. Collapse repeated spaces

s = "hello   hello    hello"
s = ' '.join(s.split())
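Note that split() with no arguments splits on any run of whitespace, so the idiom above also folds tabs and newlines into single spaces. If only literal spaces should be collapsed, a regular expression is one option. A minimal sketch, with a sample string made up for illustration:

```python
import re

s = "hello   hello\t\thello\n  world"

# split() with no arguments treats any whitespace run (spaces, tabs,
# newlines) as one separator and drops leading/trailing whitespace.
collapsed_all = " ".join(s.split())

# Collapse only runs of two or more literal spaces; tabs and newlines stay.
collapsed_spaces = re.sub(r" {2,}", " ", s)

print(repr(collapsed_all))
print(repr(collapsed_spaces))
```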

2. Remove all newlines (or any other character or substring)

s = "hello\nhello\nhello hello\n"
print(s)
s = s.replace("\n", "")
print(s)
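When several different characters need to go at once, chaining replace() calls works, but str.translate can do it in one pass. A small sketch, assuming the unwanted characters are carriage returns, newlines, and tabs (the sample string is illustrative):

```python
s = "hello\r\nhello\thello hello\n"

# str.maketrans with three arguments: the third lists characters to delete.
table = str.maketrans("", "", "\r\n\t")
cleaned = s.translate(table)

print(repr(cleaned))
```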

3. Find the position of the first occurrence of a substring (returns -1 if not found)

s = "hello\nhello\nhello hello\n"
print(s.find('\n'))
print(s.find('la'))

4. Find the position of the last occurrence of a substring, searching from the end (returns -1 if not found)

s = "hello\nhello\nhello hello\n"
print(s.rfind('\n'))
print(s.rfind('la'))
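find() and rfind() each report a single position. To collect every position, one approach is to call find() repeatedly with its optional start argument. The helper name find_all_positions below is hypothetical, introduced only for this sketch:

```python
def find_all_positions(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        # An empty substring matches everywhere and would loop forever.
        raise ValueError("sub must not be empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the occurrence that was found.
        start = text.find(sub, start + len(sub))
    return positions


s = "hello\nhello\nhello hello\n"
print(find_all_positions(s, "hello"))
print(find_all_positions(s, "la"))
```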

5. Convert a string to a list

s = "hello\nhello\nhello hello\n"
print(list(s))

6. Find all matching substrings

import re
s = "hello\nhello\nhello hello\n"
print(re.findall('hello', s))  # 'hello' can also be replaced with a regular expression
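As a rough illustration of swapping the fixed string for a pattern, the sketch below uses the same sample string. It also shows two details that are easy to miss: findall returns the captured group rather than the whole match when the pattern has one group, and finditer gives access to match positions.

```python
import re

s = "hello\nhello\nhello hello\n"

# A pattern instead of a fixed string: every run of lowercase letters.
words = re.findall(r"[a-z]+", s)

# With one capture group, findall returns the group, not the whole match.
first_letters = re.findall(r"(h)ello", s)

# finditer yields match objects, which also carry the position.
spans = [m.span() for m in re.finditer(r"hello", s)]

print(words)
print(first_letters)
print(spans)
```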

Next come the more advanced techniques for processing web page strings, which combine the requests, BeautifulSoup, and re modules.

1. Fetch the content of a URL with requests and write it to a file unchanged

import requests
r = requests.get('https://baike.baidu.com')
with open('test.html', 'wb') as fd:
    for chunk in r.iter_content(100):
        fd.write(chunk)

2. Read the entire contents of a file into a string

# encoding: utf-8
with open('test.html', 'r', encoding='utf-8') as f:
    content = f.readlines()
content = ''.join(content)
# content = content.replace('\n', '')  # uncomment this line to strip newlines
print(content)
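The read step assumes the saved page really is UTF-8. If it is not, open() raises UnicodeDecodeError. One way to keep going is to replace undecodable bytes, sketched below with pathlib. The helper name read_text_file and the demo file are assumptions made for illustration, not part of the original post:

```python
import tempfile
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Read a whole file into one string, replacing undecodable bytes."""
    # errors="replace" substitutes U+FFFD for bytes that are invalid in
    # the given encoding instead of raising UnicodeDecodeError.
    return Path(path).read_text(encoding=encoding, errors="replace")


# Demo with a throwaway file so the sketch runs without test.html.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.html"
    # Valid UTF-8 followed by one byte that is never valid in UTF-8.
    demo.write_bytes("<p>你好</p>\n".encode("utf-8") + b"\xff")
    content = read_text_file(demo)

print(repr(content))
```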

3. Load the web page string into BeautifulSoup for processing

from bs4 import BeautifulSoup
soup = BeautifulSoup(content, 'html.parser')
print(soup.prettify())

4. Once the string is loaded into BeautifulSoup, you can manipulate it however you like. For example, extract all <a> tags:

soup = BeautifulSoup(content, 'html.parser')
print(soup.find_all('a'))

Or extract all <a> and <b> tags:

soup = BeautifulSoup(content, 'html.parser')
print(soup.find_all(['a', 'b']))
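A common follow-up is pulling the link targets and link text out of the <a> tags rather than printing the tags themselves. A minimal sketch, assuming bs4 is installed. The HTML snippet is made up and stands in for the downloaded page:

```python
from bs4 import BeautifulSoup

# A small made-up HTML snippet standing in for the downloaded page.
html = """
<p>See <a href="https://example.com/a">first</a>,
<a name="anchor-only">no link</a> and
<a href="/b"><b>second</b></a>.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute.
links = [
    (a["href"], a.get_text(strip=True))
    for a in soup.find_all("a", href=True)
]

print(links)
```

Filtering with href=True avoids a KeyError on anchors that have no href attribute.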

These features belong to BeautifulSoup itself; see the official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

You can also read my other blog post: http://www.cnblogs.com/itlqs/p/5902678.html

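5. Split a string on multiple delimiters

import re
re.split('; |, ', s)  # general form: s is the string to split, delimiters are joined with |

>>> a = 'Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']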
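When the delimiters are all single characters, a character class can be shorter than a list of alternatives. Adding + merges adjacent delimiters. The sketch below is one possible variant, not the original author's method. It also shows that a delimiter at the end of the input still yields an empty string, which can be filtered out:

```python
import re

a = "Beautiful, is;  better*than\nugly"

# A character class splits on any single listed character; the trailing +
# merges adjacent delimiters so no empty strings appear between them.
parts = re.split(r"[;,*\s]+", a)

# A delimiter at the very end still produces a trailing empty string,
# so drop empty pieces explicitly.
raw = re.split(r"[;,*\s]+", "a, b; ")
filtered = [p for p in raw if p]

print(parts)
print(raw)
print(filtered)
```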

Reposted from: https://www.cnblogs.com/itlqs/p/5942374.html
