
Python 3 Web Crawler Study Notes

Published: 2023/11/30 · Category: python · Author: 豆豆

This article, collected and compiled by 生活随笔, presents a set of Python 3 web-crawler study notes and is shared here for reference.

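Contents

Text processing in Python 3
Using the jieba library
- Counting the most frequent words in hamlet.txt
- Counting the most frequent character names in Romance of the Three Kingdoms
Web crawling
- Fetching the Baidu home page
- Fetching a JD.com phone product page
BeautifulSoup
- Fetching with requests, then formatting with BeautifulSoup for cleaner output
- Fetching the Baidu home page with BeautifulSoup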

The original notes were too long, so they have been excerpted and grouped by topic here.

Text processing in Python 3

Using the jieba library

```shell
pip3 install jieba
```

Counting the most frequent words in hamlet.txt


```python
#!/usr/bin/env python
# coding=utf-8
# e10.1CalHamlet.py: print the 10 most frequent words in hamlet.txt

def getText():
    with open("hamlet.txt", "r", encoding="utf-8") as f:
        txt = f.read()
    txt = txt.lower()  # treat "The" and "the" as the same word
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")  # replace special characters with spaces
    return txt

hamletTxt = getText()
words = hamletTxt.split()  # split on whitespace
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # highest count first
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
```
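The same count can be written more compactly with `collections.Counter`, which replaces both the manual `counts.get(word, 0) + 1` loop and the sort, and with `str.translate`, which strips all punctuation in one pass instead of one `replace` call per character. The sketch below takes the text as a string so it can be tried without `hamlet.txt`; the `stop_words` parameter is an addition of this sketch, not part of the original script. Note that `string.punctuation` also contains the apostrophe, which the original character list leaves out, so "lord's" becomes "lord" and "s".

```python
import string
from collections import Counter

def top_words(text, n=10, stop_words=frozenset()):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Translation table mapping every ASCII punctuation character to a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    # stop_words is an optional extra of this sketch (e.g. {"the", "and"})
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)  # highest count first; ties keep first-seen order

sample = "To be, or not to be: that is the question."
for word, count in top_words(sample, 3):
    print("{0:<10}{1:>5}".format(word, count))
```

For the real file, read it first and pass the string in, for example `top_words(open("hamlet.txt", encoding="utf-8").read())`.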

Counting the most frequent character names in Romance of the Three Kingdoms

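Chinese text has no spaces between words, so splitting on whitespace with `split()` cannot segment it; jieba's `lcut()` does the segmentation and returns a list of words. A minimal sketch, assuming the novel is saved as a UTF-8 file named `threekingdoms.txt` (a hypothetical name) and using `collections.Counter` in place of a manual dictionary. Single-character tokens such as 的 or 曰 are skipped because they are mostly function words rather than names.

```python
from collections import Counter

def count_words(words, n=15, min_len=2):
    """Return the n most frequent tokens that are at least min_len characters long."""
    counts = Counter(w for w in words if len(w) >= min_len)
    return counts.most_common(n)

def main(path="threekingdoms.txt"):  # hypothetical file name
    # Imported here so count_words can be used even without jieba installed
    import jieba  # third-party: pip3 install jieba
    with open(path, "r", encoding="utf-8") as f:
        txt = f.read()
    words = jieba.lcut(txt)  # segment the text into a list of words
    for word, count in count_words(words):
        print("{0:<10}{1:>5}".format(word, count))
```

Call `main()` once the text file is in place. Keeping the counting in `count_words` lets it be checked on a hand-made token list such as `["曹操", "孔明", "曹操"]`. Aliases such as 孔明 and 诸葛亮 are still counted as separate words.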

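Web crawling

The learning resource is the web-crawler course taught by Professor Song Tian (嵩天) on China University MOOC.
Below are a few simple scripts. Once you can write these comfortably, you can handle most basic scraping tasks.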

Fetching the Baidu home page

```python
import requests

r = requests.get("https://www.baidu.com")
r.encoding = 'utf-8'  # decode the page as UTF-8
with open("baidu.txt", "w", encoding="utf-8") as fo:
    fo.write(r.text)  # save the page HTML to baidu.txt
```
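Why set `r.encoding` by hand? When a response's `Content-Type` header is a `text/...` type but names no charset, requests falls back to ISO-8859-1, and a UTF-8 Chinese page decoded that way comes out garbled. The helper requests uses for this rule, `requests.utils.get_encoding_from_headers`, can be called directly to see it; the header values below are made-up examples, not Baidu's actual headers.

```python
from requests.utils import get_encoding_from_headers

# Hypothetical Content-Type values, passed as plain dicts for illustration
examples = [
    {"content-type": "text/html"},                 # no charset given
    {"content-type": "text/html; charset=utf-8"},  # charset given explicitly
]
for headers in examples:
    print(headers["content-type"], "->", get_encoding_from_headers(headers))
```

When the header omits the charset, setting `r.encoding` explicitly (as the script above does) or using `r.apparent_encoding`, which is guessed from the page body, gives readable text.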

Fetching a JD.com phone product page

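```python
import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url, timeout=30)
    r.raise_for_status()              # raises HTTPError for 4xx/5xx status codes
    r.encoding = r.apparent_encoding  # decode with the encoding guessed from the page content
    print(r.text[:1000])              # print only the first 1000 characters
except requests.RequestException:
    print("False")
```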
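The try/except pattern above can be wrapped in a reusable function that returns the page text, or `None` if anything goes wrong with the request. A minimal sketch; the name `get_html_text` and the 30-second default timeout are choices made here, not taken from the original post.

```python
import requests

def get_html_text(url, timeout=30):
    """Fetch url and return its decoded text, or None if the request fails."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # raise HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # decode using the encoding guessed from the body
        return r.text
    except requests.RequestException:     # connection errors, timeouts, bad URLs, HTTP errors
        return None
```

Usage: `html = get_html_text("https://item.jd.com/2967929.html")`, then `print(html[:1000])` if `html` is not `None`. Catching `requests.RequestException` rather than using a bare `except` keeps real programming errors (typos, `NameError`) visible instead of silently printing "False".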

BeautifulSoup

Fetching with requests, then formatting with BeautifulSoup for cleaner output

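```python
import requests
from bs4 import BeautifulSoup

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url, timeout=30)
    r.encoding = r.apparent_encoding
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")  # parse with Python's built-in HTML parser
    with open("jingdong.md", "w", encoding="utf-8") as fo:
        fo.write(soup.prettify())  # write the indented HTML once
except requests.RequestException:
    print("False")
```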
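To see what `prettify()` does without fetching anything, it can be run on a small hand-written HTML string (a made-up example): each tag and each text node goes on its own line, indented one space per nesting level.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
pretty = soup.prettify()  # a str with one tag or text node per line
print(pretty)
print(soup.title.string)  # the text inside <title>
```

The same parsed `soup` object can also be navigated by tag name, as `soup.title` shows, which is usually the next step after formatting a fetched page.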

Fetching the Baidu home page with BeautifulSoup

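```python
import requests
from bs4 import BeautifulSoup

try:
    r = requests.get("https://www.baidu.com", timeout=30)
    r.encoding = r.apparent_encoding
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    with open("baidu.md", "w", encoding="utf-8") as fo:
        fo.write(soup.prettify())  # write the indented HTML once
except requests.RequestException:
    print("False")
```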

Bonus
Open-source link to crawler and Python examples
