Python 3 web scraping study notes
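Table of contents
- Text processing in Python 3
- Using the jieba library
- Counting high-frequency words in hamlet.txt
- Counting the most frequent characters in Romance of the Three Kingdoms
- Web scraping
- Scraping the Baidu homepage
- Scraping a JD phone product page
- BeautifulSoup
- Fetching with requests, then processing with BeautifulSoup for cleaner formatting
- Scraping the Baidu homepage with BeautifulSoup
The original notes were too long, so the content is excerpted and organized by topic here.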
Text processing in Python 3
Using the jieba library
pip3 install jieba
Counting high-frequency words in hamlet.txt
Walkthrough video
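kou@ubuntu:~/python$ cat ClaHamlet.py
#!/usr/bin/env python3
# coding=utf-8
# e10.1CalHamlet.py

def getText():
    # Read the whole file and convert it to lower case
    with open("hamlet.txt", "r", encoding="utf-8") as f:
        txt = f.read().lower()
    # Replace punctuation and other special characters with spaces
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")
    return txt

hamletTxt = getText()
words = hamletTxt.split()

# Count how often each word occurs
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort by count, highest first, and print the ten most frequent words
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:10]:
    print("{0:<10}{1:>5}".format(word, count))

Counting the most frequent characters in Romance of the Three Kingdoms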
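The heading above points at a jieba-based count of character names. What follows is only a minimal sketch of how that might look, assuming the novel is saved as a hypothetical UTF-8 file `threekingdoms.txt`. The helper names `segment`, `top_words`, and `main` are illustrative and do not come from the original notes. Single-character tokens are skipped because most of them are particles rather than names.

```python
from collections import Counter


def segment(text):
    """Split Chinese text into a list of words with jieba."""
    # Imported here so top_words() stays usable without jieba installed.
    import jieba  # third-party: pip3 install jieba
    return jieba.lcut(text)


def top_words(tokens, n=10, min_len=2):
    """Return the n most common tokens that have at least min_len characters."""
    counts = Counter(t for t in tokens if len(t.strip()) >= min_len)
    return counts.most_common(n)


def main(path="threekingdoms.txt"):  # hypothetical file name
    with open(path, "r", encoding="utf-8") as f:
        tokens = segment(f.read())
    for word, count in top_words(tokens):
        print("{0:<10}{1:>5}".format(word, count))
```

Calling `main()` prints the ten most frequent multi-character words. The result usually mixes names with ordinary words, so a hand-written exclusion set is likely needed to keep only the characters.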
The listing recorded in the notes for this section is identical to the Hamlet script above.

Web scraping
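The learning resource is the web scraping course on China University MOOC, taught by Song Tian (嵩天).
Below are a few simple scripts. Once you are comfortable writing these, you can handle most basic scraping needs.
Scraping the Baidu homepage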
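The scripts in this section share one fetch pattern: request the page, check the status, pick an encoding, and read the text. Here is a minimal sketch of that pattern as a reusable helper. The name `fetch_text` and the 10-second timeout are assumptions for illustration, not part of the original notes.

```python
def fetch_text(url, timeout=10):
    """Return the decoded page body, or None if the request fails."""
    # Imported lazily so that defining the helper does not require requests.
    import requests  # third-party: pip3 install requests

    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # raises HTTPError on a 4xx/5xx response
        r.encoding = r.apparent_encoding  # encoding guessed from the response body
        return r.text
    except requests.RequestException as exc:
        print("Request failed:", exc)
        return None
```

A call such as `html = fetch_text("https://www.baidu.com")` returns the page source, or `None` if anything went wrong, so the caller can check for `None` before writing to a file.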
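import requests

r = requests.get("https://www.baidu.com")
r.encoding = "utf-8"  # decode the response body as UTF-8
# Save the page source; the with-block closes the file automatically
with open("baidu.txt", "w", encoding="utf-8") as fo:
    fo.write(r.text)

Scraping a JD phone product page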
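import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url)
    r.raise_for_status()              # raise an exception on a 4xx/5xx status
    r.encoding = r.apparent_encoding  # use the encoding detected from the page content
    print(r.text[:1000])              # print only the first 1000 characters
except requests.RequestException:
    print("False")

BeautifulSoup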
Fetching with requests, then processing with BeautifulSoup for cleaner formatting
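To see what `prettify()` adds before running it on a real page, here is a minimal offline sketch on a hard-coded HTML string. The sample markup and the helper name `pretty_html` are made up for illustration.

```python
def pretty_html(html):
    """Parse html with Python's built-in parser and return an indented version."""
    # Imported lazily so that defining the helper does not require bs4.
    from bs4 import BeautifulSoup  # third-party: pip3 install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    return soup.prettify()


sample = "<html><body><p>Hello <b>world</b></p></body></html>"  # made-up markup
```

`print(pretty_html(sample))` puts each tag on its own line, indented according to its nesting depth. That layout is what makes the saved files easier to read than the raw response text.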
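import requests
from bs4 import BeautifulSoup

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url)
    r.encoding = r.apparent_encoding
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    # Write the indented HTML once; the with-block closes the file
    with open("jingdong.md", "w", encoding="utf-8") as fo:
        fo.write(soup.prettify())
except requests.RequestException:
    print("False")

Scraping the Baidu homepage with BeautifulSoup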
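import requests
from bs4 import BeautifulSoup

try:
    r = requests.get("https://www.baidu.com")
    r.encoding = r.apparent_encoding
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    # Write the indented HTML once; the with-block closes the file
    with open("baidu.md", "w", encoding="utf-8") as fo:
        fo.write(soup.prettify())
except requests.RequestException:
    print("False")

Bonus
Link to an open-source collection of web scraping and Python examples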