Crawler Project 4 [Scraping Douyu Live Stream Data]
 
                                
                            
                            
Instead of digging through the page source, go straight to where the data comes from.
 
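Douyu is a typical AJAX-driven page. For pages like this the blunt approach works best: open the browser's developer tools, filter the Network panel by XHR, and look for the request that carries the data.
Fetch that URL with requests, then parse the body with .json().
Online JSON validator: https://www.bejson.com/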
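As an offline alternative to the online validator, Python's standard json module can both check that a response body is valid JSON and pretty-print it, so the nesting (for example the data → rl path the script relies on) is easy to read. A minimal sketch; the function name pretty_json is hypothetical:

```python
import json


def pretty_json(text: str) -> str:
    """Validate a JSON string and return it indented for reading.

    json.loads raises json.JSONDecodeError (a ValueError subclass)
    if the text is not valid JSON.
    """
    parsed = json.loads(text)
    # ensure_ascii=False keeps Chinese room titles readable instead of \uXXXX escapes
    return json.dumps(parsed, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hand-written sample, not real Douyu output
    sample = '{"data": {"rl": [{"rid": 1, "rn": "测试"}]}}'
    print(pretty_json(sample))
```

With requests, print(pretty_json(response.text)) shows the same structure the validator would.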
 
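On the first page, nothing in the XHR list stands out, so keep going.
On the second page, an XHR request named 2 appears. A reasonable guess is that it corresponds to the page number, so check one more page.
On the third page there is one again, so this request is clearly worth a closer look: open its response.
Sure enough, the response is JSON, which makes the rest easy: build the request headers, fetch the JSON, and clean the data.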
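The URL pattern and the cleaning step can be split into two small functions, which also lets the cleaning be tested without network access. This is a sketch assuming the response shape the script relies on ({"data": {"rl": [...]}} with rid, rn and nn on each room); the names page_url and extract_rooms and the sample payload in the usage note are hypothetical:

```python
from typing import Any

# XHR endpoint observed in DevTools; the trailing number is the page.
BASE_URL = "https://www.douyu.com/gapi/rkc/directory/2_1/{}"


def page_url(page: int) -> str:
    """Build the URL for a 1-based page number."""
    if page < 1:
        raise ValueError("pages start at 1")
    return BASE_URL.format(page)


def extract_rooms(payload: dict[str, Any]) -> list[tuple[Any, str, str]]:
    """Keep only room ID, room title and streamer nickname from one page.

    Missing "data" or "rl" keys yield an empty list instead of a KeyError.
    """
    rooms = payload.get("data", {}).get("rl", [])
    return [(r["rid"], r["rn"], r["nn"]) for r in rooms]
```

For example, extract_rooms({"data": {"rl": [{"rid": 101, "rn": "title", "nn": "host"}]}}) returns [(101, "title", "host")]; against the live site the payload would come from requests.get(page_url(2), headers=headers).json().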
 
 The code is as follows:
 
```python
import requests

# XHR endpoint found in DevTools; {} is filled in with the page number.
base_url = "https://www.douyu.com/gapi/rkc/directory/2_1/{}"

# Headers copied from the request in DevTools.
# HTTP/2 pseudo-headers (authority, method, scheme) are left out: requests sets those itself.
# accept-encoding is also left to requests: asking for "br" may leave the body
# undecoded unless the brotli package is installed, which breaks .json().
# The cookie is from the author's own browser session; replace it with yours if requests are rejected.
headers = {
    "accept": "application/json, text/plain, */*",
    "accept-language": "zh-CN,zh;q=0.9",
    "cookie": "dy_did=99d9bec8e3161267ca6f1b2700091501; acf_did=99d9bec8e3161267ca6f1b2700091501; smidV2=201910091127005e69063c81f439757b5c6853e98eb85600415c32cf59babd0; Hm_lvt_e99aee90ec1b2106afe7ec3b199020a7=1583281439; PHPSESSID=pifc2v49pv7eh3pfqh68vdmrp6; acf_auth=c805VIqQqC4NURXP%2BsXkVVLLs71Z3tGdFmlmwKvDfJddlPpBpHsZCb%2BAinbPuBGFqbJVR3zwn6rtV9neXmKxQjGRrSK212Jf4UlJNS5TrfPY6WwlpuI5I14; dy_auth=9679Wnn3NsJb2QR5Af1AKQpGbSYw6kgSwcujMSyG3AxQ3PSOPIINFiu%2FO7usyWfaQEGgY8xUgDHUVuTM0kSDrg4nj9Bg2Ib1AERZgYFzofeYDUjGrez85lo; wan_auth37wan=2d3ba7e8c7b7%2F2QURm%2FaQBYqJqHh6FwGQ26YRXP0y5n%2FjrR0gvtyc7%2FfBM%2FfhL%2F53HJ6mUBypKwmSw1Rk5ajw0Fx%2BpMyNOEG8bIiilruQGrYqED4kIA; acf_uid=329673281; acf_username=329673281; acf_nickname=%E7%94%A8%E6%88%B761411317; acf_own_room=0; acf_groupid=1; acf_phonestatus=1; acf_avatar=https%3A%2F%2Fapic.douyucdn.cn%2Fupload%2Favatar%2Fdefault%2F03_; acf_ct=0; acf_ltkid=69931249; acf_biz=1; acf_stk=391ed8ca5549845e; acf_ccn=b08c364a0d5c5aae33f1c5361ce1cfb6; Hm_lpvt_e99aee90ec1b2106afe7ec3b199020a7=1583281831",
    "referer": "https://www.douyu.com/g_LOL",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}

page = 6  # number of pages to fetch

if __name__ == "__main__":
    for i in range(page):
        url = base_url.format(i + 1)  # pages in the URL start at 1
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        datas = response.json()["data"]["rl"]  # list of live rooms on this page
        for data in datas:
            room = data["rid"]   # room ID
            name = data["rn"]    # room title
            zhubo = data["nn"]   # streamer nickname
            print(room, name, zhubo)
```
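The script fixes page = 6. If the number of pages changes, one option is to keep requesting pages until one comes back with no rooms. This sketch assumes the endpoint returns an empty rl list past the last page, which the post does not verify, so max_pages acts as a safety limit. The fetch function is passed in so the loop can be tried without network access; crawl_pages, fetch_page and max_pages are hypothetical names:

```python
from collections.abc import Callable, Iterator
from typing import Any


def crawl_pages(fetch_page: Callable[[int], list[Any]], max_pages: int = 50) -> Iterator[Any]:
    """Yield rooms page by page, starting at page 1.

    Stops at the first empty page (assumed to mean "past the last page")
    or after max_pages, whichever comes first.
    """
    for page in range(1, max_pages + 1):
        rooms = fetch_page(page)
        if not rooms:
            break
        yield from rooms
```

Against the live site, fetch_page could be lambda p: requests.get(base_url.format(p), headers=headers, timeout=10).json()["data"]["rl"], ideally with a short pause between requests to keep the load light.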
 
 