
Python 3 Crawler Diary (Part 3): Scraping Duitang's Dynamically Loaded Pages

Published: 2023/12/15 · python · 31 · 豆豆

This article, collected and compiled by 生活随笔, introduces scraping Duitang's dynamically loaded pages with Python 3 and is shared here for reference.

1.分析：進入堆糖網后我們在分類找到插畫繪畫進入這個分類后發(fā)現(xiàn)好多圖片，下拉后發(fā)現(xiàn)會有不斷的圖片刷新出來，這就是堆糖采用了動態(tài)加載網頁。

2.用開發(fā)者工具(F12)分析：按一下F12，找到network分支，再按一下F5，將刷新后的網頁一直往下拉，打開XHR，發(fā)現(xiàn)Name下有兩個或多個？include開頭字段，然后觀察Header和Preview發(fā)現(xiàn)它的圖片信息是json格式的數(shù)據(jù)。

3. Now let's write the crawler.

```python
# -*- coding:utf-8 -*-
import json
from urllib.parse import urlencode

import pymongo
import requests
from requests.exceptions import RequestException


def get_index_page(start_page, id_page):
    """Request one page of the dynamically loaded JSON feed; return its text or None."""
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        'Referer': 'http://www.duitang.com/category/?cat=painting',
        'Accept': 'text/plain, */*; q=0.01',
        'Host': 'www.duitang.com',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
    }
    data = {
        'include_fields': 'top_comments,is_root,source_link,item,buyable,root_id,status,like_count,sender,album',
        'filter_id': '插画绘画',  # category name: "Illustration & Painting"
        'start': start_page,      # offset of the first item on this page
        '_': id_page,             # millisecond timestamp used as a cache-buster
    }
    url = 'http://www.duitang.com/napi/blog/list/by_filter_id/?' + urlencode(data)  # build the URL
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        print('Error requesting the index page!')
        return None


def parse_page_index(collection):
    start = 0
    timestamp = 1498560199148
    while start < 23976:
        html = get_index_page(start, timestamp)
        start += 24       # each request returns 24 items
        timestamp += 1
        if html is None:  # skip pages that failed to load
            continue
        data = json.loads(html)  # convert the JSON text into a Python dict
        for item in data['data']['object_list']:
            img_url = item['photo']['path']  # image link
            title = item['msg']              # image title
            collection.insert_one({'img_id': title, 'img_url': img_url})  # store in MongoDB
            print(title, img_url)


if __name__ == '__main__':
    connection = pymongo.MongoClient()
    post_info = connection.duitang_painting
    post_sub = post_info.duitang
    parse_page_index(post_sub)
```
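To check what `get_index_page` sends without touching the network, the URL can be rebuilt offline and decoded again. This is a minimal sketch: `build_url` is a hypothetical helper, the `include_fields` list is shortened, and the `filter_id` default assumes the simplified-Chinese category name.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint copied from get_index_page in step 3.
BASE = 'http://www.duitang.com/napi/blog/list/by_filter_id/?'


def build_url(start: int, ts: int, filter_id: str = '插画绘画') -> str:
    """Hypothetical helper: build the feed URL the way get_index_page does."""
    params = {
        'include_fields': 'top_comments,is_root,source_link,item',  # shortened list
        'filter_id': filter_id,  # non-ASCII text is percent-encoded as UTF-8
        'start': start,          # offset of the first item
        '_': ts,                 # cache-busting timestamp
    }
    return BASE + urlencode(params)


url = build_url(24, 1498560199149)
print(url)

# Decoding the query string recovers the original values (as strings).
query = parse_qs(urlparse(url).query)
print(query['start'], query['_'])          # ['24'] ['1498560199149']
print(query['filter_id'] == ['插画绘画'])  # True
```

`urlencode` percent-encodes the commas and the UTF-8 bytes of the category name, and `parse_qs` reverses this, which is handy when comparing against the request shown in the Network tab.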

