Scraping images from Huashi6 (画师通): a treat for anime fans
Collected and compiled by 生活随笔. This article covers scraping images from the Huashi6 site; the editor found it quite good and is sharing it here for reference.
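First, a word about the Huashi6 site: it hosts a huge number of anime-style (二次元) illustrations. There is bound to be one that suits you, but only kids have to choose. We're taking them all!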
Visit Huashi6: https://www.huashi6.com
Scraping results
Scraping code
```python
import os
from urllib.parse import urljoin

import requests
from lxml import etree


class Dmimg:
    def __init__(self):
        self.headers = {
            # Header names contain no spaces (the original had "User - Agent").
            "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                           "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"),
            # The original also sent the author's own session Cookie and the
            # If-Modified-Since / If-None-Match headers. The conditional headers can
            # make the server reply 304 with an empty body, so they are dropped here;
            # add your own "Cookie" entry if the site requires you to be logged in.
        }
        self.count = 0  # running index used as the file name

    def get_url_list(self):
        url_list = ["https://www.huashi6.com/share"]
        # Artwork pages are /draw/<id>; walk ids 1000 to 9999.
        for i in range(1000, 10000):
            url_list.append("https://www.huashi6.com/draw/{}".format(i))
        return url_list

    def get_img_url(self, url):
        print(url)
        try:
            page = requests.get(url, headers=self.headers, timeout=10)
            html = etree.HTML(page.content)
            if html is None:  # empty response body, nothing to parse
                return
            for src in html.xpath('//*[@id="imgTooles"]/div/img/@src'):
                img_url = urljoin(url, src)  # resolve relative or protocol-relative src
                img = requests.get(img_url, headers=self.headers, timeout=10)
                ext = "png" if "png" in img_url else "jpg"
                with open("img/{}.{}".format(self.count, ext), "wb") as f:
                    f.write(img.content)
                print("Saved", img_url)
                self.count += 1
        except (requests.RequestException, OSError) as e:
            print("Failed:", e)

    def run(self):
        os.makedirs("img", exist_ok=True)  # open() fails if the folder does not exist
        # 1. Build the list of pages to crawl
        url_list = self.get_url_list()
        # 2. Visit each page and extract the image links
        for url in url_list:
            self.get_img_url(url)


if __name__ == "__main__":
    Dm = Dmimg()
    Dm.run()
```
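A remaining weak spot is the extension check: the script chooses between .png and .jpg with a substring test (`"png" in` the image URL), which also fires when "png" appears in a directory name or query string. A minimal standard-library sketch, assuming the image URLs end in an ordinary file suffix, looks only at the suffix of the URL path instead. `image_filename` is a hypothetical helper, not part of the original script, and the example.com URLs are made up for illustration.

```python
import os
from urllib.parse import urlparse


def image_filename(index, img_url, default_ext="jpg"):
    """Build a file name like '3.png' from a running index and an image URL.

    Hypothetical helper: only the suffix of the URL *path* is inspected, so a
    host name, directory name or query string containing 'png' is ignored.
    """
    path = urlparse(img_url).path  # drop scheme, host, query string and fragment
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in {"jpg", "jpeg", "png", "gif", "webp"}:
        ext = default_ext  # missing or unrecognised suffix: fall back to the default
    return "{}.{}".format(index, ext)
```

For example, `image_filename(1, "https://img.example.com/png/c.jpg")` returns `"1.jpg"`, where the substring test would have saved the file as `.png`; `image_filename(0, "https://img.example.com/a/b.PNG?x=1")` returns `"0.png"`. In the download loop it could stand in for the extension logic when opening the output file.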