Always complaining that your phone has no "good wallpaper"? Scrape 500 "beauty" images in one go with Python. Is that enough?

Published: 2024/9/15


Author | 舊時晚風拂曉城    Editor | JackTian

Source | 杰哥的IT之旅 (ID: Jake_Internet)

Original link: https://blog.csdn.net/fyfugoyfa/article/details/107734468

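1. Scraping one page of images

Extracting image data with a regular expression

Part of the page source (screenshot):

Resetting the response encoding to GBK fixes the garbled text.

Code implementation: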
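To see why the encoding setting matters, here is a small offline sketch. It assumes the page is served as GBK bytes and that the HTTP client falls back to a Latin-1 style decoding when it cannot detect the charset (requests commonly does this for text responses that declare no charset). The sample title is made up for illustration.

```python
# Offline sketch: the same GBK bytes decoded with two different codecs.
title = "4K美女壁纸"                # hypothetical page text, not taken from the site
raw = title.encode("gbk")           # bytes as a GBK-encoded server would send them

wrong = raw.decode("iso-8859-1")    # never raises, but every byte becomes its own character
right = raw.decode("gbk")           # the correct codec restores the original text

print(repr(wrong))
print(right)
```

Setting `response.encoding = 'GBK'` before reading `response.text` has the same effect as the second decode: it tells requests which codec to apply to the raw bytes.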

import os
import re

import requests

# Save directory (created if it does not exist)
path = r'D:\test\picture_1'
os.makedirs(path, exist_ok=True)
# Target URL
url = "http://pic.netbian.com/4kmeinv/index.html"
# Browser-like request headers, to avoid being blocked by anti-scraping checks
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Referer": "http://pic.netbian.com/4kmeinv/index.html",
}
# Send the request and get the response
response = requests.get(url, headers=headers)
# Printing the page source shows garbled text; resetting the encoding fixes it,
# so the content displays correctly and the data can be extracted
response.encoding = 'GBK'
# Regex match to extract the data we want: image link and name
img_info = re.findall('img src="(.*?)" alt="(.*?)" /', response.text)
for src, name in img_info:
    img_url = 'http://pic.netbian.com' + src   # the site prefix is needed to form the real image URL
    img_content = requests.get(img_url, headers=headers).content
    img_name = name + '.jpg'
    with open(os.path.join(path, img_name), 'wb') as f:   # save the image locally
        print(f"Downloading image: {img_name}")
        f.write(img_content)
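The regular expression can be tried without touching the network. The sketch below runs the same pattern on a hand-written HTML fragment; the fragment only imitates the `img src="..." alt="..." /` shape the pattern expects, and the real page markup may differ.

```python
import re

# Hypothetical fragment imitating the listing markup; not copied from the site.
sample_html = (
    '<li><a href="/tupian/1.html"><img src="/uploads/a.jpg" alt="sample one" /></a></li>'
    '<li><a href="/tupian/2.html"><img src="/uploads/b.jpg" alt="sample two" /></a></li>'
)

# Same pattern as the article: two non-greedy groups capture src and alt.
pattern = re.compile(r'img src="(.*?)" alt="(.*?)" /')
img_info = pattern.findall(sample_html)

for src, name in img_info:
    print(src, name)
```

With two capture groups, `findall` returns a list of `(src, alt)` tuples, which is why the download loop can unpack each item directly into `src, name`.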

Extracting image data with XPath

Code implementation:

import os

import requests
from lxml import etree

# Save directory (created if it does not exist)
path = r'D:\test\picture_1'
os.makedirs(path, exist_ok=True)
# Target URL
url = "http://pic.netbian.com/4kmeinv/index.html"
# Browser-like request headers, to avoid being blocked by anti-scraping checks
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Referer": "http://pic.netbian.com/4kmeinv/index.html",
}
# Send the request and get the response
response = requests.get(url, headers=headers)
# Printing the page source shows garbled text; resetting the encoding fixes it,
# so the content displays correctly and the data can be extracted
response.encoding = 'GBK'
html = etree.HTML(response.text)
# Locate the data with XPath: image links and names
img_src = html.xpath('//ul[@class="clearfix"]/li/a/img/@src')
# List comprehension to build the real image URLs
img_src = ['http://pic.netbian.com' + x for x in img_src]
img_alt = html.xpath('//ul[@class="clearfix"]/li/a/img/@alt')
for src, name in zip(img_src, img_alt):
    img_content = requests.get(src, headers=headers).content
    img_name = name + '.jpg'
    with open(os.path.join(path, img_name), 'wb') as f:   # save the image locally
        print(f"Downloading image: {img_name}")
        f.write(img_content)
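Both versions build the image URL by string concatenation and use the `alt` text directly as the file name. A slightly more defensive sketch is shown below. The helper names `absolute_url` and `safe_filename` are hypothetical, and the set of replaced characters assumes a Windows target, as the `D:\` path suggests.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://pic.netbian.com"

def absolute_url(src: str) -> str:
    """Resolve a site-relative src against the site root."""
    return urljoin(BASE_URL, src)

def safe_filename(name: str, ext: str = ".jpg") -> str:
    """Replace characters that Windows forbids in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return (cleaned or "untitled") + ext

print(absolute_url("/uploads/a.jpg"))
print(safe_filename("what?"))
```

`urljoin` also leaves an already absolute `src` unchanged, so the same call works if the site ever switches to full URLs.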

2. Paging through the listing for batch downloads

Single-threaded version

import datetime
import os
import time

import requests
from lxml import etree

# Save directory (created if it does not exist)
path = r'D:\test\picture_1'
os.makedirs(path, exist_ok=True)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Referer": "http://pic.netbian.com/4kmeinv/index.html",
}
start = datetime.datetime.now()


def get_img(urls):
    for url in urls:
        # Send the request and get the response
        response = requests.get(url, headers=headers)
        # Printing the page source shows garbled text; resetting the encoding fixes it,
        # so the content displays correctly and the data can be extracted
        response.encoding = 'GBK'
        html = etree.HTML(response.text)
        # Locate the data with XPath: image links and names
        img_src = html.xpath('//ul[@class="clearfix"]/li/a/img/@src')
        # List comprehension to build the real image URLs
        img_src = ['http://pic.netbian.com' + x for x in img_src]
        img_alt = html.xpath('//ul[@class="clearfix"]/li/a/img/@alt')
        for src, name in zip(img_src, img_alt):
            img_content = requests.get(src, headers=headers).content
            img_name = name + '.jpg'
            with open(os.path.join(path, img_name), 'wb') as f:   # save the image locally
                # print(f"Downloading image: {img_name}")
                f.write(img_content)
        # Pause between pages to go easy on the server
        time.sleep(1)


def main():
    # List of URLs to request
    url_list = ['http://pic.netbian.com/4kmeinv/index.html'] + [f'http://pic.netbian.com/4kmeinv/index_{i}.html' for i in range(2, 11)]
    get_img(url_list)
    delta = (datetime.datetime.now() - start).total_seconds()
    print(f"Time to fetch 10 pages of images: {delta}s")


if __name__ == '__main__':
    main()

The program ran successfully, fetching 10 pages of images, 210 in total, in 63.682837 s.

Multi-threaded version
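The page list follows a simple pattern: the first page is `index.html` and page N (from 2 onward) is `index_N.html`. A small helper makes the page count a parameter; the name `page_urls` is hypothetical, and the URL pattern is taken from the code in this article.

```python
BASE = "http://pic.netbian.com/4kmeinv/"

def page_urls(pages: int) -> list:
    """Return the listing URLs for pages 1..pages."""
    first = [BASE + "index.html"]
    rest = [f"{BASE}index_{i}.html" for i in range(2, pages + 1)]
    return first + rest

urls = page_urls(10)
print(len(urls), urls[0], urls[-1])
```

Calling `page_urls(10)` yields the same ten URLs as the single-threaded version, and `page_urls(50)` matches a 50-page run.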

import datetime
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests
from lxml import etree

# Save directory (created if it does not exist)
path = r'D:\test\picture_1'
os.makedirs(path, exist_ok=True)
user_agent = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
]
start = datetime.datetime.now()


def get_img(url):
    # Pick a random User-Agent for each page request
    headers = {
        "User-Agent": random.choice(user_agent),
        "Referer": "http://pic.netbian.com/4kmeinv/index.html",
    }
    # Send the request and get the response
    response = requests.get(url, headers=headers)
    # Printing the page source shows garbled text; resetting the encoding fixes it,
    # so the content displays correctly and the data can be extracted
    response.encoding = 'GBK'
    html = etree.HTML(response.text)
    # Locate the data with XPath: image links and names
    img_src = html.xpath('//ul[@class="clearfix"]/li/a/img/@src')
    # List comprehension to build the real image URLs
    img_src = ['http://pic.netbian.com' + x for x in img_src]
    img_alt = html.xpath('//ul[@class="clearfix"]/li/a/img/@alt')
    for src, name in zip(img_src, img_alt):
        img_content = requests.get(src, headers=headers).content
        img_name = name + '.jpg'
        with open(os.path.join(path, img_name), 'wb') as f:   # save the image locally
            # print(f"Downloading image: {img_name}")
            f.write(img_content)
    # Random pause after each page to go easy on the server
    time.sleep(random.randint(1, 2))


def main():
    # List of URLs to request
    url_list = ['http://pic.netbian.com/4kmeinv/index.html'] + [f'http://pic.netbian.com/4kmeinv/index_{i}.html' for i in range(2, 51)]
    with ThreadPoolExecutor(max_workers=6) as executor:
        executor.map(get_img, url_list)
    delta = (datetime.datetime.now() - start).total_seconds()
    print(f"Time to scrape 50 pages of images: {delta}s")


if __name__ == '__main__':
    main()

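The program ran successfully, fetching 50 pages of images, 1047 in total, in 56.71979 s. That is five times as many pages as the single-threaded run in less time (56.7 s versus 63.7 s), so multithreading greatly improves scraping efficiency.

The final result (screenshot):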
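The speedup comes from overlapping the time each thread spends waiting on the network. The sketch below reproduces the effect offline, with `time.sleep` standing in for a download. `fake_download` is a hypothetical stand-in for `get_img`, and the timings it prints will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(page: int) -> int:
    """Stand-in for get_img: sleep to imitate waiting on the network."""
    time.sleep(0.1)
    return page

pages = list(range(1, 7))

# Sequential: each simulated download waits for the previous one.
t0 = time.perf_counter()
sequential = [fake_download(p) for p in pages]
sequential_s = time.perf_counter() - t0

# Threaded: six workers wait at the same time.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=6) as executor:
    # list() consumes the result iterator, so an exception raised
    # inside a worker would be re-raised here instead of being lost.
    threaded = list(executor.map(fake_download, pages))
threaded_s = time.perf_counter() - t0

print(f"sequential: {sequential_s:.2f}s, threaded: {threaded_s:.2f}s")
```

`executor.map` returns results in input order. A call such as `executor.map(get_img, url_list)` that never iterates over the results will not surface a worker's exception, so wrapping it in `list(...)` is a cheap way to notice failed pages.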
