Scraping Qiushibaike Images with Python, BeautifulSoup, and Multiprocessing
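Libraries used:

    import requests
    import os
    from bs4 import BeautifulSoup
    import time
    from multiprocessing import Pool

Define the image storage path:

    path = r'E:\爬虫\0805\\'

Request headers, to mimic a browser request:

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
    }

To find the User-Agent value in your own browser, press F12 to open the developer tools and look at the request headers of any request.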
Main function:

    def get_images(url):
        # Scheme prefix that is prepended to each img src below
        data = 'https:'
        res = requests.get(url, headers=headers)
        soup = BeautifulSoup(res.text, 'lxml')
        # Select every <img> that sits under div.thumb > a
        url_infos = soup.select('div.thumb > a > img')
        # print(url_infos)
        for url_info in url_infos:
            try:
                urls = data + url_info.get('src')
                # The last URL segment is used as the local file name
                if os.path.exists(path + urls.split('/')[-1]):
                    print('Image already downloaded')
                else:
                    image = requests.get(urls, headers=headers)
                    with open(path + urls.split('/')[-1], 'wb') as fp:
                        fp.write(image.content)
                    print('Downloading: ' + urls)
                    # Pause briefly between downloads
                    time.sleep(0.5)
            except Exception as e:
                print(e)

Start the crawler:
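    if __name__ == '__main__':
        # Create the target folder if it does not exist yet
        os.makedirs(path, exist_ok=True)
        # List of page URLs (pages 1 through 13)
        urls = ['https://www.qiushibaike.com/imgrank/page/{}/'.format(i) for i in range(1, 14)]
        # Crawl the pages with a pool of worker processes
        with Pool() as pool:
            pool.map(get_images, urls)
        print('Crawl finished')

While it runs, the script prints each image URL as it is downloaded. Afterwards, open the folder to check the results.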
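The URL and file name handling inside get_images can also be pulled out into two small helpers. The following is only a sketch under stated assumptions: the helper names `resolve_image_url` and `local_path_for` are hypothetical, and the sample `src` value is made up for illustration rather than taken from the real site. It uses the standard library's `urljoin` in place of prepending `'https:'` by hand, so that absolute, protocol-relative, and relative `src` values are all handled.

```python
import os
from urllib.parse import urljoin, urlsplit


def resolve_image_url(page_url, src):
    """Turn an img src (absolute, protocol-relative, or relative) into a full URL."""
    return urljoin(page_url, src)


def local_path_for(image_url, folder):
    """Build the local file path from the last segment of the URL path."""
    # urlsplit(...).path excludes any query string, so "?v=2" never ends up in the name
    filename = urlsplit(image_url).path.rsplit('/', 1)[-1]
    return os.path.join(folder, filename)


# Hypothetical values for illustration only
page = 'https://www.qiushibaike.com/imgrank/page/1/'
src = '//pic.example.com/pictures/medium/app123.jpg?v=2'

full_url = resolve_image_url(page, src)
print(full_url)
print(local_path_for(full_url, 'images'))
```

One difference from the original code is worth noting: `urls.split('/')[-1]` keeps a query string in the file name if the URL has one, while this sketch drops it. Whether that matters depends on what the site's image URLs actually look like.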
Full code:

    import requests
    import os
    from bs4 import BeautifulSoup
    import time
    from multiprocessing import Pool

    """
    ************ Common scraping libraries ************
    requests
    BeautifulSoup
    pyquery
    lxml
    ************ Scraping framework ************
    scrapy
    Three main parsing approaches: re, css, xpath
    """

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
    }
    path = r'E:\爬虫\0805\\'

    def get_images(url):
        # Scheme prefix that is prepended to each img src below
        data = 'https:'
        res = requests.get(url, headers=headers)
        soup = BeautifulSoup(res.text, 'lxml')
        # Select every <img> that sits under div.thumb > a
        url_infos = soup.select('div.thumb > a > img')
        # print(url_infos)
        for url_info in url_infos:
            try:
                urls = data + url_info.get('src')
                # The last URL segment is used as the local file name
                if os.path.exists(path + urls.split('/')[-1]):
                    print('Image already downloaded')
                else:
                    image = requests.get(urls, headers=headers)
                    with open(path + urls.split('/')[-1], 'wb') as fp:
                        fp.write(image.content)
                    print('Downloading: ' + urls)
                    # Pause briefly between downloads
                    time.sleep(0.5)
            except Exception as e:
                print(e)

    if __name__ == '__main__':
        # Create the target folder if it does not exist yet
        os.makedirs(path, exist_ok=True)
        # List of page URLs (pages 1 through 13)
        urls = ['https://www.qiushibaike.com/imgrank/page/{}/'.format(i) for i in range(1, 14)]
        # Crawl the pages with a pool of worker processes
        with Pool() as pool:
            pool.map(get_images, urls)
        print('Crawl finished')
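To try the multiprocessing part of the script without touching the network, the sketch below swaps get_images for a stand-in worker. The names `build_page_urls` and `fake_crawl` are hypothetical and exist only for this illustration, and the worker count of 4 is an arbitrary choice. `Pool.map` splits the URL list across the worker processes and returns the results in the same order as the input list.

```python
from multiprocessing import Pool


def build_page_urls(first, last):
    """Return the listing URLs for pages first..last, inclusive."""
    template = 'https://www.qiushibaike.com/imgrank/page/{}/'
    return [template.format(i) for i in range(first, last + 1)]


def fake_crawl(url):
    """Stand-in for get_images: no network access, just returns the page number."""
    return int(url.rstrip('/').rsplit('/', 1)[-1])


if __name__ == '__main__':
    urls = build_page_urls(1, 13)
    # The with-block shuts the worker processes down once map() has returned
    with Pool(processes=4) as pool:
        pages = pool.map(fake_crawl, urls)
    print(pages)
```

The `if __name__ == '__main__':` guard matters here just as in the original script: on platforms where worker processes are started by re-importing the main module (Windows, for example), code outside the guard would run again in every worker.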