Scraping Douyu images with Scrapy, then renaming and saving them
Create a Scrapy spider for the Douyu API, which is:
http://capi.douyucdn.cn/api/v1/live?limit=20&offset=0
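The limit and offset query parameters page through the results, and the spider reads the nickname and vertical_src fields from each entry under data. As a rough sketch of that paging and parsing logic outside Scrapy (the helper names page_url and extract_rooms are hypothetical, and nothing here is confirmed about the response format beyond the fields the spider uses):

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://capi.douyucdn.cn/api/v1/live"


def page_url(offset, limit=20):
    """Build the URL for one page of results (hypothetical helper)."""
    query = urlencode({"limit": limit, "offset": offset})
    return f"{BASE_URL}?{query}"


def extract_rooms(body):
    """Return (nickname, image URL) pairs from one JSON response body.

    Assumes the payload has a top-level "data" list whose entries carry
    "nickname" and "vertical_src", as the spider expects.
    """
    data_list = json.loads(body).get("data") or []
    return [(d["nickname"], d["vertical_src"]) for d in data_list]
```

An empty data list means there are no more pages, which is the stop condition the spider relies on.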
douyu.py
import json

import scrapy

from Douyu.items import DouyuItem


class DouyuSpider(scrapy.Spider):
    name = 'douyu'
    allowed_domains = ['douyucdn.cn']
    baseURL = 'http://capi.douyucdn.cn/api/v1/live?limit=20&offset='
    offset = 0
    start_urls = [baseURL + str(offset)]

    def parse(self, response):
        data_list = json.loads(response.body)['data']
        # An empty page means there is nothing left to crawl
        if len(data_list) == 0:
            return
        for data in data_list:
            item = DouyuItem()
            item['imagelink'] = data['vertical_src']
            item['nickname'] = data['nickname']
            yield item
        # Move on to the next page of 20 entries
        self.offset += 20
        url = self.baseURL + str(self.offset)
        yield scrapy.Request(url, callback=self.parse)

pipelines.py
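import os

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline

from Douyu.settings import IMAGES_STORE as images_store


class DouyuPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Request the image URL stored on the item
        image_link = item['imagelink']
        yield scrapy.Request(image_link)

    def item_completed(self, results, item, info):
        # Keep the storage paths of the downloads that succeeded
        image_path = [x['path'] for ok, x in results if ok]
        if not image_path:
            raise DropItem('Image download failed')
        old_path = os.path.join(images_store, image_path[0])
        print('Image path:', old_path)
        # Rename the downloaded file after the streamer's nickname
        os.rename(old_path, os.path.join(images_store, item['nickname'] + '.jpg'))
        return item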
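The pipeline uses the nickname directly as the file name, so a nickname containing characters such as / or : could make the rename fail on some systems. One possible way to guard against that, as a sketch only (safe_filename and target_path are hypothetical helpers, not part of Scrapy or of the original project), is to clean the nickname before building the target path:

```python
import os
import re


def safe_filename(nickname, ext=".jpg"):
    """Replace characters that are commonly invalid in file names.

    Hypothetical helper: the set of replaced characters is an assumption
    covering the usual Windows and POSIX restrictions.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", nickname).strip(" .")
    # Fall back to a fixed name if nothing usable is left
    return (cleaned or "unnamed") + ext


def target_path(store, nickname):
    """Build the final path for a renamed image inside the image store."""
    return os.path.join(store, safe_filename(nickname))
```

In item_completed, target_path(images_store, item['nickname']) could then stand in for the hand-built destination path. Two streamers with the same nickname would still overwrite each other, and this sketch does not address that.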
items.py
import scrapy


class DouyuItem(scrapy.Item):
    # define the fields for your item here like:
    imagelink = scrapy.Field()  # URL of the image to download
    nickname = scrapy.Field()   # streamer nickname, used as the file name
settings.py
USER_AGENT = 'Mozilla/5.0 (Linux; U; Android 4.4.2; zh-cn; PE-TL20 Build/HuaweiPE-TL20) ' \
             'AppleWebKit/537.36 (KHTML, like Gecko)Version/4.0 MQQBrowser/5.3 Mobile Safari/537.36'

The robots setting (ROBOTSTXT_OBEY) needs to be changed to False. With that, the scraped images are downloaded and renamed automatically.
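The pipeline imports IMAGES_STORE from the settings module, and Scrapy only runs pipelines listed in ITEM_PIPELINES, so the settings presumably contain entries along these lines. This is a sketch under assumptions: the storage directory and the priority value 300 are illustrative, and the module path assumes the pipeline lives in Douyu/pipelines.py.

```python
# Do not apply robots.txt rules to requests, as the article requires
ROBOTSTXT_OBEY = False

# Directory where ImagesPipeline stores downloaded files (assumed value)
IMAGES_STORE = './images'

# Enable the custom image pipeline (assumed module path and priority)
ITEM_PIPELINES = {
    'Douyu.pipelines.DouyuPipeline': 300,
}
```

Scrapy's ImagesPipeline also depends on the Pillow library being installed. With the settings in place, the crawl is started from the project directory with scrapy crawl douyu.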
Git repository
