Scraping Douyu images
Create the project
scrapy startproject douyu
Write items.py
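```python
import scrapy


class DouyuItem(scrapy.Item):
    nickname = scrapy.Field()   # streamer's nickname
    imagelink = scrapy.Field()  # image URL (the room's "vertical_src")
    imagePath = scrapy.Field()  # meant to hold the saved image's local path
```
Create a basic spider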
scrapy genspider douyutupian capi.douyucdn.cn
Capturing the mobile app's network traffic reveals the API endpoint, which returns data in JSON format.
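To see the paging logic without running Scrapy, here is a stdlib-only sketch. It assumes, as the spider below does, that each page is a JSON object with a `data` list whose entries carry `nickname` and `vertical_src`. It also assumes that an empty `data` list marks the last page. `parse_rooms`, `crawl`, and the canned responses are hypothetical illustrations, not real API output.

```python
import json

# Endpoint and page size as used by the spider; payload shape as the spider reads it
BASE_URL = "http://capi.douyucdn.cn/api/v1/getVerticalRoom?limit=20&offset="
PAGE_SIZE = 20


def parse_rooms(text):
    """Return (nickname, image URL) pairs from one JSON page."""
    data = json.loads(text)["data"]
    return [(room["nickname"], room["vertical_src"]) for room in data]


def crawl(fetch, page_size=PAGE_SIZE):
    """Follow offset-based pages until the API returns an empty "data" list.

    `fetch` maps a URL to a response body; it stands in for the HTTP request
    Scrapy would make, so the paging logic can be tried offline.
    """
    offset = 0
    rooms = []
    while True:
        page = parse_rooms(fetch(BASE_URL + str(offset)))
        if not page:  # assumed end-of-results signal
            break
        rooms.extend(page)
        offset += page_size
    return rooms


if __name__ == "__main__":
    # Hypothetical canned responses, not real API output
    fake_pages = {
        BASE_URL + "0": '{"data": [{"nickname": "alice", "vertical_src": "https://example.com/a.jpg"}]}',
        BASE_URL + "20": '{"data": []}',
    }
    print(crawl(fake_pages.__getitem__))
```

Passing `fetch` in as a parameter keeps the offset arithmetic and the stop condition testable on their own; in the real spider, Scrapy's scheduler plays that role.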
douyutupian.py
```python
import json

import scrapy

from douyu.items import DouyuItem


class DouyumeinvSpider(scrapy.Spider):
    name = "douyutupian"
    allowed_domains = ["capi.douyucdn.cn"]

    # Paging: each request asks for 20 rooms starting at `offset`
    offset = 0
    url = "http://capi.douyucdn.cn/api/v1/getVerticalRoom?limit=20&offset="

    start_urls = [url + str(offset)]

    def parse(self, response):
        # Convert the JSON response into Python objects; the "data" field is a list
        data = json.loads(response.text)["data"]
        # Stop paging once the API returns an empty page (assumed end-of-results signal)
        if not data:
            return
        for each in data:
            item = DouyuItem()
            item["nickname"] = each["nickname"]
            item["imagelink"] = each["vertical_src"]
            yield item

        # Request the next page of 20 rooms
        self.offset += 20
        yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
```
Pipeline file
pipelines.py
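Scrapy's built-in `ImagesPipeline` (in `scrapy.pipelines.images`) downloads each image and, by default, saves it under `IMAGES_STORE` as `full/<SHA1 of the image URL>.jpg`. Naming files after `nickname` instead means overriding `file_path` in a subclass, and nicknames may contain characters that are not valid in file names. The stdlib sketch below shows both naming schemes; `default_image_path` and `nickname_image_path` are hypothetical helpers, not Scrapy APIs.

```python
import hashlib
import re


def default_image_path(url):
    """Mirror Scrapy's default ImagesPipeline naming: full/<sha1(url)>.jpg."""
    guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{guid}.jpg"


def nickname_image_path(nickname, ext=".jpg"):
    """Name the file after the streamer, replacing runs of characters that are
    unsafe in file names (path separators, Windows-reserved symbols, whitespace)."""
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", nickname).strip("._")
    return f"full/{safe or 'unnamed'}{ext}"


if __name__ == "__main__":
    print(default_image_path("https://example.com/a.jpg"))
    print(nickname_image_path("a/b: c"))
```

Nickname-based names can collide when two rooms share a nickname, and a later image then overwrites an earlier one; the hash-based default avoids that because it depends only on the image URL.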
settings.py
```python
BOT_NAME = 'douyu'

SPIDER_MODULES = ['douyu.spiders']
NEWSPIDER_MODULE = 'douyu.spiders'

# User-Agent captured from the Douyu mobile app
DEFAULT_REQUEST_HEADERS = {
    "User-Agent": "DYZB/1 CFNetwork/808.2.16 Darwin/16.3.0"
}

ITEM_PIPELINES = {
    'douyu.pipelines.ImagesPipeline': 300,
}

# Directory where downloaded images are stored
IMAGES_STORE = "../../Images"
```
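Because `IMAGES_STORE` above is a relative path, it resolves against the working directory of the `scrapy crawl` process (typically the project root containing `scrapy.cfg`), not against `settings.py` itself. A small stdlib sketch for checking where images will land; `resolve_store` and the example directory are hypothetical.

```python
import os.path


def resolve_store(store, cwd):
    """Resolve a relative IMAGES_STORE against the crawl's working directory."""
    return os.path.normpath(os.path.join(cwd, store))


if __name__ == "__main__":
    # Hypothetical project location
    print(resolve_store("../../Images", "/home/user/douyu"))
```

Run from `/home/user/douyu`, the setting therefore points two levels up, at `/home/Images`.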
Reposted from: https://www.cnblogs.com/wanglinjie/p/9240373.html
