Crawling LOL Hero Data with the Scrapy Framework
Scrapy in Practice
 
Crawl targets: the basic information for every League of Legends hero (name, background story, ability names and descriptions), plus downloading every hero's skin images and saving them locally.
 
Start at the LOL official site's home page and, as shown in the figure, navigate to the page listing all heroes.
 
 
First, my initial approach:
 
Pull the data straight from the page's HTML source, the most basic way to scrape.
 
Looking at individual heroes' URLs, the pattern is easy to spot: every hero's detail page has the same address, differing only in the value of the id parameter.
 
 
 
So by collecting each hero's id from the hero-list page, we can build each detail-page address.
 
That was the plan; in practice I could never get the data. The li tags I scraped always read "正在加载中" ("loading").
 
It finally dawned on me that the hero data is loaded via ajax requests, so the traditional approach can't work.
 
 
So I switched tactics:
 
Fetch the js file that stores the hero information directly, read each hero's id from it, and concatenate the id into a URL to get each hero's detail data.
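Outside of Scrapy, the idea can be sketched in a few lines. This assumes the hero-list js file is plain JSON with a "hero" array whose entries carry a string "heroId" (the same shape the spider code below relies on):

```python
import json

# Detail-js base URL, taken from the spider code below
BASE = "https://game.gtimg.cn/images/lol/act/img/js/hero/"

def detail_js_urls(hero_list_js: str) -> list:
    """Turn the hero_list.js payload into one detail-js URL per hero."""
    datas = json.loads(hero_list_js)
    return [BASE + hero["heroId"] + ".js" for hero in datas["hero"]]

# Tiny stand-in payload; the real file lists every hero.
sample = '{"hero": [{"heroId": "1"}, {"heroId": "10"}]}'
print(detail_js_urls(sample))
# → ['https://game.gtimg.cn/images/lol/act/img/js/hero/1.js',
#    'https://game.gtimg.cn/images/lol/act/img/js/hero/10.js']
```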
 
 
The hero detail page likewise loads its data via ajax.
 
 
The fetched js file contains the data we want:
 
hero information and skin image URLs can be read from it directly.
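Concretely, one hero's detail-js payload is JSON with "hero", "skins", and "spells" sections (the key names used in the spider code below). Extracting the fields is plain dict access; here is a sketch with a hypothetical sample payload:

```python
import json

def extract_hero(detail_js: str) -> dict:
    """Pick the fields we care about out of one hero's detail-js payload."""
    data = json.loads(detail_js)
    hero = data["hero"]
    return {
        "nickname": hero["name"],
        "realname": hero["title"],
        "background": hero["shortBio"],
        # skip entries with an empty mainImg, as the pipeline does
        "skin_urls": [s["mainImg"] for s in data["skins"] if s["mainImg"]],
        # strip the <br> tags embedded in ability descriptions
        "skills": [(sp["name"], sp["description"].replace("<br>", ""))
                   for sp in data["spells"]],
    }

# Hypothetical, trimmed-down payload just to show the shape
sample = json.dumps({
    "hero": {"name": "黑暗之女", "title": "安妮", "shortBio": "..."},
    "skins": [{"mainImg": "https://example.com/1.jpg"}, {"mainImg": ""}],
    "spells": [{"name": "P", "description": "text<br>more"}],
})
print(extract_hero(sample)["skin_urls"])
# → ['https://example.com/1.jpg']
```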
 
 
Crawler code:
 lolheros_info.py
 
# -*- coding: utf-8 -*-
import scrapy
import json
from lolheros.items import LolherosItem


class LolherosInfoSpider(scrapy.Spider):
    name = 'lolheros_info'
    allowed_domains = ['lol.qq.com']
    start_urls = ['https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js']

    def parse_heroinfo(self, response):
        # A hero's detail js file is plain JSON
        datas = json.loads(response.body)
        hero_info = datas['hero']
        hero_nickname = hero_info['name']
        hero_realname = hero_info['title']
        hero_background = hero_info['shortBio']
        # Collect every skin's image URL
        hero_skins = datas['skins']
        hero_skin_urls = []
        for hero_skin in hero_skins:
            hero_skin_url = hero_skin['mainImg']
            hero_skin_urls.append(hero_skin_url)
        # Flatten the abilities into one "(name:description)..." string,
        # stripping the <br> tags embedded in descriptions
        hero_skills = datas['spells']
        hero_skills_str = ""
        for hero_skill in hero_skills:
            hero_skills_str += "(" + str(hero_skill['name']) + ":" + \
                str(hero_skill['description']).replace('<br>', '') + ")"
        hero_info_list = [hero_nickname, hero_realname, hero_background, hero_skills_str]
        item = LolherosItem(hero_info_list=hero_info_list, hero_skin_urls=hero_skin_urls)
        yield item

    def parse(self, response):
        # The hero-list js file is JSON; 'hero' holds one entry per hero
        datas = json.loads(response.body)
        heros_list = datas['hero']
        for hero_info in heros_list:
            hero_id = hero_info['heroId']
            heroinfo_url = "https://game.gtimg.cn/images/lol/act/img/js/hero/" + hero_id + ".js"
            # dont_filter=True also lets the request through even though
            # game.gtimg.cn is not in allowed_domains
            request = scrapy.Request(heroinfo_url, callback=self.parse_heroinfo, dont_filter=True)
            yield request
 
Data-processing code:
 pipelines.py
 
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
import os
import xlwt
from urllib import request


class LolherosPipeline(object):

    def __init__(self):
        self.current_row = 1
        self.savepath = "LOL英雄信息.xls"
        self.book = xlwt.Workbook(encoding="utf-8", style_compression=0)
        self.sheet = self.book.add_sheet('LOL英雄信息', cell_overwrite_ok=True)

    def open_spider(self, spider):
        print("Crawl started")
        # Create an images/ directory next to the project package
        self.image_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "images")
        if not os.path.exists(self.image_path):
            os.mkdir(self.image_path)

    def process_item(self, item, spider):
        hero_skin_urls = item['hero_skin_urls']
        hero_info_list = item['hero_info_list']
        print(hero_skin_urls)
        # Save the hero data to Excel
        col = ("昵稱", "名字", "背景故事", "技能介紹")  # nickname, name, background, skills
        for i in range(0, 4):
            self.sheet.write(0, i, col[i])
        for i in range(0, 4):
            self.sheet.write(self.current_row, i, hero_info_list[i])
        self.current_row += 1
        self.book.save(self.savepath)
        # Download the hero's skins
        hero_name = hero_info_list[0]
        # Create a folder named after the hero
        image_category = os.path.join(self.image_path, hero_name)
        if not os.path.exists(image_category):
            os.mkdir(image_category)
        for hero_skin_url in hero_skin_urls:
            if hero_skin_url != '':
                image_name = hero_skin_url.split('/')[-1]
                request.urlretrieve(hero_skin_url, os.path.join(image_category, image_name))
        return item

    def close_spider(self, spider):
        print("Crawl finished")
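For the pipeline to actually run, it has to be registered in the project's settings.py. The post doesn't show this file; a minimal fragment would look like the following (300 is an arbitrary priority; lower values run first):

```python
# lolheros/settings.py (assumed fragment) -- route every yielded item
# through LolherosPipeline
ITEM_PIPELINES = {
    "lolheros.pipelines.LolherosPipeline": 300,
}
```

The spider is then started from the project directory with `scrapy crawl lolheros_info`.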
 
Crawl results:
 
Basic information for all heroes (saved to Excel)
 
 
Skin images for all heroes
 
Summary

That is the complete walkthrough of crawling LOL hero data with the Scrapy framework; hopefully it helps you solve any problems you run into.