Scraping L4D2 map information
Purpose: fetch the information for every map on the specified pages of www.orangetage.com/map/ and save it to a txt file.
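The page argument is either a single page number or an inclusive range such as '18-25', and the first listing page is served as index.html rather than 1.html. A minimal sketch of that mapping, kept separate from any network access, might look like the following; the helper names `expand_pages` and `page_url` are hypothetical and chosen only for illustration.

```python
BASE_URL = 'http://www.orangetage.com/map/'  # site named in the post


def expand_pages(spec):
    """Turn '7' into [7] and '18-25' into [18, 19, ..., 25] (inclusive)."""
    if '-' in spec:
        first, last = (int(part) for part in spec.split('-', 1))
        return list(range(first, last + 1))
    return [int(spec)]


def page_url(number):
    """Page 1 is index.html; every other page is <number>.html."""
    name = 'index' if number == 1 else str(number)
    return '{}{}.html'.format(BASE_URL, name)


urls = [page_url(n) for n in expand_pages('1-3')]
print(urls)
```

Building the list with `range` keeps the pages in ascending order, which matters because the output file numbers the maps in the order the pages are visited.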
import logging

import bs4
import requests

# Log to debug.log, overwriting the file on every run.
logging.basicConfig(
    level=logging.DEBUG,
    format='%(levelname)s: %(message)s. [%(lineno)d]%(filename)s <%(asctime)s>',
    filename='debug.log',
    filemode='w',
)

BASE_URL = 'http://www.orangetage.com/map/'


def fetch_text(url):
    """Download a page and decode it as GBK, as the original script does."""
    response = requests.get(url)
    response.encoding = 'gbk'
    return response.text


def get_html(page):
    """Return the HTML of each listing page in `page` ('7' or a range such as '18-25')."""
    if '-' in page:
        first, last = page.split('-')
        numbers = range(int(first), int(last) + 1)  # inclusive, ascending
    else:
        numbers = [int(page)]
    pages_list = []
    for number in numbers:
        # The first listing page is index.html; the rest are <number>.html.
        name = 'index' if number == 1 else str(number)
        pages_list.append(fetch_text('{}{}.html'.format(BASE_URL, name)))
    return pages_list


def get_map(page):
    """Write the description and download links of every map to l4d2_maps_info.txt."""
    map_list = []
    # page_count is relative to the requested range: the first page fetched is 1.
    for page_count, html in enumerate(get_html(page), 1):
        link_tags = bs4.BeautifulSoup(html, 'lxml').select('div[class="list_img"] > a')
        for map_count, link_tag in enumerate(link_tags, 1):
            soup = bs4.BeautifulSoup(fetch_text(link_tag.get('href')), 'lxml')
            entry = '------ Page {} - Map {} ------\n'.format(page_count, map_count)
            # The font name must match the page's inline style character for
            # character (Simplified vs. Traditional form); check the page source.
            info_tags = soup.select('span[style="font-family:微軟雅黑;"]')
            if len(info_tags) != 1:
                entry += 'Map description:'
            for info_tag in info_tags:
                entry += info_tag.text.replace('\r\n', '\n') + '\n'
            # Each download link is written as "<link text>:<href>".
            for link in soup.select('ul[class="l xz_a wrap blue"] > li > a'):
                entry += '{}:{}\n'.format(link.text, link.get('href'))
            map_list.append(entry + '\n\n')
    with open('l4d2_maps_info.txt', 'w', encoding='utf-8') as f:
        f.write('Found {} maps on page(s) {}\n\n\n'.format(len(map_list), page))
        f.writelines(map_list)


get_map('18-25')
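The script sets `encoding = 'gbk'` on every response before reading `.text`, because `requests` decodes the body with whatever codec it infers from the headers unless told otherwise. The sketch below illustrates, without any network access, why the codec has to match. The helper `decode_page` and the sample title are hypothetical stand-ins, not part of the original program.

```python
def decode_page(raw, encoding='gbk'):
    """Decode response bytes with an explicit codec (hypothetical helper)."""
    return raw.decode(encoding, errors='replace')


title = '求生之路2 地图'
raw = title.encode('gbk')  # stand-in for the bytes a GBK server would send

right = decode_page(raw)             # decoded with the matching codec
wrong = decode_page(raw, 'latin-1')  # decoded with a mismatched codec

print(right == title)  # True
print(wrong == title)  # False
```

With a mismatched codec the selectors may still find their tags, but the text written to the output file would be garbled, so forcing the encoding before reading `.text` is the safer choice for this site.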