Using Python to Fetch a Game Site's Top Sellers and Store the Data in Excel with pandas
For compliance reasons, the unnamed website is referred to as S below.
A brand-new S user who doesn't know what game to play often finds the top sellers list to be the best starting point.
Of course, a certain third-party site carries more detailed data: which games have high user activity, which regional store sells a game cheaper, and so on. But it sits behind Cloudflare's browser verification.
Some people suggest cloudscraper, but cloudscraper does not seem to work against the commercial edition of Cloudflare (at least I believe so; if anyone has a better approach, please point it out, thanks). I'll try other methods for that later. So let's set it aside for now and start fetching S's top-seller data.
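For reference, basic cloudscraper usage looks like the sketch below; as noted above, it may still fail against the commercial Cloudflare tier, so treat it as an attempt rather than a solution:

import cloudscraper

# minimal attempt; create_scraper() returns a requests.Session-compatible object
scraper = cloudscraper.create_scraper()
html = scraper.get('https://那個網站/').text  # the target host is the anonymized site above
print(html[:200])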
1. Analyzing the Top Sellers Page
Click into the top sellers page:
https://那個網站/search/?sort_by=_ASC&force_infinite=1&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&page=2&os=win

The link above only ever returns the first page of data.
Through the browser's developer tools, the real content endpoint turns out to be:
https://那個網站/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1

Here start is the starting offset, which drives pagination, and count is how many items one request returns.
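To illustrate the pagination scheme, stepping start by count walks through the pages. A minimal sketch using the endpoint above (three pages assumed purely for the example):

# hypothetical pagination loop: step `start` by `count` to walk the pages
base = ('https://那個網站/search/results/?query&start={start}&count=50'
        '&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7'
        '&filter=globaltopsellers&infinite=1')
for page in range(3):  # first three pages of 50 items each, as an example
    url = base.format(start=page * 50)
    print(url)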
A plain GET request does the job. On to the code:
def getInfo(self):
    url = 'https://那個網站/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1'
    res = self.getRes(url, self.headers, '', '', 'GET')  # my own request wrapper (full code in section 3)
    res = res.json()['results_html']
    sel = Selector(text=res)
    nodes = sel.css('.search_result_row')
    for node in nodes:
        gamedata = {}
        gamedata['url'] = node.css('a::attr(href)').extract_first()  # link
        gamedata['name'] = node.css('a .search_name .title::text').extract_first()  # game name
        gamedata['sales_date'] = node.css('a .search_released::text').extract_first()  # release date
        discount = node.css('.search_discount span::text').extract_first()  # discounted or not
        gamedata['discount'] = discount if discount else 'no discount'
        price = node.css('a .search_price::text').extract_first().strip()  # price
        discountPrice = node.css('.discounted::text').extract()  # discounted price
        discountPrice = discountPrice[-1] if discountPrice else ''
        gamedata['price'] = discountPrice if discountPrice else price  # final price
        print(gamedata)

2. Saving the Data with pandas
2.1 Building the pandas DataFrame
pandas stores Excel data through the DataFrame's to_excel method, which writes a pandas DataFrame object straight into an Excel sheet.
A DataFrame represents a matrix-like data table containing an ordered collection of columns.
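As a tiny self-contained illustration of DataFrame plus to_excel (a sketch with made-up values; writing .xlsx requires the openpyxl engine to be installed):

import pandas as pd

# two ordered columns with made-up rows; the dict keys become the Excel header row
demo = pd.DataFrame({'Name': ['Game A', 'Game B'], 'Price': ['$9.99', '$19.99']})
demo.to_excel('./demo.xlsx', index=False)  # index=False keeps the row index out of the sheet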
First, shape the scraped data into a DataFrame. Store each scraped field in its own list: the URLs go into the url list, the game names into the name list, and so on:
url = []
name = []
sales_date = []
discount = []
price = []

url = node.css('a::attr(href)').extract_first()
if url not in self.url:
    self.url.append(url)
    name = node.css('a .search_name .title::text').extract_first()
    sales_date = node.css('a .search_released::text').extract_first()
    discount = node.css('.search_discount span::text').extract_first()
    discount = discount if discount else 'no discount'
    price = node.css('a .search_price::text').extract_first().strip()
    discountPrice = node.css('.discounted::text').extract()
    discountPrice = discountPrice[-1] if discountPrice else ''
    price = discountPrice if discountPrice else price
    self.name.append(name)
    self.sales_date.append(sales_date)
    self.discount.append(discount)
    self.price.append(price)
else:
    print('already exists')

Then assemble the lists into the corresponding dict:
data = {'URL': self.url, 'Name': self.name, 'Release Date': self.sales_date, 'Discount': self.discount, 'Price': self.price}

The dict keys become the Excel column names. Then build the object with pandas' DataFrame() method before inserting it into the Excel file:
data = {'URL': self.url, 'Name': self.name, 'Release Date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
frame = pd.DataFrame(data)
xlsxFrame = pd.read_excel('./steam.xlsx')

Here pd is the object you get from importing the pandas package; by convention, seeing pd means pandas was imported as:
import pandas as pd

2.2 Appending Data to Excel with pandas
If you paginate and call the Excel-writing method repeatedly, you will notice that the rows in the sheet never grow: every call to to_excel() overwrites whatever the previous call wrote.
So to keep the previously written data, read it back out first, merge it with the newly scraped DataFrame, and write the combined data to Excel again:
frame = frame.append(xlsxFrame)

The complete write method looks like this:
def insert_info(self):
    data = {'URL': self.url, 'Name': self.name, 'Release Date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
    frame = pd.DataFrame(data)
    xlsxFrame = pd.read_excel('./steam.xlsx')
    print(xlsxFrame)
    if xlsxFrame is not None:
        print('appending')
        frame = frame.append(xlsxFrame)  # note: DataFrame.append was removed in pandas 2.0; use pd.concat([frame, xlsxFrame]) there
        frame.to_excel('./steam.xlsx', index=False)
    else:
        frame.to_excel('./steam.xlsx', index=False)

Logic: read the existing steam.xlsx back in, append its rows to the newly built DataFrame, and write the merged result back with to_excel (index=False keeps pandas' row index out of the sheet).
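One caveat: pd.read_excel raises FileNotFoundError when steam.xlsx does not exist yet, and when the file does exist it returns a DataFrame (possibly empty) rather than None, so the "is not None" branch always runs. A more defensive variation (my own sketch, not the author's code; pd.concat replaces the deprecated append):

import os
import pandas as pd

def save_with_append(frame, path='./steam.xlsx'):
    # merge with previously written rows only if the workbook already exists
    if os.path.exists(path):
        old = pd.read_excel(path)
        frame = pd.concat([frame, old], ignore_index=True)  # replaces the removed DataFrame.append
    frame.to_excel(path, index=False)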
3. The Full Code
import requests
from scrapy import Selector
import pandas as pd


class getSteamInfo():
    headers = {
        "Host": "那個網站",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36",
    }
    url = []
    name = []
    sales_date = []
    discount = []
    price = []

    # fetch a proxy IP from the API
    def getApiIp(self):
        # fetch one and only one IP
        api_url = 'api地址'
        res = requests.get(api_url, timeout=5)
        try:
            if res.status_code == 200:
                api_data = res.json()['data'][0]
                proxies = {
                    'http': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                    'https': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                }
                print(proxies)
                return proxies
            else:
                print('fetch failed')
        except:
            print('fetch failed')

    def getInfo(self):
        url = 'https://那個網站/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1'
        res = self.getRes(url, self.headers, '', '', 'GET')  # my own request wrapper
        res = res.json()['results_html']
        sel = Selector(text=res)
        nodes = sel.css('.search_result_row')
        for node in nodes:
            url = node.css('a::attr(href)').extract_first()
            if url not in self.url:
                self.url.append(url)
                name = node.css('a .search_name .title::text').extract_first()
                sales_date = node.css('a .search_released::text').extract_first()
                discount = node.css('.search_discount span::text').extract_first()
                discount = discount if discount else 'no discount'
                price = node.css('a .search_price::text').extract_first().strip()
                discountPrice = node.css('.discounted::text').extract()
                discountPrice = discountPrice[-1] if discountPrice else ''
                price = discountPrice if discountPrice else price
                self.name.append(name)
                self.sales_date.append(sales_date)
                self.discount.append(discount)
                self.price.append(price)
            else:
                print('already exists')
        # self.insert_info()

    def insert_info(self):
        data = {'URL': self.url, 'Name': self.name, 'Release Date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
        frame = pd.DataFrame(data)
        xlsxFrame = pd.read_excel('./steam.xlsx')
        print(xlsxFrame)
        if xlsxFrame is not None:
            print('appending')
            frame = frame.append(xlsxFrame)
            frame.to_excel('./steam.xlsx', index=False)
        else:
            frame.to_excel('./steam.xlsx', index=False)

    # dedicated request method: tries up to three times through a proxy, returns None after three failures
    def getRes(self, url, headers, proxies, post_data, method):
        if proxies:
            for i in range(3):
                try:
                    # POST request through the given proxy
                    if method == 'POST':
                        res = requests.post(url, headers=headers, data=post_data, proxies=proxies)
                    # GET request through the given proxy
                    else:
                        res = requests.get(url, headers=headers, proxies=proxies)
                    if res:
                        return res
                except:
                    print(f'request attempt {i+1} failed')
            else:
                return None
        else:
            for i in range(3):
                proxies = self.getApiIp()
                try:
                    # POST request with a freshly fetched proxy
                    if method == 'POST':
                        res = requests.post(url, headers=headers, data=post_data, proxies=proxies)
                    # GET request with a freshly fetched proxy
                    else:
                        res = requests.get(url, headers=headers, proxies=proxies)
                    if res:
                        return res
                except:
                    print(f"request attempt {i+1} failed")
            else:
                return None


if __name__ == '__main__':
    getSteamInfo().getInfo()

By the way, the data fetched this time is from the US store. Access from mainland China has been unstable recently, so if you want the data without buying any games, going through a proxy is recommended. I used ipidea's proxy service here; new users get some traffic for free.
Link: http://www.ipidea.net/?utm-source=csdn&utm-keyword=?wb?
One last piece of advice: game in moderation, spend rationally, live earnestly, and support legitimate copies. (For large volumes of data, store it in a database instead; databases support exporting to Excel too.)
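If you do go the database route, pandas can write there directly. A minimal sketch with SQLite via to_sql (my assumption; the author names no particular database, and the table and file names are made up):

import sqlite3
import pandas as pd

conn = sqlite3.connect('./steam.db')  # hypothetical database file
frame = pd.DataFrame({'Name': ['demo game'], 'Price': ['$9.99']})  # stand-in for the scraped rows
frame.to_sql('top_sellers', conn, if_exists='append', index=False)  # appends rows on every run
# and you can still export to Excel from the database when needed
pd.read_sql('SELECT * FROM top_sellers', conn).to_excel('./steam.xlsx', index=False)
conn.close()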