Scraping Your Old 全民K歌 Songs with Python

Published: 2023/12/20

Preface

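Do you remember the songs we sang as children, or the voice of someone you once liked? 全民K歌 used to be the mainstream singing app, and I miss it dearly. Listening online every time is risky: if the other person deletes a song someday or makes it private, it is gone for good. So today I am building a 全民K歌 downloader to help you keep those memories of you and the people you sang with. [This program incorporates the get_cookies.py code from my National Day article "國慶節，企查查我來啦~", so it can fetch the cookie automatically. In my tests, however, the automatically fetched cookie has one parameter fewer than the cookie from a normal visit, so some songs may be missing (in one test on a 全民K歌 account with 143 songs, only 140 were downloaded). Copying the cookie by hand does not have this problem.]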
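
One way to see which parameter the automatically read cookie lacks is to compare cookie names between the two strings. The following is a minimal sketch, assuming both cookies are plain "name=value; name=value" strings; parse_cookie_string and missing_cookie_names are hypothetical helper names, and the sample values are made up.

```python
def parse_cookie_string(cookie: str) -> dict:
    """Split a 'name=value; name=value' string into a dict."""
    result = {}
    for part in cookie.split(';'):
        # partition keeps any '=' that appears inside the value
        name, _, value = part.strip().partition('=')
        if name:
            result[name] = value
    return result


def missing_cookie_names(copied: str, auto: str) -> set:
    """Names present in the copied cookie but absent from the auto-read one."""
    return set(parse_cookie_string(copied)) - set(parse_cookie_string(auto))


# Made-up sample values, for illustration only.
copied_cookie = 'muid=123456; token=abc=def; extra=1'
auto_cookie = 'muid=123456;token=abc=def;'
print(missing_cookie_names(copied_cookie, auto_cookie))
```

If the set is non-empty, pasting the copied cookie at the prompt is the safer choice.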

Approach

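At present the 全民K歌 web page shows only the first 8 songs, and the "View more" link at the bottom does nothing. When I first saw this I despaired of scraping the site. You could of course share the songs one by one and extract each link, but nobody wants that much hassle; the goal is to pick a person and download every song they have. After studying the page with BeautifulSoup and similar HTML parsing tools, I found that the personal page has a script tag holding the overview information available to the current account. The useful part is mainly the total song count; when you scrape someone else's page, it shows only the number of non-private songs. This tells us how many songs there are to fetch, and with that good start I expected the rest to be easier.
After refreshing many times and watching the XHR requests, on a whim I clicked "View more" once. It still did nothing visible, but it triggered one extra XHR request, kg_ugc_get_homepage. The response contained no useful data, but as soon as I saw the familiar parameters it required, I was sure this was the only way to fetch the songs!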
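
The script-tag lookup can be sketched with the standard library alone. This is a minimal sketch, assuming the page embeds its data as window.__DATA__ = {...}; inside a script tag and that the count sits under data.ugc_total_count, as the full program further down expects; the helper name extract_window_data and the sample HTML are made up for illustration.

```python
import json
import re


def extract_window_data(html: str):
    """Return the JSON object assigned to window.__DATA__, or None."""
    for body in re.findall(r'<script[^>]*>(.*?)</script>', html, flags=re.S | re.I):
        if 'window.__DATA__' not in body:
            continue
        # Take everything from the first '{' to the last '}' as the JSON payload.
        start = body.find('{')
        end = body.rfind('}')
        if start != -1 and end > start:
            return json.loads(body[start:end + 1])
    return None


# Made-up page, for illustration only.
sample_html = '''
<html><body>
<script>var unrelated = 1;</script>
<script>window.__DATA__ = {"data": {"kgnick": "demo", "ugc_total_count": 143}};</script>
</body></html>
'''
page = extract_window_data(sample_html)
print(page["data"]["ugc_total_count"])  # 143
```

A real page may carry more JavaScript after the assignment, in which case the end of the payload needs a stricter search than the last closing brace.
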

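I was thrilled. After appending start, num, and share_uid, the request did return a callback object containing song information, and a quick analysis yielded ugclist, the song list. After repeated probing I found that one request returns at most 15 songs. Never mind, I will just send more requests; the total song count is already known anyway.
I assumed the rest would be easy. The audio link was indeed easy to find, but what I did not expect...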
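
The paging logic can be sketched without touching the network. This is a minimal sketch, assuming the response is a callback wrapper around JSON with the list under data.ugclist, as the full program further down expects; unwrap_callback, collect_songs, and the stand-in fake_fetch_page are hypothetical names, and the callback name in the fake response is made up.

```python
import json


def unwrap_callback(text: str) -> dict:
    """Strip a callback(...) wrapper and parse the JSON inside."""
    return json.loads(text[text.find('{'): text.rfind('}') + 1])


def collect_songs(fetch_page, per_page: int = 15) -> list:
    """Request page 1, 2, 3... until a page comes back with an empty ugclist."""
    songs = []
    page = 1
    while True:
        ugclist = unwrap_callback(fetch_page(page, per_page))["data"]["ugclist"]
        if not ugclist:
            break
        songs += ugclist
        page += 1
    return songs


def fake_fetch_page(page: int, per_page: int) -> str:
    """Stand-in for the real HTTP request: serves 32 fake songs."""
    all_songs = [{"shareid": f"id{i}", "title": f"song {i}"} for i in range(32)]
    chunk = all_songs[(page - 1) * per_page: page * per_page]
    return 'callback(' + json.dumps({"data": {"ugclist": chunk}}) + ')'


print(len(collect_songs(fake_fetch_page)))  # 32
```

In the real program, fetch_page would wrap a requests.get call to kg_ugc_get_homepage with start, num, and share_uid filled in.
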

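At this point I was ready to throw up: the link is full of parameters that come from who knows where, and tracking them down is far too complicated... My sense is that at least a dozen, perhaps dozens, of JavaScript functions take part in generating these parameters, or more likely they are generated on the server, so I dropped this route without hesitation.
Then I looked at the page source and, hey, the song URL sits in one of the script tags. Lady Luck has favored me twice.
From here, BeautifulSoup or a similar HTML parsing tool can parse out the song address, which skips obtaining ftnrkey, vkey, fname, and ugcid altogether. After that the download proceeds normally.
After downloading the songs I checked my 全民K歌 account and saw one album I had not downloaded. Having taken the songs, I could not leave the album behind. Opening the album tab triggered an XHR request, fcg_user_album_list, which by its name fetches the album list; I have only one album, so it returned one. The album detail page needs very few parameters: the album id, passed as s, is enough to open it. By now I had lost faith in the page's XHR requests, so I went straight back to the page source and found the information I wanted lying quietly in a script tag once again. However, although the album has 11 songs, only 10 come back. I don't know whether that is because I deleted one of them or because only the first 10 can be fetched, so I left it alone. That more or less wraps up albums. If you have an album with many songs, you can check how many entries you get.
Downloading album songs works the same way as downloading ordinary songs: once you have the shareid you can open the song detail page, parse out the song URL, and download it.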

Code

# _*_ coding:utf-8 _*_
# Project:
# FileName: qmkg_new.py
# UserName: 高俊佶
# ComputerUser: 19305
# Day: 2021/10/24
# Time: 12:00
# IDE: PyCharm
# Women, who needs them! (a moment of heartache on 2021-10-9)
import os
import sys
import json
import time
import base64
import getpass
import sqlite3

import urllib3
import requests
import webbrowser
import ctypes.wintypes
from bs4 import BeautifulSoup
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


class DataBlob(ctypes.Structure):
    _fields_ = [('cbData', ctypes.wintypes.DWORD), ('pbData', ctypes.POINTER(ctypes.c_char))]


def dp_api_decrypt(encrypted):
    # Decrypt a DPAPI-protected blob with the current Windows user's credentials
    p = ctypes.create_string_buffer(encrypted, len(encrypted))
    blob_out = DataBlob()
    ret_val = ctypes.windll.crypt32.CryptUnprotectData(ctypes.byref(DataBlob(ctypes.sizeof(p), p)), None, None, None, None, 0, ctypes.byref(blob_out))
    if not ret_val:
        raise ctypes.WinError()
    result = ctypes.string_at(blob_out.pbData, blob_out.cbData)
    ctypes.windll.kernel32.LocalFree(blob_out.pbData)
    return result


def aes_decrypt(encrypted_txt):
    # Chrome keeps its AES key, itself DPAPI-protected, in the "Local State" file
    with open(f'C:\\Users\\{getpass.getuser()}\\AppData\\Local\\Google\\Chrome\\User Data\\Local State', encoding='utf-8', mode="r") as f:
        jsn = json.load(f)
    encrypted_key = base64.b64decode(jsn["os_crypt"]["encrypted_key"].encode())
    encrypted_key = encrypted_key[5:]
    # Bytes 3..15 of the value are the GCM nonce; the ciphertext follows
    cipher = Cipher(algorithms.AES(dp_api_decrypt(encrypted_key)), modes.GCM(encrypted_txt[3:15], tag=None, min_tag_length=16), backend=default_backend())
    return cipher.decryptor().update(encrypted_txt[15:])


def chrome_decrypt(encrypted_txt):
    if sys.platform == 'win32':
        try:
            if encrypted_txt[:4] == b'\x01\x00\x00\x00':
                return dp_api_decrypt(encrypted_txt).decode()
            elif encrypted_txt[:3] == b'v10':
                # Drop the 16-byte GCM tag appended to the ciphertext
                return aes_decrypt(encrypted_txt)[:-16].decode()
        except OSError:
            return None
    else:
        raise OSError('Reading Chrome cookies is only supported on Windows')


def get_cookies_from_chrome(d):
    con = sqlite3.connect(f'C:\\Users\\{getpass.getuser()}\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Cookies')
    con.row_factory = sqlite3.Row
    cur = con.cursor()
    cur.execute('SELECT name, encrypted_value as value FROM cookies where host_key like ?', (f'%{d}%',))
    cookies = ''
    for row in cur:
        if row['value'] is not None:
            value = chrome_decrypt(row['value'])
            if value is not None:
                cookies += row['name'] + '=' + value + ';'
    con.close()
    return cookies


def parse_cookies(cookies: str):
    cookies_dict = {}
    for c in cookies.replace(' ', '').split(';'):
        # partition keeps any '=' that appears inside the value
        name, _, value = c.partition('=')
        if name:
            cookies_dict[name] = value
    return cookies_dict


webbrowser.open('https://kg.qq.com/index-pc.html')
cookie = input('Enter a valid cookie (copy it from any XHR request), or leave blank to read the Chrome cookie (songs may be missing): ')
if not cookie:
    cookie = get_cookies_from_chrome('qq.com') + get_cookies_from_chrome('kg.qq.com')
    if not parse_cookies(cookie).get('muid', None):
        print('Waiting for login on the 全民K歌 web page...')
        n = 1
        while not parse_cookies(cookie).get('muid', None):
            cookie = get_cookies_from_chrome('qq.com') + get_cookies_from_chrome('kg.qq.com')
            print(f'Checking login status, attempt [{n}]...', end='\r')
            time.sleep(1)
            n += 1
uid = parse_cookies(cookie)['muid']
print(f'\nGot user uid: {uid}')
inp = input('uid to look up (leave blank to use your own): ')
if len(inp) > 10:
    uid = inp
# Send the raw cookie string as the Cookie request header
headers = {"Cookie": cookie}

# Collect information on every song that can be fetched
total = 0  # total number of songs that can be fetched
ugc = []  # data for all songs
user_information = {}  # basic user information
base_dir = ''  # output folder, set once the nickname is known
res = requests.get(f'https://kg.qq.com/node/personal?uid={uid}', headers=headers)
if res.ok:
    for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
        if "window.__DATA__" in script.text:
            user_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["data"]
            # without cookies: public songs only | with cookies: all songs in the account || i.e. the number of songs that can be fetched
            total = user_information["ugc_total_count"]
            print(f'Total number of songs: {total}')
            base_dir = f'{user_information["kgnick"]}_{uid}'
            os.makedirs(f'{base_dir}/media', exist_ok=True)
            num = 15  # at most 15 songs per request
            n = 1  # page number
            while n:
                url = f'http://node.kg.qq.com/cgi/fcgi-bin/kg_ugc_get_homepage?type=get_uinfo&start={n}&num={num}&share_uid={uid}'
                res = requests.get(url, headers=headers)
                if not res.ok:
                    break  # stop rather than request the same page forever
                song_information = json.loads(res.text[res.text.find('{'): res.text.rfind('}') + 1])["data"]
                if not song_information["ugclist"]:
                    break
                ugc += song_information["ugclist"]
                n += 1
            break
    else:
        print('No songs found!')

if user_information:
    with open(f'{base_dir}/{base_dir}.json', 'w', encoding='utf-8') as f:
        f.write(json.dumps(ugc, indent=4, ensure_ascii=False))
    for i, song in enumerate(ugc):
        # Read the song link straight from the page data (skips the troublesome vkey lookup)
        res = requests.get(f'https://node.kg.qq.com/play?s={song["shareid"]}', headers=headers)
        if res.ok:
            for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
                if "window.__DATA__" in script.text:
                    media_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["detail"]
                    res = requests.get(media_information["playurl"], stream=True)
                    if res.ok:
                        path = f'{base_dir}/media/{song["title"]}_{song["shareid"]}.m4a'
                        print(f'\rDownloading: {path}\n[Current: {str(i + 1).zfill(len(str(total)))}/Total: {total}]', end='')
                        with open(path, 'wb') as f:
                            f.write(res.content)
                    break
            else:
                print('No media link found!')
    print()

# Fetch albums
album_list = {}
res = requests.get(f'https://node.kg.qq.com/cgi/fcgi-bin/fcg_user_album_list?dest_uid={uid}', headers=headers)
if res.ok and user_information:
    album_information = json.loads(res.text[res.text.find('{'): res.text.rfind('}') + 1])["data"]
    if "album_list" in album_information and album_information["album_list"]:
        for album in album_information["album_list"]:
            album_list[album["album_id"]] = {"album_name": album["album_name"], "album_list": []}
            res = requests.get(f'https://node.kg.qq.com/album?s={album["album_id"]}')
            if res.ok:
                for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
                    if "window.__DATA__" in script.text:
                        album_list_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["detail"]
                        if album_list_information["ugc_list"] and album["ugc_num"] and len(album_list_information["ugc_list"]) == album["ugc_num"]:
                            os.makedirs(f'{base_dir}/{album["album_name"]}_{album["album_id"]}', exist_ok=True)
                            if album["ugc_num"] != album_list_information["ugc_num"]:
                                print('Only 10 album songs can be fetched for now; the rest cannot be obtained')
                            album_list[album["album_id"]]["album_list"] = album_list_information["ugc_list"]
                        else:
                            print('No album songs, or not all album songs could be fetched')
                        break
                else:
                    print('No songs found in the album!')

# Download albums
if album_list:
    with open(f'{base_dir}/album_list.json', 'w', encoding='utf-8') as f:
        f.write(json.dumps(album_list, indent=4, ensure_ascii=False))
    for album in album_list:
        # Skip albums whose song list was not fetched (their folder was never created)
        if album_list[album]["album_list"]:
            album_dir = f'{base_dir}/{album_list[album]["album_name"]}_{album}'
            total = len(album_list[album]["album_list"])
            with open(f'{album_dir}/{album_list[album]["album_name"]}_{album}.json', 'w', encoding='utf-8') as f:
                f.write(json.dumps(album_list[album], indent=4, ensure_ascii=False))
            for i, song in enumerate(album_list[album]["album_list"]):
                # Read the song link straight from the page data (skips the troublesome vkey lookup)
                res = requests.get(f'https://node.kg.qq.com/play?s={song["ugc_id"]}', headers=headers)
                if res.ok:
                    for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
                        if "window.__DATA__" in script.text:
                            media_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["detail"]
                            res = requests.get(media_information["playurl"], stream=True)
                            if res.ok:
                                path = f'{album_dir}/{song["song_name"]}_{song["ugc_id"]}.m4a'
                                print(f'\rDownloading: {path}\n[Current: {str(i + 1).zfill(len(str(total)))}/Total: {total}]', end='')
                                with open(path, 'wb') as f:
                                    f.write(res.content)
                            break
                    else:
                        print('No media link found!')
            print()
input('Download complete! Press Enter to exit~')
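
The program builds file names straight from song titles, and a title may contain characters that Windows does not allow in file names. The following is a minimal sketch of one way to guard against that, assuming it is acceptable to replace such characters with underscores; safe_filename is a hypothetical helper and not part of the original program.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, share_id: str, ext: str = '.m4a') -> str:
    """Build '<title>_<share_id><ext>' with invalid characters replaced."""
    cleaned = _INVALID.sub('_', title).strip(' .')
    if not cleaned:
        cleaned = 'untitled'
    return f'{cleaned}_{share_id}{ext}'


print(safe_filename('What/Why?', 'abc123'))  # What_Why__abc123.m4a
```

Keeping the share id in the name, as the program already does, prevents two songs with the same cleaned title from overwriting each other.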

Closing remarks

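This program can be packaged into a standalone executable with pyinstaller -F qmkg_new.py, so you can use it wherever you go. Any girls with lovely voices on 全民K歌 are welcome to download your songs and share them with me~ [/just kidding/]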
