Scraping the Hot Lists of Popular Websites and Displaying Them Together
Scrape the hot lists (trending boards and ranked topics) of popular Chinese websites and display them in one place.
Scrape the Zhihu hot list data into a list
#!/usr/bin/env python
# encoding: utf-8
__author__ = 'HZT'

import json
import re

import requests
from bs4 import BeautifulSoup

# Fill in your own User-Agent and Cookie before running.
headers = {"User-Agent": "", "Cookie": ""}
zh_url = "https://www.zhihu.com/billboard"
zh_response = requests.get(zh_url, headers=headers)

webcontent = zh_response.text
soup = BeautifulSoup(webcontent, "html.parser")
# The hot list is read from the <script id="js-initialData"> tag.
script_text = soup.find("script", id="js-initialData").get_text()
rule = r'"hotList":(.*?),"guestFeeds"'
result = re.findall(rule, script_text)

# Parse the captured text as JSON instead of eval()-ing it: eval would run
# page content as Python code and fails on JSON literals such as null.
hot_list = json.loads(result[0])
print(hot_list)

Scrape the Weibo hot search data into a list
#!/usr/bin/env python
# encoding: utf-8

import requests
from bs4 import BeautifulSoup

url = "https://s.weibo.com/top/summary"
# Fill in your own User-Agent and Cookie before running.
headers = {"User-Agent": "", "Cookie": ""}
wb_response = requests.get(url, headers=headers)
webcontent = wb_response.text
soup = BeautifulSoup(webcontent, "html.parser")
# The td-01, td-02, and td-03 cells are read as rank, title, and level.
index_list = soup.find_all("td", class_="td-01")
title_list = soup.find_all("td", class_="td-02")
level_list = soup.find_all("td", class_="td-03")

topic_list = []
for i in range(len(index_list)):
    item_index = index_list[i].get_text(strip=True)
    if item_index == "":
        item_index = "0"  # rows without a rank number get index "0"
    item_title = title_list[i].a.get_text(strip=True)
    if title_list[i].span:
        item_mark = title_list[i].span.get_text(strip=True)
    else:
        item_mark = "置頂"  # "pinned"
    item_level = level_list[i].get_text(strip=True)
    topic_list.append({
        "index": item_index,
        "title": item_title,
        "mark": item_mark,
        "level": item_level,
        "link": f"https://s.weibo.com/weibo?q=%23{item_title}%23&Refer=top",
    })
print(topic_list)
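The two scripts only print their lists; the "display together" step is not shown. A minimal sketch of one way to do it follows, assuming each source has already been reduced to a list of dicts with "title" and "link" keys, which is the shape of topic_list in the Weibo script. The function name render_board, the limit parameter, and the sample entries are hypothetical and not part of the original scripts. The structure of the Zhihu hot_list entries is not shown here, so they would first need to be mapped to the same two keys.

```python
def render_board(sections, limit=10):
    """Render several hot lists as one plain-text board.

    sections maps a site name to a list of dicts that each have
    "title" and "link" keys (the shape of topic_list in the Weibo script).
    """
    lines = []
    for site, items in sections.items():
        lines.append(f"== {site} ==")
        # Number the entries per site, keeping at most `limit` of them.
        for rank, item in enumerate(items[:limit], start=1):
            lines.append(f"{rank:>2}. {item['title']}  {item['link']}")
        lines.append("")  # blank line between sites
    return "\n".join(lines).rstrip()


# Made-up sample entries, for illustration only.
sample = {
    "Weibo": [
        {"title": "sample topic A", "link": "https://example.com/a"},
        {"title": "sample topic B", "link": "https://example.com/b"},
    ],
    "Zhihu": [
        {"title": "sample question C", "link": "https://example.com/c"},
    ],
}
print(render_board(sample))
```

With real data, the call would look like render_board({"Weibo": topic_list}), with further sites added as extra keys once their entries carry the same two fields.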