Scraping a novel from Qidian (起点中文网) with Python
Complete code:
import os

import requests
from lxml import etree

# Browser-like User-Agent so the site serves the normal page
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/65.0.3325.162 Safari/537.36'
}


def getbookurls():
    url = 'https://book.qidian.com/info/1017125042#Catalog'
    # Fetch the page source
    charptes = requests.get(url, headers=header).text
    objects = etree.HTML(charptes)
    # Chapter list items; the leading // matches at any depth
    objs = objects.xpath('//ul[@class="cf"]/li')
    clist = []
    for obj in objs:
        try:
            # Chapter URL
            chapt_urls = obj.xpath('a/@href')[0]
            # Chapter name
            chapt_names = obj.xpath('a/text()')[0]
            into = {'chapt_urls': 'https:' + chapt_urls, 'chapt_names': chapt_names}
            clist.append(into)
        except IndexError:
            # Skip list items that have no link or no title
            pass
    return clist


clist = getbookurls()


# Fetch the text of one chapter
def getcontent(url):
    res = requests.get(url, headers=header).text
    objects = etree.HTML(res)
    objs = objects.xpath('//div[@class="read-content j_readContent"]/p/text()')
    content = []
    for i in objs:
        # Remove the two full-width spaces used as paragraph indentation
        text = i.replace('\u3000\u3000', '')
        content.append(text)
    return content


# Download the novel
# Output directory; change it to suit your own setup
os.makedirs('起點小說', exist_ok=True)
for i in clist:
    chapt_urls = i['chapt_urls']
    chapt_names = i['chapt_names']
    content = getcontent(chapt_urls)
    text = ''.join(content)
    print("Downloading %s" % chapt_names)
    with open('起點小說/%s.doc' % chapt_names, 'w', encoding='utf-8') as f:
        f.write(text)
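The script uses each chapter title directly as a file name and concatenates paragraphs with no separator. A title containing a character such as `/` or `?` could make the `open` call fail, and the saved text would run together. The sketch below shows one possible way to handle both, using only the standard library. `safe_filename` and `join_paragraphs` are hypothetical helpers, not part of the original script, and the set of characters replaced is an assumption based on what Windows file names disallow.

```python
import re

# Characters not allowed in Windows file names, plus ASCII control characters.
# This set is an assumption; adjust it for your own file system.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(name, replacement='_'):
    """Return a copy of `name` that is safe to use as a file name (hypothetical helper)."""
    cleaned = _INVALID_CHARS.sub(replacement, name).strip()
    # Fall back to a placeholder if nothing usable is left
    return cleaned or 'untitled'


def join_paragraphs(paragraphs):
    """Strip full-width indentation and put each paragraph on its own line (hypothetical helper)."""
    lines = [p.replace('\u3000', '').strip() for p in paragraphs]
    # Drop empty paragraphs so the output has no blank lines
    return '\n'.join(line for line in lines if line)


if __name__ == '__main__':
    print(safe_filename('Chapter 1: What/Why?'))
    print(join_paragraphs(['\u3000\u3000First.', '', '\u3000\u3000Second.']))
```

In the download loop, the file path could then be built from `safe_filename(chapt_names)` and the chapter text from `join_paragraphs(content)`, assuming one paragraph per line is the layout you want.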