Downloading novels with Python
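I used to love reading novels, and xuanhuan (fantasy), wuxia (martial arts) and xiuzhen (cultivation) stories were my favorites. I had only recently started learning Python, so out of that love for novels I wrote a script that downloads every novel listed on the homepage of Biquge (筆趣閣), the site I read most often.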
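The script first fetches the HTML of the site's homepage and works out which novels it lists. It then loops over those novels: for each one it fetches the novel's page, extracts the title, and creates a text file named after the novel. Next it extracts the title and body of each chapter and appends them to that file, continuing until the last chapter, and then moves on to the next novel.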
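The workflow above can be sketched in Python 3 as follows. It reuses the script's regular expressions, which assume Biquge's markup as of 2014; `crawl_homepage` and its `fetch` parameter (any callable that returns a page's HTML as a string, or None on failure) are hypothetical names used only for illustration. Passing the fetcher in as a parameter keeps the parsing logic testable without a network connection.

```python
import re

# Same patterns as the script; they assume Biquge's 2014 page markup.
BOOK_RE = re.compile(r'<a href="/book/(.*?)/')
TITLE_RE = re.compile(r'<h1>(.*?)</h1>')
CHAPTER_RE = re.compile(r'<dd><a href="(.*?)">(.*?)</a>')
# re.S lets the chapter text span several lines.
CONTENT_RE = re.compile(r'<div id="content">(.*?)</div>', re.S)


def crawl_homepage(site, fetch):
    """Yield (novel_title, [(chapter_title, raw_html), ...]) for each novel.

    fetch: hypothetical callable mapping a URL to its HTML str, or None.
    """
    home = fetch(site)
    if home is None:
        return
    # dict.fromkeys drops duplicate book IDs but keeps first-seen order.
    for book_id in dict.fromkeys(BOOK_RE.findall(home)):
        book_url = site + "/book/" + book_id + "/"
        book_html = fetch(book_url)
        if book_html is None:
            continue
        titles = TITLE_RE.findall(book_html)
        if not titles:
            continue
        chapters = []
        for href, _link_text in CHAPTER_RE.findall(book_html):
            page = fetch(book_url + href)
            if page is None:
                continue  # unreachable chapter: skip it and go on
            name = TITLE_RE.findall(page)
            body = CONTENT_RE.findall(page)
            if name and body:
                chapters.append((name[0], body[0]))
        yield titles[0], chapters
```

In a real run, `fetch` would download each page (for example with `urllib.request.urlopen` and a timeout) and decode it using the site's character encoding.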
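Here I first download each HTML page into the local folder G:\url\ and then read it from there, although opening the URL directly would work too. In earlier runs the script sometimes hung because it could not get a particular page. I assumed it was a caching problem, but it wasn't. The fix was a five-second timeout plus exception handling: if the page for a chapter cannot be fetched within five seconds, the script skips it and downloads the next chapter.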
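The download-to-a-local-folder step with a five-second timeout can be sketched in Python 3 like this. Instead of the script's global `socket.setdefaulttimeout` plus `urllib.urlretrieve`, this version passes `timeout` to `urllib.request.urlopen` for each request; `url_to_filename` and `fetch_to_cache` are hypothetical names. The function returns None when a page cannot be fetched, so the caller can move on to the next chapter.

```python
import http.client
import os
import urllib.request


def url_to_filename(url):
    """Flatten a URL into a file name, in the spirit of the script's getHtml."""
    # 'http://www.biquge.la/book/1/' -> 'www.biquge.la.book.1.'
    name = url.split("://", 1)[-1]
    return name.replace("/", ".").replace(":", "_")


def fetch_to_cache(url, cache_dir, timeout=5.0):
    """Save url into cache_dir and return the local path, or None on failure."""
    try:
        # timeout applies to each blocking socket operation, not the whole download
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except (OSError, http.client.HTTPException):
        # Timeouts, DNS/connection errors, HTTP errors (URLError and
        # HTTPError are OSError subclasses) and truncated responses all
        # end up here, so the caller can skip this chapter.
        return None
    path = os.path.join(cache_dir, url_to_filename(url))
    # Write only after the whole body has arrived, so no partial file is left.
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Because the file is written only after a successful read, a timed-out chapter never leaves a truncated page in the cache folder.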
# -*- coding: utf-8 -*-
# -------------------------------------------
# Download every novel shown on the homepage of http://www.biquge.la (Biquge)
# Downloaded novels are saved in the G:\txt folder
# -------------------------------------------
# 2014/8/23
# wyp
# -------------------------------------------
# Python 2 script
import re
import urllib
import os
import socket

# Give up on any page that does not arrive within five seconds
socket.setdefaulttimeout(5.0)


# Save url into the G:\url folder and return the local path,
# or None if the page could not be fetched
def getHtml(url):
    # 'http://www.biquge.la/book/1/' -> 'www.biquge.la.book.1.'
    urlstr = url.replace('/', '.')
    print urlstr
    name = re.findall(r'http:\.\.(.*)', urlstr)
    urlpathname = os.path.join(r'G:\url', name[0])
    print 'urlpathname = ' + urlpathname
    try:
        urllib.urlretrieve(url, urlpathname)
    except Exception:
        # Timeout or network error: let the caller skip this page
        return None
    print 'getHtml ---------------over'
    return urlpathname


def getBook(html):
    # Book IDs linked from the homepage
    return re.findall(r'<a href="/book/(.*?)/', html)


def getName(html):
    # Novel title on a book page, chapter title on a chapter page
    return re.findall(r'<h1>(.*?)</h1>', html)


def getZhangJie(html):
    # Chapter list (zhangjie): (relative link, chapter title) pairs
    return re.findall(r'<dd><a href="(.*?)">(.*?)</a>', html)


def getContent(html):
    # Chapter text inside <div id="content">; re.S allows it to span lines
    return re.findall(r'<div id="content">(.*?)</div>', html, re.S)


if __name__ == "__main__":
    url = raw_input("please input url: ").rstrip('/')
    urlpathname = getHtml(url)
    print urlpathname
    if urlpathname is None:
        raise SystemExit('could not download ' + url)
    with open(urlpathname, 'rb') as f1:
        html = f1.read()
    Book = getBook(html)
    # Remove duplicates while keeping the original order
    book = list(set(Book))
    book.sort(key=Book.index)
    for b in book:
        realurl = url + '/book/' + b + '/'
        print realurl
        realurlname = getHtml(realurl)
        print realurlname
        if realurlname is None:
            continue
        with open(realurlname, 'rb') as f2:
            realhtml = f2.read()
        BookName = getName(realhtml)
        if not BookName:
            continue
        filename = os.path.join(r"G:\txt", BookName[0]) + '.txt'
        print filename
        # Skip novels that have already been downloaded
        if os.path.exists(filename):
            continue
        # Binary mode so '\r\n' is not turned into '\r\r\n' on Windows
        with open(filename, 'wb') as fd:
            for zj in getZhangJie(realhtml):
                sonurl = realurl + zj[0]
                print "url = %s" % sonurl
                sonurlname = getHtml(sonurl)
                # Page not fetched within five seconds: skip this chapter
                if sonurlname is None:
                    continue
                try:
                    with open(sonurlname, 'rb') as f3:
                        sonhtml = f3.read()
                except IOError:
                    continue
                zhangjieming = getName(sonhtml)
                content = getContent(sonhtml)
                if not zhangjieming or not content:
                    continue
                print "downloading " + zhangjieming[0]
                fd.write('\t\t\t\t\t' + zhangjieming[0] + '\r\n\r\n')
                # Strip line-break tags and turn &nbsp; entities into spaces
                c1 = content[0].replace('<br />', '')
                c2 = c1.replace('&nbsp;', ' ')
                fd.write(c2)
                fd.write('\r\n\r\n\r\n\r\n')
                fd.flush()
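Cleaning up the chapter text and writing each novel only once can be sketched in Python 3 as below. This is an alternative to the script's chained `replace` calls, under a few stated assumptions: `html.unescape` is used to decode entities such as `&nbsp;`, the output is written as UTF-8 rather than the page's original bytes, and `clean_content` and `save_book` are hypothetical names.

```python
import html
import os
import re

# Matches <br>, <br/>, <br /> in any letter case.
BR_RE = re.compile(r"<br\s*/?>", re.I)


def clean_content(raw):
    """Turn the HTML inside <div id="content"> into plain text."""
    text = BR_RE.sub("\n", raw)        # line-break tags -> newlines
    text = html.unescape(text)         # &nbsp; -> '\xa0', &amp; -> '&', ...
    return text.replace("\xa0", " ")   # non-breaking spaces -> plain spaces


def save_book(folder, title, chapters, encoding="utf-8"):
    """Write (chapter_title, raw_html) pairs to <folder>/<title>.txt.

    Returns True if the file was written, False if it already existed,
    mirroring the script's skip-if-exists check.
    """
    path = os.path.join(folder, title + ".txt")
    if os.path.exists(path):
        return False
    with open(path, "w", encoding=encoding) as fd:
        for name, raw in chapters:
            fd.write("\t" * 5 + name + "\n\n")
            fd.write(clean_content(raw) + "\n\n\n\n")
    return True
```

Replacing `<br />` with a newline (rather than deleting it, as the script does) keeps the paragraph breaks of the chapter in the text file.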
