A spider program that crawls images from image sites or Tieba (code included)
GitHub repo: https://github.com/531126085/Web-spider
`download_mm` batch-downloads images from jandan.net into a newly created folder named `ooxx`:
```python
import urllib.request
import os

def url_open(url):
    req = urllib.request.Request(url)
    req.add_header('User-Agent',
                   'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36')
    # Open the Request object (not the bare URL), so the User-Agent header is actually sent
    response = urllib.request.urlopen(req)
    return response.read()

def get_page(url):
    # Extract the current page number from the 'current-comment-page' marker
    html = url_open(url).decode('utf-8')
    a = html.find('current-comment-page') + 23
    b = html.find(']', a)
    return html[a:b]

def find_imgs(url):
    # Scan the page for 'img src=' occurrences and collect every .jpg address
    html = url_open(url).decode('utf-8')
    img_addrs = []
    a = html.find('img src=')
    while a != -1:
        b = html.find('.jpg', a, a + 255)
        if b != -1:
            img_addrs.append('http:' + html[a + 9:b + 4])
        else:
            b = a + 9
        a = html.find('img src=', b)
    return img_addrs

def save_imgs(folder, img_addrs):
    for each in img_addrs:
        filename = each.split('/')[-1]
        with open(filename, 'wb') as f:
            f.write(url_open(each))

def download_mm(folder='ooxx', pages=10):
    os.mkdir(folder)
    os.chdir(folder)
    url = "http://jandan.net/ooxx/"
    page_num = int(get_page(url))
    for i in range(pages):
        # Step back one page per iteration, starting from the current page
        page_url = url + 'page-' + str(page_num - i) + '#comments'
        img_addrs = find_imgs(page_url)
        save_imgs(folder, img_addrs)

if __name__ == '__main__':
    download_mm()
```
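The string-scanning logic in `find_imgs` can be checked offline against a small hand-written HTML snippet (the markup and URLs below are made up for illustration; only the `.jpg` address should be collected):

```python
def find_imgs_in(html):
    # Same scan as find_imgs above, but over an in-memory string:
    # jump 9 characters past 'img src=' (skipping the opening quote),
    # and keep only addresses ending in .jpg within the next 255 characters.
    img_addrs = []
    a = html.find('img src=')
    while a != -1:
        b = html.find('.jpg', a, a + 255)
        if b != -1:
            img_addrs.append('http:' + html[a + 9:b + 4])
        else:
            b = a + 9
        a = html.find('img src=', b)
    return img_addrs

sample = ('<p><img src="//img.example.com/a.jpg"></p>'
          '<img src="//img.example.com/b.png">')
print(find_imgs_in(sample))  # the .png address is skipped
```

Because the scan matches only `.jpg`, non-JPEG images on the page are silently ignored, which is why jandan pages with `.png` or `.gif` content yield fewer files than expected.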
`download_quanyou` downloads images from the Game of Thrones (权力的游戏) Tieba into the current working directory:
```python
import urllib.request
import re

def open_url(url):
    req = urllib.request.Request(url)
    req.add_header('User-Agent',
                   'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36')
    page = urllib.request.urlopen(req)
    return page.read().decode('utf-8')

def get_img(html):
    # Find image addresses with a regular expression; because the pattern
    # contains a capturing group, re.findall returns only the group contents
    p = r'<img class="BDE_Image" src="([^"]+\.jpg)"'
    imglist = re.findall(p, html)
    for each in imglist:
        print(each)
    for each in imglist:
        filename = each.split("/")[-1]
        urllib.request.urlretrieve(each, filename, None)

if __name__ == '__main__':
    url = "http://tieba.baidu.com/p/6093575289?pid=125013245611&cid=0#125013245611"
    get_img(open_url(url))
```
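The regex in `get_img` can likewise be exercised on a static snippet (the markup below is a made-up stand-in for a Tieba post page):

```python
import re

p = r'<img class="BDE_Image" src="([^"]+\.jpg)"'
sample = ('<div><img class="BDE_Image" src="http://imgsrc.example.com/pic1.jpg" width="560">'
          '<img class="BDE_Image" src="http://imgsrc.example.com/pic2.jpg"></div>')
# findall returns only the capturing-group contents, i.e. the bare .jpg URLs
print(re.findall(p, sample))
```

Matching on the `BDE_Image` class keeps avatars and interface icons out of the results, since Tieba applies that class only to images embedded in post bodies.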