Scrapy Crawler (6): Scraping Bank Wealth-Management Products and Storing Them in MongoDB (120,000+ Records)

Published: 2025/5/22


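The goal of this Scrapy crawler is to scrape the information for every bank wealth-management product listed on the "融360" (rong360) website and store it in MongoDB. A screenshot of the page is shown below; the full dataset contains more than 120,000 records.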

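We will not go over creating and running a Scrapy project again and only give the relevant code. Readers interested in those steps can refer to: Scrapy Crawler (4): Scraping the Douban Movie Top 250 Images.

Modify items.py as follows. It stores the information for each wealth-management product, such as the product name and the issuing bank.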

    import scrapy


    class BankItem(scrapy.Item):
        # define the fields for your item here like:
        name = scrapy.Field()
        bank = scrapy.Field()
        currency = scrapy.Field()
        startDate = scrapy.Field()
        endDate = scrapy.Field()
        period = scrapy.Field()
        proType = scrapy.Field()
        profit = scrapy.Field()
        amount = scrapy.Field()

Create the spider file bankSpider.py with the code below. It scrapes the details of each wealth-management product from the page.

    import scrapy
    from bank.items import BankItem


    class bankSpider(scrapy.Spider):
        name = 'bank'
        start_urls = ['https://www.rong360.com/licai-bank/list/p1']

        def parse(self, response):
            # skip the first <tr>, which is the table header
            trs = response.css('tr')[1:]
            for tr in trs:
                # create a new item for every row so that yielded items do not share state
                item = BankItem()
                item['name'] = tr.xpath('td[1]/a/text()').extract_first()
                item['bank'] = tr.xpath('td[2]/p/text()').extract_first()
                item['currency'] = tr.xpath('td[3]/text()').extract_first()
                item['startDate'] = tr.xpath('td[4]/text()').extract_first()
                item['endDate'] = tr.xpath('td[5]/text()').extract_first()
                item['period'] = tr.xpath('td[6]/text()').extract_first()
                item['proType'] = tr.xpath('td[7]/text()').extract_first()
                item['profit'] = tr.xpath('td[8]/text()').extract_first()
                item['amount'] = tr.xpath('td[9]/text()').extract_first()
                yield item

            # follow the pagination link; a page may carry one or two 'a.next-page' links
            next_pages = response.css('a.next-page')
            next_page_link = None
            if len(next_pages) == 1:
                next_page_link = next_pages.xpath('@href').extract_first()
            elif len(next_pages) > 1:
                next_page_link = next_pages[1].xpath('@href').extract_first()
            if next_page_link:
                next_page = "https://www.rong360.com" + next_page_link
                yield scrapy.Request(next_page, callback=self.parse)
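The pagination rule in `parse` can also be looked at in isolation. The sketch below is a plain-Python illustration rather than part of the spider: `pick_next_page` is a hypothetical helper, and it assumes (as the spider code suggests) that a page exposes either one `a.next-page` link or two, with the second one pointing forward. It builds the absolute URL with `urllib.parse.urljoin` instead of string concatenation.

```python
from typing import Optional, Sequence
from urllib.parse import urljoin

BASE_URL = "https://www.rong360.com"  # site root used by the spider


def pick_next_page(hrefs: Sequence[str], base_url: str = BASE_URL) -> Optional[str]:
    """Return the absolute URL of the next page, or None if there is none.

    `hrefs` stands for the href values of all `a.next-page` links on a page
    (hypothetical input; in the spider they come from the response).
    """
    if not hrefs:
        return None  # no pagination link at all: stop crawling
    # One link: use it. Two or more: the second is assumed to be "next".
    href = hrefs[0] if len(hrefs) == 1 else hrefs[1]
    return urljoin(base_url, href)


print(pick_next_page(["/licai-bank/list/p2"]))
```

Returning `None` for an empty list keeps the caller from indexing into a missing link when a page has no pagination at all.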

To store the scraped data in MongoDB, we need to modify pipelines.py as follows:

    # pipeline that inserts the scraped items into MongoDB
    import pymongo
    from scrapy.utils.project import get_project_settings

    # scrapy.conf is not available in recent Scrapy releases;
    # get_project_settings() loads the same values from settings.py
    settings = get_project_settings()


    class BankPipeline(object):
        def __init__(self):
            # connect to the database
            self.client = pymongo.MongoClient(host=settings['MONGO_HOST'], port=settings['MONGO_PORT'])
            # to log in with a user name and password, also pass
            # username=settings['MONGO_USER'], password=settings['MONGO_PSW'] to MongoClient
            # handles of the MongoDB database and collection
            self.db = self.client[settings['MONGO_DB']]
            self.coll = self.db[settings['MONGO_COLL']]

        def process_item(self, item, spider):
            postItem = dict(item)
            # insert_one replaces the older Collection.insert, which PyMongo 4 removed
            self.coll.insert_one(postItem)
            return item
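Scrapy's documentation describes another way to connect a pipeline to the project settings: a `from_crawler` class method reads the settings, and `open_spider` / `close_spider` open and close the database connection. The sketch below follows that pattern under a few assumptions: the class name `MongoBankPipeline` is made up for illustration, the setting names are the ones used in this post, and `pymongo` is imported inside `open_spider` only so that the class can be loaded on a machine without the driver installed.

```python
class MongoBankPipeline:
    """Hypothetical variant of BankPipeline that reads settings via from_crawler."""

    def __init__(self, host, port, db_name, coll_name):
        self.host = host
        self.port = port
        self.db_name = db_name
        self.coll_name = coll_name
        self.client = None  # created in open_spider
        self.coll = None

    @classmethod
    def from_crawler(cls, crawler):
        # crawler.settings exposes the values defined in settings.py
        s = crawler.settings
        return cls(
            host=s.get('MONGO_HOST', 'localhost'),
            port=int(s.get('MONGO_PORT', 27017)),
            db_name=s.get('MONGO_DB', 'Spider'),
            coll_name=s.get('MONGO_COLL', 'bank'),
        )

    def open_spider(self, spider):
        import pymongo  # imported lazily; requires the pymongo package
        self.client = pymongo.MongoClient(host=self.host, port=self.port)
        self.coll = self.client[self.db_name][self.coll_name]

    def close_spider(self, spider):
        # release the connection when the crawl ends
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # store one document per scraped item
        self.coll.insert_one(dict(item))
        return item
```

To try it, the entry in `ITEM_PIPELINES` would point at this class instead of `BankPipeline`. Opening the connection in `open_spider` rather than `__init__` means it is created once per crawl and closed cleanly at the end.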

The MongoDB parameters, such as MONGO_HOST and MONGO_PORT, are set in settings.py. Modify settings.py as follows:

    ROBOTSTXT_OBEY = False

    ITEM_PIPELINES = {'bank.pipelines.BankPipeline': 300}

    # add the MongoDB connection parameters
    MONGO_HOST = "localhost"  # host IP
    MONGO_PORT = 27017        # port number
    MONGO_DB = "Spider"       # database name
    MONGO_COLL = "bank"       # collection name
    # MONGO_USER = ""
    # MONGO_PSW = ""

The user name and password can be added as needed.
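If authentication is enabled, one option is to hand `pymongo.MongoClient` a connection URI. PyMongo's documentation asks that the user name and password in a URI be percent-escaped, which `urllib.parse.quote_plus` does. A minimal sketch, assuming the `MONGO_USER` and `MONGO_PSW` settings are filled in; `build_mongo_uri` is a hypothetical helper name and the credentials shown are placeholders:

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, user="", password=""):
    """Build a mongodb:// URI, escaping the credentials when they are given."""
    if user:
        return "mongodb://%s:%s@%s:%d" % (
            quote_plus(user), quote_plus(password), host, port)
    # no credentials: plain host and port
    return "mongodb://%s:%d" % (host, port)


uri = build_mongo_uri("localhost", 27017, "spider_user", "p@ss/word")
print(uri)  # mongodb://spider_user:p%40ss%2Fword@localhost:27017
```

The resulting string can be passed as the first argument to `pymongo.MongoClient`.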

Next, we can run the crawler. The output of the run is shown below:

The run took 3 hours in total and scraped more than 120,000 records. The efficiency is impressive!
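For a rough sense of scale, these figures can be turned into an average throughput. This is back-of-the-envelope arithmetic on the rounded numbers from the text (120,000 records, 3 hours), not a measurement:

```python
records = 120_000            # "more than 120,000" records, rounded down
elapsed_seconds = 3 * 3600   # 3 hours expressed in seconds

rate = records / elapsed_seconds
print(f"{rate:.1f} records per second")  # prints "11.1 records per second"
```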
Finally, let's take a look at the data in MongoDB:

Perfect! That is all for this post. Comments and discussion are welcome.
