scrapy proxy and user_agent
1. Create a new file named useragent.py in the same directory as settings.py
# -*- coding: utf-8 -*-
import logging
import random

from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

# Module-level logger; the standard logging module replaces the deprecated scrapy.log
logger = logging.getLogger(__name__)


class UserAgent(UserAgentMiddleware):

    def __init__(self, user_agent=''):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        ua = random.choice(self.user_agent_list)
        if ua:
            # Show the user agent currently in use
            # print("********Current UserAgent:%s************" % ua)
            # Log it
            logger.debug('Current UserAgent: %s', ua)
            request.headers.setdefault('User-Agent', ua)

    # The default user_agent_list covers Chrome, IE, Firefox, Mozilla, Opera, Netscape.
    # For more user agent strings, you can find it in http:
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
        "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    ]
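The header logic in process_request can be tried without running a crawl. The following is only a minimal sketch under stated assumptions: FakeRequest and apply_random_user_agent are hypothetical stand-ins written for illustration (they are not part of Scrapy), and the pool is shortened to two entries. It shows that setdefault fills in User-Agent only when the request does not already carry one.

```python
import random


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: only headers are modeled."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


# Shortened, illustrative pool; the full list lives in useragent.py.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
    "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
]


def apply_random_user_agent(request, pool=USER_AGENTS):
    """Mirror the middleware logic: pick one UA and set it only if absent."""
    ua = random.choice(pool)
    if ua:
        request.headers.setdefault("User-Agent", ua)
    return ua


# A request without a User-Agent receives one from the pool.
fresh = FakeRequest()
apply_random_user_agent(fresh)

# A request that already has a User-Agent keeps it unchanged.
preset = FakeRequest({"User-Agent": "custom-agent/1.0"})
apply_random_user_agent(preset)
```

One consequence of this design: a spider that sets its own User-Agent header on a request will not have it replaced by the random one.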
2. Create a new file named proxymiddlewares.py in the same directory as settings.py
# -*- coding: utf-8 -*-
import base64
import random


class ProxyMiddleware(object):

    proxyList = [
        '121.193.143.249:80', '112.126.65.193:80', '122.96.59.104:82',
        '115.29.98.139:9999', '117.131.216.214:80', '116.226.243.166:8118',
        '101.81.22.21:8118', '122.96.59.107:843'
    ]

    def process_request(self, request, spider):
        # Set the location of the proxy
        pro_adr = random.choice(self.proxyList)
        print("USE PROXY -> " + pro_adr)
        request.meta['proxy'] = "http://" + pro_adr
3. Modify settings.py (pay attention to DOWNLOADER_MIDDLEWARES)
# -*- coding: utf-8 -*-

BOT_NAME = 'ip_proxy_pool'

SPIDER_MODULES = ['ip_proxy_pool.spiders']
NEWSPIDER_MODULE = 'ip_proxy_pool.spiders'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

ITEM_PIPELINES = {
    'ip_proxy_pool.pipelines.IpProxyPoolPipeline': 300,
}

# Crawl interval (delay between requests, in seconds)
DOWNLOAD_DELAY = 1

# Disable cookies
COOKIES_ENABLED = False

# Override the default request headers
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html, application/xhtml+xml, application/xml',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Host': 'ip84.com',
    'Referer': 'http://ip84.com/',
    'X-XHR-Referer': 'http://ip84.com/'
}

# Enable the custom UserAgent and proxy IP middlewares
# See http:
DOWNLOADER_MIDDLEWARES = {
    'ip_proxy_pool.useragent.UserAgent': 1,
    'ip_proxy_pool.proxymiddlewares.ProxyMiddleware': 100,
    # Disable Scrapy's built-in UserAgentMiddleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
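The numbers in DOWNLOADER_MIDDLEWARES decide the order in which process_request is called (lower values run first), and a value of None switches a middleware off. The sketch below is a simplified model of that merge, written only for illustration: effective_order is a hypothetical helper, not a Scrapy function, and BASE lists just two of Scrapy's built-in defaults.

```python
# Illustrative subset of Scrapy's built-in defaults (DOWNLOADER_MIDDLEWARES_BASE).
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}

# The project-level setting from settings.py.
CUSTOM = {
    "ip_proxy_pool.useragent.UserAgent": 1,
    "ip_proxy_pool.proxymiddlewares.ProxyMiddleware": 100,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}


def effective_order(base, custom):
    """Hypothetical helper: merge both dicts, drop None entries, sort by value."""
    merged = {**base, **custom}
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


order = effective_order(BASE, CUSTOM)
for path in order:
    print(path)
```

Under this model the custom UserAgent runs first, then ProxyMiddleware, and the built-in HttpProxyMiddleware runs later and picks up the request.meta['proxy'] value that was set. The None entry only takes effect when its path matches the built-in middleware's path exactly.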
Reposted from: https://www.cnblogs.com/rabbittail/p/7836705.html