Blocking junk crawlers by User-Agent on IIS6/IIS7, Nginx, and Apache: how to deny specific UAs
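Sometimes, when you open the site backend, you find all sorts of unexplained spider UAs that do not belong to any search engine. In that case the site is probably being scraped by someone. You can block these unknown spider UAs with the methods below, adjusting the list of UAs to suit your own site.
The best way to deal with junk spiders like these is to look up their UA in the access records and then deny access with a UA-based rule. So how do we do that?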
1: Identify the junk spider's UA signature
I use BT Panel (寶塔面板). Its website monitoring report shows the Nginx log for each site. If you do not use BT Panel, download the site's log and open it in Notepad++ or another code editor.
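Reading a large log by eye is tedious, so here is a minimal Python sketch that counts how often each User-Agent appears. It assumes the log uses the common "combined" format, in which the User-Agent is the last quoted field on each line. The names `count_user_agents` and `print_top_user_agents` are made up for this illustration, and the log path is whatever your server actually uses.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the User-Agent is the
# last double-quoted field on each line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each User-Agent string to its hit count."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def print_top_user_agents(log_path, limit=20):
    """Print the most frequent User-Agents found in the given log file."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for user_agent, hits in count_user_agents(log).most_common(limit):
            print(f"{hits:8d}  {user_agent or '(empty UA)'}")
```

Calling `print_top_user_agents` with the path to your access log lists the busiest UAs first, which makes crawlers that hammer the site easy to spot. An empty UA is reported separately because the rules below treat it as a junk request too.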
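2: Block specific UAs
UA blocking is configured in the nginx configuration file of the site in question.
If you use BT Panel, the configuration path is as follows:
Nginx: add the following code inside the server block of the configuration file: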
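if ($http_user_agent ~ "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$") {
    return 444;
}

The ~ operator matches case-sensitively, which is why some names appear in both cases. The trailing ^$ matches an empty UA, and return 444 makes nginx close the connection without sending a response.

IIS7/IIS8/IIS10 and later: create a web.config file in the site root directory and put the following code in it: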
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Block spider">
          <match url="(^robots.txt$)" ignoreCase="false" negate="true" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ignoreCase="true" />
          </conditions>
          <action type="AbortRequest" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>

IIS6: add the following rules to the ISAPI Rewrite component:
#Block spider
RewriteCond %{HTTP_USER_AGENT} (MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$) [NC]
RewriteRule !(^/robots.txt$) - [F]

Apache: add the following rules to the .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
#Block spider
RewriteCond %{HTTP_USER_AGENT} "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" [NC]
RewriteRule !(^robots\.txt$) - [F]
</IfModule>
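Before deploying any of these rules, it can help to check which User-Agent strings the pattern would actually catch, so that a legitimate visitor or search engine is not blocked by accident. The following is a rough Python sketch, not part of any server configuration. It assumes Python's `re` module treats this plain alternation the same way the servers' regex engines do, and `is_blocked` is a hypothetical helper name. The Nginx rule matches case-sensitively, while the IIS and Apache rules ignore case, so the helper can check both modes.

```python
import re

# The UA pattern copied from the rules above; "^$" matches an empty UA.
UA_PATTERN = (
    "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|"
    "serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|"
    "SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|"
    "EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|"
    "Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|"
    "TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"
)

_CASE_SENSITIVE = re.compile(UA_PATTERN)               # like the Nginx "~" rule
_IGNORE_CASE = re.compile(UA_PATTERN, re.IGNORECASE)   # like [NC] / ignoreCase="true"


def is_blocked(user_agent, ignore_case=False):
    """Return True if the pattern matches anywhere in the User-Agent string."""
    regex = _IGNORE_CASE if ignore_case else _CASE_SENSITIVE
    return regex.search(user_agent) is not None
```

For example, `is_blocked("wget/1.21")` is false in case-sensitive mode because the list only contains `Wget`, but `is_blocked("wget/1.21", ignore_case=True)` is true. Running the UAs collected from your own log through this check shows what each server variant would reject.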