Python: scraping usable proxy IPs from Xici (西刺), with built-in checking

Published: 2024/3/24  python  38  豆豆


Features:

Scrapes proxy IPs from Xici
Automatically checks whether each scraped IP is usable
Writes the usable IPs to the file Data.txt
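
The three features above form a small pipeline: collect candidates, keep the ones that pass a check, and append them to a file. Here is a minimal sketch of that flow. It assumes a hypothetical helper save_usable and a caller-supplied check function, neither of which is in the original script. A stub checker stands in for the real network test so the example runs offline.

```python
import os
import tempfile


def save_usable(candidates, check, out_path):
    """Append every candidate that passes check() to out_path, one per line.

    save_usable and check are hypothetical names used for illustration only.
    """
    usable = [addr for addr in candidates if check(addr)]
    with open(out_path, "a") as f:
        for addr in usable:
            f.write(addr + "\n")
    return usable


def demo():
    # Stub checker: pretend that only proxies on port 8080 respond.
    def fake_check(addr):
        return addr.endswith(":8080")

    candidates = ["10.0.0.1:8080", "10.0.0.2:3128", "10.0.0.3:8080"]
    out_path = os.path.join(tempfile.mkdtemp(), "Data.txt")
    kept = save_usable(candidates, fake_check, out_path)
    with open(out_path) as f:
        lines = f.read().splitlines()
    return kept, lines


if __name__ == "__main__":
    print(demo())
```

Passing the checker in as a parameter keeps the file-writing step testable without any network access.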

Note:

While scraping Xici, your real IP may get banned. To reduce that risk, first scrape a small number of proxy IPs and put them in ip_use.
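
As a rough illustration of how an ip_use entry can be turned into the proxies mapping that requests accepts, the sketch below uses a hypothetical helper named build_proxies. It assumes the entries are plain HTTP proxies, so both keys point to an http:// URL. The script itself prefixes the 'https' entry with https://, and whether that works may depend on the proxy and on the installed requests/urllib3 version.

```python
import random


def build_proxies(ip_pool):
    """Pick one "host:port" entry at random and build a requests-style proxies dict.

    build_proxies is a hypothetical helper; ip_pool plays the role of ip_use.
    Assumption: the entries are plain HTTP proxies, so both keys use http://.
    """
    addr = random.choice(ip_pool)
    proxy_url = "http://" + addr
    return {"http": proxy_url, "https": proxy_url}


ip_use = ["117.88.176.153:3000"]  # seed entry taken from the article
proxies = build_proxies(ip_use)
print(proxies)
# The dict would then be passed as requests.get(url, proxies=proxies, timeout=(3, 7)).
```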

Testing:

1. Console output

2. File output

Code:

import requests
import traceback
import re
import random
import time

ip_list = []  # list of usable IPs collected so far


def main(num):
    # Main function: num is the number of pages to crawl
    for i in range(num):
        print("-----" + str(i + 1) + "-----")
        url = 'https://www.xicidaili.com/nn/' + str(i + 1)
        get_ip(url)
        print('Crawler sleeping for 1s...')
        time.sleep(1)  # wait 1 second
    print('Program finished! (press Enter to exit)')
    input()


# User-Agent strings for different operating systems and browsers
user_agent_list = [
    'Mozilla/5.0(compatible;MSIE9.0;WindowsNT6.1;Trident/5.0)',
    'Mozilla/4.0(compatible;MSIE8.0;WindowsNT6.0;Trident/4.0)',
    'Mozilla/4.0(compatible;MSIE7.0;WindowsNT6.0)',
    'Opera/9.80(WindowsNT6.1;U;en)Presto/2.8.131Version/11.11',
    'Mozilla/5.0(WindowsNT6.1;rv:2.0.1)Gecko/20100101Firefox/4.0.1',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0',
]

# Referer values: the link the request claims to come from
referer_list = [
    'https://www.sogou.com/',
    'http://blog.csdn.net/',
    'https://www.baidu.com/',
]

# Proxy IPs used while crawling
ip_use = [
    '117.88.176.153:3000',
]


# Fetch the HTML text of a page
def get_html(url, ip):
    try:
        header = {'User-Agent': random.choice(user_agent_list),
                  'Referer': random.choice(referer_list)}
        ip = random.choice(ip)
        proxy_ip = 'http://' + ip
        proxy_ips = 'https://' + ip
        proxy = {'https': proxy_ips, 'http': proxy_ip}
        # The original built `proxy` but never passed it to requests.get,
        # so the request went out from the real IP; it is passed here.
        html = requests.get(url, headers=header, proxies=proxy, timeout=(3, 7))
        html = html.text
        # print(html)
        return html
    except Exception:
        print('Error fetching IPs!')
        # traceback.print_exc()  # print the exception


# Extract the scraped IPs with a regular expression
def get_ip(url):
    try:
        html = get_html(url, ip_use)
        pattrens = r'alt="Cn" /></td>[\s]*?<td>([\d\D]*?)</td>[\s]*?<td>([\d\D]*?)</td>'
        root = re.findall(pattrens, html)
        # print(len(root))
        # When the response is a 503, root is empty; the proxy IP may be
        # the problem, so switch to another one.
        for host, port in root:
            addr = host + ':' + port
            if port != '9999' and text_ip(addr):
                print(addr)
                ip_list.append(addr)
                write_text(addr + '\n')
    except Exception:
        # Also reached when get_html failed and returned None
        print('Regex matching error!')
        # traceback.print_exc()  # print the exception


# Test whether an IP is usable
def text_ip(ip):
    try:
        url = "https://www.baidu.com/"
        header = {'User-Agent': random.choice(user_agent_list),
                  'Referer': random.choice(referer_list)}
        proxy_ip = 'http://' + ip
        proxy_ips = 'https://' + ip
        proxy = {'https': proxy_ips, 'http': proxy_ip}
        html = requests.get(url, headers=header, proxies=proxy, timeout=(3, 7))
        # print(html.status_code)
        if html.status_code == 200:
            return 1
        else:
            return 0
    except Exception:
        # print('Error testing IP!')
        return 0


def write_text(ip):
    # Open the file in append mode; the with block flushes and closes it
    with open("Data.txt", 'a') as file:
        file.write(ip)


if __name__ == '__main__':
    main(20)  # number of pages to crawl
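
The regular expression used in get_ip can be tried offline before pointing the script at the live site. The sketch below applies the same pattern to a hypothetical HTML fragment. The sample rows and the helper name extract are assumptions made for illustration, not markup copied from Xici.

```python
import re

# Same pattern as in get_ip: captures the two <td> cells that follow the flag image.
PATTERN = r'alt="Cn" /></td>[\s]*?<td>([\d\D]*?)</td>[\s]*?<td>([\d\D]*?)</td>'

# Hypothetical table rows, written only to match the pattern's assumptions.
SAMPLE_HTML = """
<tr>
  <td class="country"><img src="cn.png" alt="Cn" /></td>
  <td>10.0.0.1</td>
  <td>8080</td>
</tr>
<tr>
  <td class="country"><img src="cn.png" alt="Cn" /></td>
  <td>10.0.0.2</td>
  <td>9999</td>
</tr>
"""


def extract(html):
    """Return "ip:port" strings, skipping port 9999 as the script does."""
    return [ip + ":" + port
            for ip, port in re.findall(PATTERN, html)
            if port != "9999"]


print(extract(SAMPLE_HTML))
```

If the site's markup differs from this assumed shape, re.findall simply returns an empty list, which matches the empty-result case noted in the script's comments.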
