Python Usage Reference Notes: Python Web Scraping Study Notes (3)
Custom handlers - Cookies && URLError && basic json usage
Cookies:
Example target: https://www.yaozh.com/
Test1 (without cookies):
Code:
import urllib.request

# 1. Target URL
url = "https://www.yaozh.com/"
# 2. Request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
# 3. Build the request object
request = urllib.request.Request(url, headers=headers)
# 4. Send the request
response = urllib.request.urlopen(request)
# 5. Read the response body (bytes)
data = response.read()
# Save to a file to verify the data
with open('01cookies.html', 'wb') as f:
    f.write(data)
Result:
The saved page is displayed in guest mode, i.e. not logged in.
Test2 (with cookies: manual login):
After logging in through the browser, find the Cookie request header in the Network panel of the developer tools.
Code (log in first, then scrape):
"""
Fetch the member center page directly.
Copy the cookies captured in the browser and paste them
into the headers of the request object.
"""
import urllib.request

# 1. Target URL
url = "https://www.yaozh.com/"
# 2. Request headers, including the copied Cookie value
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36",
    "Cookie": "acw_tc=707c9fc316119925309487503e709498d3fe1f6beb4457b1cb1399958ad4d3; PHPSESSID=bvc8utedu2sljbdb818m4va8q3; _ga=GA1.2.472741825.1611992531; _gid=GA1.2.2079712096.1611992531; yaozh_logintime=1611992697; yaozh_user=1038868%09s1mpL3; yaozh_userId=1038868; yaozh_jobstatus=kptta67UcJieW6zKnFSe2JyYnoaSZ5htnZqdg26qb21rg66flM6bh5%2BscZdyVNaWz9Gwl4Ny2G%2BenofNlKqpl6XKppZVnKmflWlxg2lolJabd519626986447e0E3cd918611D19BBEbmpaamm6HcNiemZtVq56lloN0pG2SaZ%2BGam2SaWucl5ianZiWbIdw4g%3D%3Da9295385d0680617486debd4ce304305; _gat=1; Hm_lpvt_65968db3ac154c3089d7f9a4cbb98c94=1611992698; yaozh_uidhas=1; yaozh_mylogin=1611992704; acw_tc=707c9fc316119925309487503e709498d3fe1f6beb4457b1cb1399958ad4d3; Hm_lvt_65968db3ac154c3089d7f9a4cbb98c94=1611992531%2C1611992638",
}
# 3. Build the request object
request = urllib.request.Request(url, headers=headers)
# 4. Send the request
response = urllib.request.urlopen(request)
# 5. Read the response body (bytes)
data = response.read()
# Save to a file to verify the data
with open('01cookies2.html', 'wb') as f:
    f.write(data)
Result:
The saved page now shows the logged-in state for user s1mpL3.
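To check that a copied cookie string really ends up on the request before anything is sent, the request object can be inspected offline. The following is a minimal sketch, not part of the original notes: the helper names `parse_cookie_header` and `build_request` are hypothetical, and the cookie string is a shortened placeholder rather than a real session.

```python
import urllib.request


def parse_cookie_header(raw_cookie):
    """Split a copied Cookie header value into a {name: value} dict."""
    cookies = {}
    for pair in raw_cookie.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def build_request(url, raw_cookie, user_agent="Mozilla/5.0"):
    """Build (but do not send) a Request that carries the cookie string."""
    headers = {"User-Agent": user_agent, "Cookie": raw_cookie}
    return urllib.request.Request(url, headers=headers)


# Placeholder cookie string (an assumption, not a real session)
raw = "PHPSESSID=abc123; yaozh_userId=1038868; yaozh_uidhas=1"
cookies = parse_cookie_header(raw)
request = build_request("https://www.yaozh.com/", raw)

print(cookies["yaozh_userId"])
print(request.get_header("Cookie") == raw)
```

Nothing is sent here; passing `request` to `urllib.request.urlopen` would perform the actual fetch. A copied cookie only works while the session it belongs to is still valid on the server.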
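Test3 (with cookies: log in from code):
Preparation:
1. Tick Preserve log in the Network panel so that the requests from the previous login are kept.
2. The captured login traffic shows that the login is sent as a POST request.
3. After logging in, log out, open the login page, inspect the elements, and look up each field of the login form.
Code:
"""
Fetch the member center page:
1. Log in from code; once login succeeds the cookie is valid.
2. Automatically carry the cookie when requesting the member center.
cookiejar: stores cookies automatically
"""
import urllib.request
from http import cookiejar
from urllib import parse

# Before logging in: the login page is https://www.yaozh.com/login; find the login parameters there.
# The backend decides by request method: GET returns the login page, POST returns the login result.

# 1. Log in from code
# 1.1 Login URL
login_url = "https://www.yaozh.com/login"
# 1.2 Login parameters
login_form_data = {
    "username": "s1mpL3",
    "pwd": "***************",  # private, not shown
    "formhash": "87F6F28A4*",  # private, not shown
    "backurl": "https%3A%2F%2Fwww.yaozh.com%2F",
}
# The parameters must be URL-encoded; the data of a POST request must be bytes
login_str = parse.urlencode(login_form_data).encode('utf-8')
# 1.3 Send the POST login request
cookie_jar = cookiejar.CookieJar()
# Handler that adds cookie support
cook_handler = urllib.request.HTTPCookieProcessor(cookie_jar)
# Build an opener from the handler
opener = urllib.request.build_opener(cook_handler)
# Request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
# Send the POST request with the form data
login_request = urllib.request.Request(login_url, headers=headers, data=login_str)
# If login succeeds, the CookieJar stores the cookie automatically
opener.open(login_request)

# 2. Visit the member center with the stored cookie
center_url = "https://www.yaozh.com/member/"
center_request = urllib.request.Request(center_url, headers=headers)
# Open the request object (not the bare URL) so the headers are sent as well
response = opener.open(center_request)
# bytes --> str
data = response.read().decode()
with open('02cookies.html', 'w', encoding="utf-8") as f:
    f.write(data)
Result:
The member center page is returned for the logged-in user s1mpL3.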
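The two moving parts of the code login, the form body and the cookie-aware opener, can be examined without touching the network. This is a rough sketch under assumptions: the helper names `make_cookie_opener` and `build_login_request` are made up for illustration, and the credentials are placeholders.

```python
import urllib.request
from http import cookiejar
from urllib import parse


def make_cookie_opener():
    """Return (opener, jar); the jar collects cookies from every response."""
    jar = cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(handler), jar


def build_login_request(login_url, form, headers=None):
    """URL-encode the form to bytes; a Request that has data is sent as POST."""
    body = parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(login_url, data=body, headers=headers or {})


opener, jar = make_cookie_opener()
login_request = build_login_request(
    "https://www.yaozh.com/login",
    {"username": "demo_user", "pwd": "demo_password"},  # placeholder credentials
)

print(login_request.get_method())  # the method switches to POST once data is set
print(login_request.data)          # the encoded form body, as bytes
print(len(jar))                    # the jar stays empty until a response sets a cookie
# opener.open(login_request) would send the login and fill the jar (not run here)
```

Because the same opener is reused for later requests, any cookie stored in the jar by the login response is attached automatically to the follow-up request.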
Notes:
1. Using the cookiejar module: from http import cookiejar, then cookiejar.CookieJar()
2. HTTPCookieProcessor(): a handler with cookie support
3. Code login: only the username and password need to be changed
4. Error raised under Python 3:
UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 19523: illegal multibyte sequence
Fix: pass encoding="utf-8" to open():
with open('02cookies.html', 'w', encoding="utf-8") as f:
    f.write(data)
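The encoding error can be reproduced without any file or network access. The sketch below assumes the failure comes from a text-mode `open()` falling back to the system's default gbk encoding, which the error message suggests; the sample string is invented.

```python
# '\xa0' is a non-breaking space, which is common in HTML pages
text = "price:\xa0100"

try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    # the same failure that a gbk-encoded file write runs into
    gbk_ok = False

utf8_bytes = text.encode("utf-8")  # UTF-8 can represent the character

print(gbk_ok)
print(utf8_bytes)
```

Passing `encoding="utf-8"` to `open()` makes the file write use the second path regardless of the system default.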
URLError: errors raised by urllib.request
There are two kinds: URLError and HTTPError.
HTTPError is a subclass of URLError.
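The subclass relationship can be checked directly, and it explains why HTTPError has to be handled before URLError: a URLError branch placed first would also catch every HTTPError. A small offline sketch follows; the `describe` helper and the hand-built exceptions are illustrative assumptions, not output from a real request.

```python
import io
import urllib.error
from email.message import Message


def describe(error):
    """Check the more specific HTTPError before the general URLError."""
    if isinstance(error, urllib.error.HTTPError):
        return f"HTTP status {error.code}"
    if isinstance(error, urllib.error.URLError):
        return f"URL error: {error.reason}"
    return "other error"


# Exceptions built by hand so the sketch runs without network access
http_error = urllib.error.HTTPError(
    "https://example.invalid/missing", 404, "Not Found", Message(), io.BytesIO(b"")
)
url_error = urllib.error.URLError("name resolution failed")

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(describe(http_error))
print(describe(url_error))
```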
Test:
Code 1:
import urllib.request

url = 'http://www.xiaojian.cn'  # hypothetical address
response = urllib.request.urlopen(url)
Result 1:
Partial traceback:
raise URLError(err)
urllib.error.URLError:
Code 2:
import urllib.request

url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/1111'
response = urllib.request.urlopen(url)
Result 2:
Partial traceback:
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Code 3:
import urllib.error
import urllib.request

url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/1111'
try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as error:
    # the server answered, but with an error status
    print(error.code)
except urllib.error.URLError as error:
    # the request never got a valid HTTP response
    print(error)
Result 3:
Code 4:
import urllib.error
import urllib.request

url = 'https://blog.cs1'
try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as error:
    print(error.code)
except urllib.error.URLError as error:
    print(error)
Result 4:
Requests:
Preparation:
Install the third-party module: pip install requests
Test1 (basic attributes: GET):
Code 1 (without request headers):
import requests

url = "http://www.baidu.com"
response = requests.get(url)
# content attribute: returns bytes
data = response.content
print(data)
data1 = response.content.decode('utf-8')
print(type(data1))
# text attribute: returns str (if the response does not declare an encoding, requests guesses one, which may be wrong; so prefer content and decode it explicitly)
data2 = response.text
print(type(data2))
Result 1:
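Why decoding `content` explicitly is safer than relying on `text` can be shown with plain bytes. The sketch below uses only the standard library; the sample body is invented, and the ISO-8859-1 guess stands in for the fallback that, as far as I know, requests applies to text responses that declare no charset.

```python
# Bytes as response.content would hold them for a UTF-8 page (sample data)
body = "<title>百度一下</title>".encode("utf-8")

decoded_ok = body.decode("utf-8")          # explicit and correct
decoded_wrong = body.decode("iso-8859-1")  # a wrong guess yields mojibake

print(decoded_ok)
print(decoded_wrong == decoded_ok)
# The bytes themselves are intact, so re-decoding recovers the text
print(decoded_wrong.encode("iso-8859-1").decode("utf-8") == decoded_ok)
```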
Code 2 (with request headers):
import requests


class RequestSpider(object):
    def __init__(self):
        url = "https://www.baidu.com/"
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
        self.response = requests.get(url, headers=headers)

    def run(self):
        data = self.response.content
        # 1. Request headers
        request_headers = self.response.request.headers
        print(request_headers)
        # 2. Response headers
        response_headers = self.response.headers
        print(response_headers)
        # 3. Response status code
        code = self.response.status_code
        print(code)
        # 4. Cookies sent with the request (_cookies is a private attribute)
        request_cookie = self.response.request._cookies
        print(request_cookie)
        # Note: a browser visiting Baidu may send many cookies; those are added by the browser, not given by the server
        # 5. Cookies set by the response
        response_cookie = self.response.cookies
        print(response_cookie)


RequestSpider().run()
Result:
E:\python\python.exe H:/code/Python爬蟲/Day04/03-requests_use2.py
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
{'Bdpagetype': '1', 'Bdqid': '0xe0b22322001a2c4a', 'Cache-Control': 'private', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sat, 30 Jan 2021 09:27:06 GMT', 'Expires': 'Sat, 30 Jan 2021 09:26:56 GMT', 'P3p': 'CP=" OTI DSP COR IVA OUR IND COM ", CP=" OTI DSP COR IVA OUR IND COM "', 'Server': 'BWS/1.1', 'Set-Cookie': 'BAIDUID=E577CD647F2B1CA6A7C0F4112781CAF9:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BIDUPSID=E577CD647F2B1CA6A7C0F4112781CAF9; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, PSTM=1611998826; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BAIDUID=E577CD647F2B1CA65749857950B007E4:FG=1; max-age=31536000; expires=Sun, 30-Jan-22 09:27:06 GMT; domain=.baidu.com; path=/; version=1; comment=bd, BDSVRTM=0; path=/, BD_HOME=1; path=/, H_PS_PSSID=33423_33516_33402_33273_33590_26350_33568; path=/; domain=.baidu.com, BAIDUID_BFESS=E577CD647F2B1CA6A7C0F4112781CAF9:FG=1; Path=/; Domain=baidu.com; Expires=Thu, 31 Dec 2037 23:55:55 GMT; Max-Age=2147483647; Secure; SameSite=None', 'Strict-Transport-Security': 'max-age=172800', 'Traceid': '1611998826055672090616191042239287929930', 'X-Ua-Compatible': 'IE=Edge,chrome=1', 'Transfer-Encoding': 'chunked'}
200
Process finished with exit code 0
Test2 (automatic URL encoding):
Code 1:
# https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&tn=baidu&wd=%E7%88%AC%E8%99%AB&oq=%2526lt%253BcH0%2520-%2520Nu1L&rsv_pq=d38dc072002f5aef&rsv_t=62dcS%2BcocFsilJnL%2FcjmqGeUvo6S6XMFTiyfxi22AnqTbscZBf6K%2F13WW%2Bo&rqlang=cn&rsv_enter=1&rsv_dl=tb&rsv_sug3=4&rsv_sug1=3&rsv_sug7=100&rsv_sug2=0&rsv_btype=t&inputT=875&rsv_sug4=875
# https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
import requests

# Non-ASCII characters in the URL are percent-encoded automatically
url = "http://www.baidu.com/s?wd=爬蟲"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers)
data = response.content.decode()
with open('baidu.html', 'w', encoding="utf-8") as f:
    f.write(data)
Result:
The request succeeds and the file is written; the Chinese characters in the query string were percent-encoded automatically.
Code 2:
# https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&tn=baidu&wd=%E7%88%AC%E8%99%AB&oq=%2526lt%253BcH0%2520-%2520Nu1L&rsv_pq=d38dc072002f5aef&rsv_t=62dcS%2BcocFsilJnL%2FcjmqGeUvo6S6XMFTiyfxi22AnqTbscZBf6K%2F13WW%2Bo&rqlang=cn&rsv_enter=1&rsv_dl=tb&rsv_sug3=4&rsv_sug1=3&rsv_sug7=100&rsv_sug2=0&rsv_btype=t&inputT=875&rsv_sug4=875
# https://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
import requests

# Query parameters passed as a dict are percent-encoded automatically
url = "http://www.baidu.com/s"
params = {
    'wd': '爬蟲',
}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers, params=params)
data = response.content.decode()
with open('baidu1.html', 'w', encoding="utf-8") as f:
    f.write(data)
Result:
The request succeeds and the file is written; the dict passed as params was encoded into the query string automatically.
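What the automatic encoding produces can be reproduced with the standard library. The sketch below uses `urllib.parse` rather than requests itself, and it uses the simplified-character keyword 爬虫, which is the spelling that matches the `wd=%E7%88%AC%E8%99%AB` value in the URL comments above.

```python
from urllib.parse import quote, urlencode

keyword = "爬虫"  # simplified spelling, matching the encoded URL in the comments

encoded_keyword = quote(keyword)            # percent-encode the UTF-8 bytes
query_string = urlencode({"wd": keyword})   # dict -> "key=value" query string
full_url = "http://www.baidu.com/s?" + query_string

print(encoded_keyword)
print(full_url)
```

Each Chinese character becomes three percent-encoded bytes because it takes three bytes in UTF-8.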
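Note:
To send a POST request with parameters, use requests.post(url, data={...}, json=...), where data takes form fields as a dict and json takes an object to be sent as JSON.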
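The difference between the two POST arguments comes down to how the body is serialized: `data=` with a dict sends a form-encoded body, while `json=` sends a JSON document. The following standard-library sketch only illustrates the two body shapes; the payload is invented, and the exact bytes requests puts on the wire may differ in details such as whitespace.

```python
import json
from urllib.parse import urlencode

payload = {"wd": "python", "page": 1}  # made-up parameters

form_body = urlencode(payload)   # the shape of a data= body
json_body = json.dumps(payload)  # the shape of a json= body

print(form_body)
print(json_body)
```

Which one to use depends on what the server expects: HTML login forms normally take the form-encoded shape, while most JSON APIs take the second.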
Test3 (json):
Code:
import requests
import json

url = "https://api.github.com/user"  # this URL returns standard JSON, not HTML
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"}
response = requests.get(url, headers=headers)
# str
data = response.content.decode()
print(data)
# str --> dict
data_dict = json.loads(data)
print(data_dict["message"])
# json() converts the JSON string into a Python dict or list automatically
data1 = response.json()
print(data1)
print(type(data1))
print(data1["message"])
Result:
E:\python\python.exe H:/code/Python爬蟲/Day04/03-requests_use3.py
{  "message": "Requires authentication",  "documentation_url": "https://docs.github.com/rest/reference/users#get-the-authenticated-user"}
Requires authentication
{'message': 'Requires authentication', 'documentation_url': 'https://docs.github.com/rest/reference/users#get-the-authenticated-user'}
<class 'dict'>
Requires authentication
Process finished with exit code 0
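The str-to-dict step can be tried offline on the response body shown in the result above. In this sketch the body is pasted in as a string literal; the `login` key looked up at the end is a hypothetical key used only to show safe access to a missing field.

```python
import json

# Response body copied from the result above
raw = (
    '{"message": "Requires authentication", '
    '"documentation_url": "https://docs.github.com/rest/reference/users'
    '#get-the-authenticated-user"}'
)

data = json.loads(raw)  # str --> dict

print(type(data).__name__)
print(data["message"])
print(data.get("login"))  # .get() returns None instead of raising KeyError
```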