Pitfalls of sending POST requests with Scrapy in Python
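Sending POST requests with requests
First, look at how convenient requests makes sending a POST request.
The simple API of Requests means that every kind of HTTP request is equally obvious.
For example, this is how you make an HTTP POST request:
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
The data argument accepts a dictionary, and it also accepts a tuple of (key, value) tuples, which lets one key carry several values: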
>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
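To send JSON, serialize the payload yourself and pass it as data:
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
New in version 2.4.2, the json parameter does the encoding for you:
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)
In other words, you do not need to transform the parameters yourself. You only need to decide between data= and json=, and requests takes care of the rest.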
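To make the difference between data= and json= concrete, the sketch below approximates, with the standard library only, what ends up in the request body in each case. It is not the code that requests runs internally; form_body and json_body are hypothetical helper names used only for illustration.

```python
import json
from urllib.parse import urlencode


def form_body(data):
    """Approximate the body produced by data=: form-encoded key=value pairs."""
    # Accept either a dict or an iterable of (key, value) tuples.
    items = data.items() if isinstance(data, dict) else data
    return urlencode(list(items))


def json_body(payload):
    """Approximate the body produced by json=: a JSON document."""
    return json.dumps(payload)


print(form_body({'key': 'value'}))                          # key=value
print(form_body((('key1', 'value1'), ('key1', 'value2'))))  # key1=value1&key1=value2
print(json_body({'some': 'data'}))                          # {"some": "data"}
```

The two bodies are not interchangeable. A form-encoded body is normally sent with Content-Type application/x-www-form-urlencoded, a JSON body with application/json, and the server decides how to parse the body based on what it expects.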
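Sending POST requests with Scrapy
The source code shows that Scrapy sends GET requests by default. When we need to send a request that carries parameters, or to log in, we need a POST request. Take the following spider as an example: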
import json

import scrapy


# A plain Spider is enough here: the example defines no crawl rules.
class LaGou(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        yield scrapy.FormRequest(
            url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
            formdata={
                'first': 'true',  # a bool True is not accepted here, although requests accepts it
                'pn': '1',  # an int 1 is not accepted here, although requests accepts it
                'kd': 'python',
            },  # formdata plays the role of data in requests, but keys and values must be strings
            callback=self.parse,
        )

    def parse(self, response):
        # The body is JSON; the job list is nested under content.positionResult.result.
        datas = json.loads(response.body.decode())['content']['positionResult']['result']
        for data in datas:
            print(data['companyFullName'] + str(data['positionId']))
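Because formdata only accepts strings, parameters that start out as bool or int have to be converted first. The following is a minimal sketch of such a conversion; stringify_formdata is a hypothetical helper, not part of Scrapy, and the lowercase 'true'/'false' spelling is an assumption about what the target server expects.

```python
def stringify_formdata(params):
    """Return a copy of params whose values are all str, as FormRequest expects."""
    result = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # bool must be checked before other types, because bool is a subclass of int.
            # Lowercase 'true'/'false' is an assumption about what the server accepts.
            result[key] = 'true' if value else 'false'
        else:
            result[key] = str(value)
    return result


formdata = stringify_formdata({'first': True, 'pn': 1, 'kd': 'python'})
print(formdata)  # {'first': 'true', 'pn': '1', 'kd': 'python'}
```

The resulting dictionary can then be passed as formdata= to scrapy.FormRequest.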
The official documentation recommends this approach, under "Using FormRequest to send data via HTTP POST":
return [FormRequest(url="http://www.example.com/post/action",
                    formdata={'name': 'John Doe', 'age': '27'},
                    callback=self.after_post)]
This uses FormRequest and passes the parameters through formdata, which here is also a dictionary.
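But here comes the real pitfall. I spent a whole afternoon on it today: however I sent the request this way, something went wrong, and the data that came back was never what I wanted.
return scrapy.FormRequest(url, formdata=payload)
After a long search online, I finally found an approach that works: send the request with scrapy.Request, and the data comes back as expected.
return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'})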
Reference: Send Post Request in Scrapy
my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
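One plausible explanation for the failure, assuming the endpoint expects a JSON body, is that a form-encoded body simply cannot be parsed as JSON on the server side. The sketch below illustrates this with the standard library; parse_as_json is a hypothetical stand-in for a server-side handler, not code from any real service.

```python
import json
from urllib.parse import urlencode

payload = {'field1': 'value1', 'field2': 'value2'}

form_encoded = urlencode(payload)   # roughly what formdata= puts in the body
json_encoded = json.dumps(payload)  # what body=json.dumps(...) puts in the body


def parse_as_json(body):
    """Hypothetical server-side handler that only understands JSON bodies."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


print(parse_as_json(form_encoded))  # None
print(parse_as_json(json_encoded))  # {'field1': 'value1', 'field2': 'value2'}
```

Newer Scrapy releases also ship scrapy.http.JsonRequest, which serializes a dictionary passed as data= and sets the JSON Content-Type header, so the manual json.dumps and headers arguments may not be needed there.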
Difference between FormRequest and Request
In the documentation the two look almost identical:
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
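That is, FormRequest adds one new parameter, formdata, which accepts a dictionary or an iterable of (key, value) tuples containing form data, URL-encodes it, and uses it as the request body. FormRequest also inherits from Request: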
class FormRequest(Request):

    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'

        super(FormRequest, self).__init__(*args, **kwargs)

        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

###

def _urlencode(seq, enc):
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)
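The encoding step can be tried in isolation. The following standalone approximation of _urlencode uses simplified stand-ins for Scrapy's to_bytes and is_listlike helpers (the real ones are more general), and urlencode_formdata is a hypothetical name. It also suggests why an int or bool value is rejected: only str and bytes can be converted.

```python
from urllib.parse import urlencode


def to_bytes(text, encoding='utf-8'):
    """Simplified stand-in for Scrapy's to_bytes: only str and bytes are accepted."""
    if isinstance(text, bytes):
        return text
    if not isinstance(text, str):
        raise TypeError(f'expected str or bytes, got {type(text).__name__}')
    return text.encode(encoding)


def is_listlike(value):
    """Simplified stand-in: treat list, tuple and set as multi-valued."""
    return isinstance(value, (list, tuple, set))


def urlencode_formdata(seq, enc='utf-8'):
    """Flatten (key, value-or-values) pairs and form-encode them."""
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=True)


print(urlencode_formdata({'key': 'value', 'k': 'v'}.items()))  # key=value&k=v
print(urlencode_formdata([('key1', ['value1', 'value2'])]))    # key1=value1&key1=value2

try:
    urlencode_formdata({'pn': 1}.items())  # an int value is not str or bytes
except TypeError as exc:
    print('rejected:', exc)
```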
In the end, the {'key': 'value', 'k': 'v'} we pass is converted to 'key=value&k=v', and the method defaults to POST whenever formdata is given. Now look at Request:
class Request(object_ref):

    def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                 cookies=None, meta=None, encoding='utf-8', priority=0,
                 dont_filter=False, errback=None, flags=None):

        self._encoding = encoding  # this one has to be set first
        self.method = str(method).upper()
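The default method is GET, but that does not get in the way: passing method='POST' still sends a POST request. This reminds me of the request function in requests, the basic function for defining a request.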
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
Source: https://blog.csdn.net/freeking101/article/details/82908342