Sending POST Requests with the Scrapy Framework
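Sending POST requests with Scrapy is not generally recommended, because the configuration is complicated. When the volume of data is large, however, it can be done with the code below:
Method 1: override the spider's start_requests method
By default Scrapy sends GET requests; to send POST requests, you need to override start_requests(self).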
    import scrapy


    class FySpider(scrapy.Spider):
        name = 'fy'
        # allowed_domains = ['www.baidu.com']
        start_urls = ['https://fanyi.baidu.com/sug']

        def start_requests(self):
            # Form fields to send in the POST body.
            data = {'kw': "beautiful"}
            for url in self.start_urls:
                yield scrapy.FormRequest(url=url, formdata=data, callback=self.parse)

        def parse(self, response):
            print(response.text)

Method 2: define the URL elsewhere and send the request manually
You can write:

    scrapy.FormRequest(url=url, formdata=data, callback=self.parse)

or, for a JSON body (this requires import json):

    scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'})

The spider below issues its follow-up requests manually (plain GET requests here) and passes the item to the next callback through meta:

    # -*- coding: utf-8 -*-
    import scrapy
    from video.items import VideoItem


    class MvSpider(scrapy.Spider):
        name = 'mv'
        # allowed_domains = ['www.piaohua.com/']
        start_urls = ['http://www.88ys.cc/dianying/1.html']

        def detail_parse(self, response):
            # Retrieve the item handed over by parse() through meta.
            item = response.meta['item']
            year = response.xpath('//div[@class="ct-c"]/dl/dd[3]/text()').extract_first()
            country = response.xpath('//div[@class="ct-c"]/dl/dd[2]/text()').extract_first()
            item['year'] = year
            item['country'] = country
            yield item

        def parse(self, response):
            li_list = response.xpath('//div[@class="index-area clearfix"]/ul/li/a')
            for li in li_list:
                # Create a fresh item per link so that pending requests do not share one object.
                item = VideoItem()
                m_url = 'http://www.88ys.cc' + li.xpath('./@href').extract_first()
                name = li.xpath('./@title').extract_first()
                item['name'] = name
                yield scrapy.Request(url=m_url, callback=self.detail_parse, meta={'item': item})

Differences between FormRequest and Request
The official documentation is quoted below; judging from it, the two classes barely differ.
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
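Put simply, FormRequest adds one parameter, formdata, which accepts a dict or an iterable of (key, value) tuples containing the form data and converts it into the request body. FormRequest also inherits from Request.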
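As a rough illustration of the url-encoding step described above, here is a minimal sketch that uses only the Python standard library. The helper name encode_formdata is hypothetical (it is not part of Scrapy), and it only approximates what Scrapy does internally:

```python
from urllib.parse import urlencode


def encode_formdata(formdata, encoding="utf-8"):
    """Hypothetical helper: turn form data into a url-encoded string."""
    # Accept either a dict or an iterable of (key, value) tuples.
    items = formdata.items() if isinstance(formdata, dict) else formdata
    # Expand list-valued fields into one (key, value) pair per element.
    pairs = [
        (str(k).encode(encoding), str(v).encode(encoding))
        for k, vs in items
        for v in (vs if isinstance(vs, (list, tuple)) else [vs])
    ]
    return urlencode(pairs, doseq=True)


print(encode_formdata({'key': 'value', 'k': 'v'}))  # key=value&k=v
print(encode_formdata([('tag', ['a', 'b'])]))       # tag=a&tag=b
print(encode_formdata({'kw': 'beautiful day'}))     # kw=beautiful+day
```

Note that a space is encoded as "+", which is the usual behaviour for application/x-www-form-urlencoded bodies.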
The FormRequest source code reads:

    class FormRequest(Request):

        def __init__(self, *args, **kwargs):
            formdata = kwargs.pop('formdata', None)
            if formdata and kwargs.get('method') is None:
                kwargs['method'] = 'POST'

            super(FormRequest, self).__init__(*args, **kwargs)

            if formdata:
                items = formdata.items() if isinstance(formdata, dict) else formdata
                querystr = _urlencode(items, self.encoding)
                if self.method == 'POST':
                    self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                    self._set_body(querystr)
                else:
                    self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)


    def _urlencode(seq, enc):
        values = [(to_bytes(k, enc), to_bytes(v, enc))
                  for k, vs in seq
                  for v in (vs if is_listlike(vs) else [vs])]
        return urlencode(values, doseq=1)

So the {'key': 'value', 'k': 'v'} we pass ends up as 'key=value&k=v', and when formdata is given without an explicit method, the method defaults to POST. Now compare this with Request:
    class Request(object_ref):

        def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                     cookies=None, meta=None, encoding='utf-8', priority=0,
                     dont_filter=False, errback=None, flags=None):

            self._encoding = encoding  # this one has to be set first
            self.method = str(method).upper()
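To see how the two constructors interact, the following dependency-free sketch mimics the method-selection logic quoted above. The function build_form_request and the dict it returns are hypothetical stand-ins for illustration, not Scrapy API:

```python
from urllib.parse import urlencode


def build_form_request(url, formdata=None, method=None):
    """Hypothetical helper mirroring the FormRequest logic quoted above."""
    # FormRequest only forces POST when formdata is given and no method was passed.
    if formdata and method is None:
        method = "POST"
    # Request itself falls back to GET and upper-cases the method.
    method = str(method or "GET").upper()

    headers, body = {}, b""
    if formdata:
        items = formdata.items() if isinstance(formdata, dict) else formdata
        query = urlencode(list(items), doseq=True)
        if method == "POST":
            # POST: the encoded form data becomes the request body.
            headers.setdefault("Content-Type", "application/x-www-form-urlencoded")
            body = query.encode("utf-8")
        else:
            # Any other method: the encoded form data is appended to the URL.
            url = url + ("&" if "?" in url else "?") + query
    return {"method": method, "url": url, "headers": headers, "body": body}


post = build_form_request("https://fanyi.baidu.com/sug", {"kw": "beautiful"})
get = build_form_request("https://fanyi.baidu.com/sug", {"kw": "beautiful"}, method="get")
plain = build_form_request("https://fanyi.baidu.com/sug")

print(post["method"], post["body"])  # POST b'kw=beautiful'
print(get["method"], get["url"])     # GET https://fanyi.baidu.com/sug?kw=beautiful
print(plain["method"])               # GET
```

The second call shows a detail that is easy to miss in the source: if formdata is combined with an explicit non-POST method, the data goes into the query string rather than the body.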