Scrapy CSS Selectors
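Compared with XPath selectors, CSS selectors feel a bit easier: the syntax is essentially what you would write in a .css file. The one difference to watch is how you get at the content (text or attribute values), which is not done the way XPath does it.
This post shows how to extract an article's data with CSS selectors.
The fields extracted are the same ones as in the earlier XPath post.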
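In XPath, an element's text is taken with text() and an attribute with @href. With CSS selectors, append ::text to get the text, for example .entry-header h1::text, and use ::attr() for an attribute, for example .entry-header a::attr(href).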
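A function you will use often is extract_first().
It is equivalent to extract()[0], except that extract()[0] raises an IndexError when the list is empty, that is, when the selector matched nothing. extract_first() returns None in that case, and you can also pass the value to return instead, such as an empty string: extract_first("")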
    import re

    title = response.css(".entry-header h1::text").extract_first()
    # Drop surrounding whitespace and the "·" separator after the date.
    create_date = response.css("p.entry-meta-hide-on-mobile::text").extract()[0].strip().replace('·', '').strip()
    # This id appears to embed the post number, so it only fits this one article.
    praise_nums = response.css('#110287votetotal::text').extract()[0]
    fav_nums = response.css('.btn-bluet-bigger.href-style.bookmark-btn .register-user-only::text').extract()[0].strip()
    # Keep only the number from text that mixes digits and words.
    match_re = re.match(r'.*?(\d+).*', fav_nums)
    if match_re:
        fav_nums = match_re.group(1)
    comment_nums = response.css('.btn-bluet-bigger.href-style.hide-on-480::text').extract()[0].strip()
    # Match against comment_nums here, not fav_nums.
    match_re = re.match(r'.*?(\d+).*', comment_nums)
    if match_re:
        comment_nums = match_re.group(1)
    tag_list = response.css('.entry-meta-hide-on-mobile a::text').extract()
    content = response.css('div.entry').extract()[0]
    # The link list also contains the comment count ("評論" means "comments"); drop it.
    tag_list = [element for element in tag_list if not element.strip().endswith('評論')]
    tag = ','.join(tag_list)
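The same regular expression is applied to both the bookmark count and the comment count, so it can be factored into one small helper. The sketch below uses only the standard library; the function name first_int and the sample strings are illustrative assumptions, not something from the original code.

```python
import re


def first_int(text, default=0):
    """Return the first run of digits in text as an int, or default if none.

    The text is stripped first because "." does not match a newline, so
    re.match would fail on a value that starts with a line break.
    """
    match_re = re.match(r".*?(\d+).*", text.strip())
    if match_re:
        return int(match_re.group(1))
    return default


# Hypothetical strings shaped like the button labels on the page.
with_number = first_int(" 2 收藏")
without_number = first_int(" 收藏")
number_in_middle = first_int("共 12 評論")
```

The lazy `.*?` lets the capture group start at the first digit it can find, and the greedy `\d+` then takes the whole number. Returning a default instead of leaving the raw string in place keeps the field's type consistent.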
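When the element we want has several names in its class attribute (here, both post and floated-thumb), the selector should be written like this:

    post_urls = response.css('#archive .post.floated-thumb .post-thumb a::attr(href)').extract()

That is, .post.floated-thumb must be written together with no space in between, or you can write just .floated-thumb.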
## Full code (accurate version)
    import re

    def parse_detail(self, response):
        title = response.css(".entry-header h1::text").extract_first()
        # Drop surrounding whitespace and the "·" separator after the date.
        create_date = response.css("p.entry-meta-hide-on-mobile::text").extract()[0].strip().replace("·", "").strip()
        praise_nums = response.css(".vote-post-up h10::text").extract()[0]
        fav_nums = response.css(".bookmark-btn::text").extract()[0]
        # Pull the number out of the label; fall back to 0 when there is none.
        match_re = re.match(r".*?(\d+).*", fav_nums)
        if match_re:
            fav_nums = int(match_re.group(1))
        else:
            fav_nums = 0
        comment_nums = response.css("a[href='#article-comment'] span::text").extract()[0]
        match_re = re.match(r".*?(\d+).*", comment_nums)
        if match_re:
            comment_nums = int(match_re.group(1))
        else:
            comment_nums = 0
        content = response.css("div.entry").extract()[0]
        tag_list = response.css("p.entry-meta-hide-on-mobile a::text").extract()
        # The link list also contains the comment count ("評論" means "comments"); drop it.
        tag_list = [element for element in tag_list if not element.strip().endswith("評論")]
        tags = ",".join(tag_list)
        pass
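The two string clean-up steps in parse_detail (trimming the date and dropping the comment-count entry from the tag list) can be tried without a spider. The input values below are invented to resemble what the selectors might return; they are not captured from the real page.

```python
# Hypothetical raw values, shaped like selector output.
raw_date = "\r\n\r\n            2017/03/18 ·  "
raw_tags = ["IT技術", " 2 評論 ", "Python"]

# strip() removes the surrounding whitespace, replace() removes the "·"
# separator, and the second strip() removes the space left in front of it.
create_date = raw_date.strip().replace("·", "").strip()

# Keep every entry except the one that ends with "評論" ("comments").
tag_list = [t for t in raw_tags if not t.strip().endswith("評論")]
tags = ",".join(tag_list)

print(create_date)
print(tags)
```

The second strip() matters because the separator sits after a space, so removing it leaves trailing whitespace behind.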