[BEV] Study Notes on BEVFormer (Part 1)

Published: 2024/1/18

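1. Introduction

Running vision tasks in BEV space is gradually becoming the mainstream approach in autonomous driving. To understand how vision tasks are carried out in BEV, I am using the BEVFormer project to learn the steps involved. This post covers how to run BEVFormer and gives an overview of its overall framework (my reading of the source is still somewhat disorganized). Once I am more familiar with the source, I plan to publish more detailed annotations.
BEVFormer source code: https://github.com/fundamentalvision/BEVFormer
BEVFormer article: https://zhuanlan.zhihu.com/p/543335939

BEVFormer video: https://www.bilibili.com/video/BV1rK411o7PS/?spm_id_from=333.788&vd_source=2055b62125c0277a0d6878f41c89fec2
Anyone who is learning or wants to learn BEV models is welcome to join the discussion groups to talk through questions about the papers or the code. WeChat group: Rex1586662742, QQ group: 468713665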

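2. Running

The first step in learning a project is getting it to run. I recommend following the official installation instructions exactly to avoid problems. Once the environment is installed, run the official commands. How, then, do you debug the code and step through it line by line? I debug with PyCharm, using the steps below.

Link launch.py
First, find launch.py in the current conda environment (bev) and symlink it into this project: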

sudo ln -s /home/***/miniconda3/envs/bev/lib/python3.8/site-packages/torch/distributed/launch.py ./
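The path in the command above depends on where the conda environment is installed. As a hedged alternative to typing it by hand, the file location of a module can be looked up with the standard library. The helper name `module_file` is made up for this sketch, and the `json.decoder` lookup is only a stand-in so the snippet runs without PyTorch. Inside the bev environment, the module name to look up would be `torch.distributed.launch`.

```python
import importlib.util


def module_file(module_name):
    """Return the source file of an importable module, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (for example `torch`) is not installed.
        return None
    return spec.origin if spec is not None else None


# Stand-in lookup that works in any Python installation.
print(module_file("json.decoder"))

# In the BEVFormer environment (assumption: torch is installed there):
# print(module_file("torch.distributed.launch"))
```

The printed path can then be used as the source of the symlink, or directly as the script path in the debugger.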

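Edit the configuration
Click Run and open the run configuration. Set the script path to the absolute path of launch.py and enter the following arguments in the parameters field, adjusted to the paths on your machine. This is usually enough to start debugging. If you get an error saying the images in the dataset cannot be found, see the third step, "Link the dataset", below.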

--nproc_per_node=1 --master_port=29503 /data/rex/BEV/BEVFormer-master/tools/test.py /data/rex/BEV/BEVFormer-master/projects/configs/bevformer/bevformer_tiny.py /data/rex/BEV/bevformer_tiny_epoch_24.pth --launcher pytorch --eval bbox

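Link the dataset
At run time you may get an error saying the data dataset cannot be found. In that case, add the following to tools/test.py:

import os
import sys

# Show the working directory the debugger actually uses
print(os.getcwd())
sys.path.append(os.getcwd())

Then symlink the project's data directory into the directory printed by os.getcwd(), and the error no longer occurs.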
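The linking step can also be done from Python. The sketch below is a minimal illustration, not part of BEVFormer: the helper `link_data_dir` is a hypothetical name, and the temporary directories only stand in for the real dataset folder and the debugger's working directory.

```python
import os
import tempfile


def link_data_dir(data_dir, work_dir, link_name="data"):
    """Create work_dir/link_name as a symlink to data_dir unless it already exists."""
    link_path = os.path.join(work_dir, link_name)
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(data_dir), link_path, target_is_directory=True)
    return link_path


# Demo with temporary directories (hypothetical stand-ins for the real paths).
with tempfile.TemporaryDirectory() as root:
    dataset = os.path.join(root, "BEVFormer", "data")
    cwd = os.path.join(root, "debug_cwd")
    os.makedirs(dataset)
    os.makedirs(cwd)
    link = link_data_dir(dataset, cwd)
    print(os.path.islink(link), os.path.realpath(link) == os.path.realpath(dataset))
```

In practice `work_dir` would be the value printed by `os.getcwd()` and `data_dir` the project's data folder. The `lexists` check makes the call safe to repeat.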

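3. Network Structure

As usual, the overview figure from the paper is the starting point. The figure has three main parts: the backbone on the far left, the ×6 encoder in the middle, and the Det/Seg heads above the middle.

The first part is simply ResNet + FPN. BEVFormer's main innovation is in the second part, the encoder, namely Temporal Self-Attention and Spatial Cross-Attention.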

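4. Model Configuration File

This post uses the tiny model for testing. The models differ mainly in the size of bev_query and the number of multi-scale FPN feature levels. The configuration file is projects/configs/bevformer/bevformer_tiny.py, which defines the network structure. At run time the modules below are registered first, and reading them from top to bottom roughly follows the order of the forward pass.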

model = dict(
    type='BEVFormer',
    ...,
    # backbone network
    img_backbone=dict(type='ResNet', ...),
    # extract features at different scales
    img_neck=dict(type='FPN', ...),
    # encoder and decoder
    pts_bbox_head=dict(
        type='BEVFormerHead',
        ...,
        transformer=dict(
            type='PerceptionTransformer',
            ...,
            # encoder network
            encoder=dict(
                type='BEVFormerEncoder',
                ...,
                # a single block, stacked repeatedly at inference (x6 in the paper's figure)
                transformerlayers=dict(
                    type='BEVFormerLayer',
                    attn_cfgs=[
                        dict(type='TemporalSelfAttention', ...),
                        dict(
                            type='SpatialCrossAttention',
                            deformable_attention=dict(
                                type='MSDeformableAttention3D', ...)),
                    ])),
            # decoder network
            decoder=dict(
                type='DetectionTransformerDecoder',
                ...,
                # decoder block
                transformerlayers=dict(
                    type='DetrTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(type='MultiheadAttention'),
                        dict(type='CustomMSDeformableAttention', ...),
                    ]))),
        bbox_coder=dict(type='NMSFreeCoder', ...),
        # learnable positional encoding
        positional_encoding=dict(type='LearnedPositionalEncoding', ...)))
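To make the "register first, then build from the config" idea concrete, here is a toy registry in plain Python. It is only a sketch of the general pattern, not the OpenMMLab implementation: `REGISTRY`, `register`, `build`, and the `Toy*` classes are all invented for illustration, and the real framework differs in detail (for example, nested configs are usually built inside each module's constructor rather than recursively up front).

```python
REGISTRY = {}


def register(cls):
    """Class decorator: make the class buildable by its name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg):
    """Recursively turn a config dict with a 'type' key into an object."""
    if isinstance(cfg, list):
        return [build(item) for item in cfg]
    if not (isinstance(cfg, dict) and "type" in cfg):
        return cfg  # plain value, passed through unchanged
    kwargs = {key: build(value) for key, value in cfg.items() if key != "type"}
    return REGISTRY[cfg["type"]](**kwargs)


@register
class ToyBackbone:
    def __init__(self, depth):
        self.depth = depth


@register
class ToyNeck:
    def __init__(self, num_outs):
        self.num_outs = num_outs


@register
class ToyDetector:
    def __init__(self, img_backbone, img_neck):
        self.img_backbone = img_backbone
        self.img_neck = img_neck


toy_cfg = dict(
    type="ToyDetector",
    img_backbone=dict(type="ToyBackbone", depth=50),
    img_neck=dict(type="ToyNeck", num_outs=1),
)
toy_model = build(toy_cfg)
print(type(toy_model.img_backbone).__name__, toy_model.img_backbone.depth)
```

The point of the pattern is that the `type` string selects a class and the remaining keys become constructor arguments, so swapping a module only requires editing the config.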

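5. Forward Pass

Once debugging works, you can step through the code line by line and inspect the shape of each variable. The project involves many modules and is built on OpenMMLab, so it is a little convoluted at first. After many debugging sessions I recorded the rough inference flow, which can be followed in the numbered order below.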

1. tools/test.py

outputs = custom_multi_gpu_test(model, data_loader, args.tmpdir, args.gpu_collect)
# enters projects/mmdet3d_plugin/bevformer/apis/test.py

2. projects/mmdet3d_plugin/bevformer/apis/test.py

def custom_multi_gpu_test(...):
    ...
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)
            # enters projects/mmdet3d_plugin/bevformer/detectors/bevformer.py
    ...

3. projects/mmdet3d_plugin/bevformer/detectors/bevformer.py

class BEVFormer(...):
    def forward(...):
        if return_loss:
            return self.forward_train(**kwargs)
        else:
            return self.forward_test(**kwargs)
            # enters self.forward_test below

    def forward_test(...):
        ...
        # forward
        new_prev_bev, bbox_results = self.simple_test(...)
        # enters self.simple_test below
        ...

    def simple_test(...):
        # self.extract_feat has two main steps, img_backbone and img_neck,
        # which extract features by convolution
        # the network is ResNet + FPN
        # for the base model, img_feats holds feature maps at four scales
        # for small and tiny, img_feats holds a feature map at one scale
        img_feats = self.extract_feat(img=img, img_metas=img_metas)
        # Temporal Self-Attention + Spatial Cross-Attention
        new_prev_bev, bbox_pts = self.simple_test_pts(
            img_feats, img_metas, prev_bev, rescale=rescale)
        # enters self.simple_test_pts below

    def simple_test_pts(...):
        # encode and decode the feature maps
        outs = self.pts_bbox_head(x, img_metas, prev_bev=prev_bev)
        # enters projects/mmdet3d_plugin/bevformer/dense_heads/bevformer_head.py
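The `prev_bev` value threaded through `forward_test` and `simple_test` is what lets one frame reuse the BEV features of the previous frame. The toy class below sketches that carry-over under loud assumptions: `ToyTemporalDetector` is invented, the "BEV feature" is a single float, and storing the cache in `self.prev_bev` is a guess at the mechanism rather than a copy of the project's code.

```python
class ToyTemporalDetector:
    """Toy stand-in for a detector that reuses the previous frame's BEV features."""

    def __init__(self):
        self.prev_bev = None  # nothing cached before the first frame

    def simple_test(self, frame, prev_bev):
        # Hypothetical "BEV feature": the frame value mixed with the cached history.
        new_bev = frame if prev_bev is None else 0.5 * frame + 0.5 * prev_bev
        result = {"frame": frame, "used_history": prev_bev is not None}
        return new_bev, result

    def forward_test(self, frame):
        new_prev_bev, result = self.simple_test(frame, self.prev_bev)
        self.prev_bev = new_prev_bev  # cache for the next call
        return result


detector = ToyTemporalDetector()
results = [detector.forward_test(frame) for frame in (1.0, 3.0, 5.0)]
print([r["used_history"] for r in results], detector.prev_bev)
```

Only the first frame runs without history. Every later frame receives the features produced by the call before it, which is the role `new_prev_bev` plays in the flow above.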

4. projects/mmdet3d_plugin/bevformer/dense_heads/bevformer_head.py

class BEVFormerHead(DETRHead):
    def forward(...):
        ...
        if only_bev:
            ...
        else:
            outputs = self.transformer(...)
            # enters projects/mmdet3d_plugin/bevformer/modules/transformer.py
        for lvl in range(hs.shape[0]):
            # class scores
            outputs_class = self.cls_branches[lvl](hs[lvl])
            # box regression
            tmp = self.reg_branches[lvl](hs[lvl])
        outs = ...
        return outs
        # returns to simple_test_pts in projects/mmdet3d_plugin/bevformer/detectors/bevformer.py

5. projects/mmdet3d_plugin/bevformer/modules/transformer.py

class PerceptionTransformer(...):
    def get_bev_features(...):
        # obtain the BEV features, block x 6
        bev_embed = self.encoder(...)
        # enters projects/mmdet3d_plugin/bevformer/modules/encoder.py
        ...

    def forward(...):
        # obtain the BEV features
        bev_embed = self.get_bev_features(...)
        # enters self.get_bev_features above
        ...
        # decoder
        inter_states, inter_references = self.decoder(...)
        # enters projects/mmdet3d_plugin/bevformer/modules/decoder.py
        return bev_embed, inter_states, init_reference_out, inter_references_out
        # returns to projects/mmdet3d_plugin/bevformer/dense_heads/bevformer_head.py

6. projects/mmdet3d_plugin/bevformer/modules/encoder.py

class BEVFormerEncoder(...):
    def forward(...):
        ...
        for lid, layer in enumerate(self.layers):
            out = ...
            # enters class BEVFormerLayer below

class BEVFormerLayer(...):
    def forward(...):
        # layer selects which module to run
        for layer in self.operation_order:
            # temporal_self_attention
            if layer == 'self_attn':
                # self.attentions[attn_index] is the temporal_self_attention module
                query = self.attentions[attn_index](...)
                # enters projects/mmdet3d_plugin/bevformer/modules/temporal_self_attention.py
            # Spatial Cross-Attention
            elif layer == 'cross_attn':
                query = self.attentions[attn_index](...)
                # enters projects/mmdet3d_plugin/bevformer/modules/spatial_cross_attention.py
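The dispatch on `operation_order` can be sketched with toy modules that only record their names. Everything here is illustrative: `make_op`, `ToyLayer`, and `make_layer` are invented, and the `ORDER` tuple is an assumption about what the config contains, so check `operation_order` in the actual config file.

```python
def make_op(name):
    """Return a toy module that appends its name to the query's history."""
    def op(history):
        return history + [name]
    return op


class ToyLayer:
    """Toy block that walks an operation order and picks the next module of each kind."""

    def __init__(self, attentions, norms, ffns, operation_order):
        self.attentions = attentions
        self.norms = norms
        self.ffns = ffns
        self.operation_order = operation_order

    def forward(self, query):
        attn_index = norm_index = ffn_index = 0
        for layer in self.operation_order:
            if layer in ("self_attn", "cross_attn"):
                query = self.attentions[attn_index](query)
                attn_index += 1
            elif layer == "norm":
                query = self.norms[norm_index](query)
                norm_index += 1
            elif layer == "ffn":
                query = self.ffns[ffn_index](query)
                ffn_index += 1
        return query


# Assumed order of operations inside one encoder block.
ORDER = ("self_attn", "norm", "cross_attn", "norm", "ffn", "norm")


def make_layer():
    return ToyLayer(
        attentions=[make_op("temporal_self_attention"), make_op("spatial_cross_attention")],
        norms=[make_op("norm") for _ in range(3)],
        ffns=[make_op("ffn")],
        operation_order=ORDER,
    )


history = []
for layer in [make_layer() for _ in range(2)]:  # two stacked toy blocks
    history = layer.forward(history)
print(history[:6])
```

Because `attn_index` advances after every attention call, the first entry of `attentions` serves `self_attn` and the second serves `cross_attn`, which matches the order of `attn_cfgs` in the configuration.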

7. projects/mmdet3d_plugin/bevformer/modules/temporal_self_attention.py

class TemporalSelfAttention(...):
    def forward(...):
        output = ...
        # residual connection; the result is the input to the Spatial Cross-Attention module
        return self.dropout(output) + identity
        # returns to projects/mmdet3d_plugin/bevformer/modules/encoder.py

8. projects/mmdet3d_plugin/bevformer/modules/spatial_cross_attention.py

class SpatialCrossAttention(...):
    def forward(...):
        queries = self.deformable_attention(...)
        # enters MSDeformableAttention3D below
        return self.dropout(slots) + inp_residual
        # returns to projects/mmdet3d_plugin/bevformer/modules/encoder.py

# self.deformable_attention
class MSDeformableAttention3D(BaseModule):
    def forward(...):
        ...
        output = ...
        return output
        # returns to SpatialCrossAttention above
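The core idea of deformable attention is that each query looks at only a few sampled locations around a reference point and combines them with attention weights. The pure-Python sketch below shows that idea for one query on one small feature map. It is a heavy simplification and not the project's module: the function names are invented, the offsets and weights are hard-coded here whereas a real module predicts them from the query, and multiple heads, feature levels, and cameras are left out.

```python
import math


def bilinear_sample(feature, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at pixel coords (x, y).

    Locations outside the grid contribute zero.
    """
    height, width = len(feature), len(feature[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < width and 0 <= yi < height:
                value += wx * wy * feature[yi][xi]
    return value


def deformable_attention_1q(feature, reference, offsets, weights):
    """Weighted sum of features sampled at reference + offset, for one query."""
    ref_x, ref_y = reference
    return sum(
        w * bilinear_sample(feature, ref_x + dx, ref_y + dy)
        for (dx, dy), w in zip(offsets, weights)
    )


feature = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
# Two sampling points around the reference (1, 1); the weights sum to 1.
out = deformable_attention_1q(
    feature, (1.0, 1.0), [(0.5, 0.0), (0.0, -1.0)], [0.25, 0.75])
print(out)
```

The first sample lands halfway between the values 4 and 5 and the second lands exactly on the value 1, so the output is a weighted mix of 4.5 and 1.0. The cost depends on the number of sampling points rather than on the size of the feature map, which is why this form of attention suits large image features.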

9. projects/mmdet3d_plugin/bevformer/modules/decoder.py

class DetectionTransformerDecoder(...):
    def forward(...):
        for lid, layer in enumerate(self.layers):
            output = layer(...)
            # enters CustomMSDeformableAttention below
            ...
            if self.return_intermediate:
                intermediate.append(output)
                intermediate_reference_points.append(reference_points)
        return output, reference_points
        # returns to projects/mmdet3d_plugin/bevformer/modules/transformer.py

class CustomMSDeformableAttention(...):
    def forward(...):
        '''
        query: [900, 1, 256]
        query_pos: [900, 1, 256], learnable positional encoding
        '''
        output = multi_scale_deformable_attn_pytorch(...)
        output = self.output_proj(output)
        return self.dropout(output) + identity
        # returns to DetectionTransformerDecoder above
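The `return_intermediate` branch explains why the head in step 4 can loop over `hs.shape[0]`: the decoder can keep the output of every layer, not only the last one. The sketch below shows that bookkeeping with plain Python values. `run_decoder` and the toy layers are invented, and returning the collected lists when `return_intermediate` is set is an assumption about the part of the code that is elided above.

```python
def run_decoder(layers, query, reference_points, return_intermediate=True):
    """Run layers in sequence and optionally keep every layer's output."""
    output = query
    intermediate = []
    intermediate_reference_points = []
    for layer in layers:
        output = layer(output)
        if return_intermediate:
            intermediate.append(output)
            intermediate_reference_points.append(reference_points)
    if return_intermediate:
        # Assumption: one entry per decoder layer is handed back to the caller.
        return intermediate, intermediate_reference_points
    return output, reference_points


def add_one(value):
    """Toy decoder layer: refine the query by adding one."""
    return value + 1


toy_layers = [add_one, add_one, add_one]
states, refs = run_decoder(toy_layers, 0, "ref")
last, last_ref = run_decoder(toy_layers, 0, "ref", return_intermediate=False)
print(states, last)
```

With per-layer outputs available, a head can apply one classification and one regression branch per decoder layer, which is commonly used to supervise every layer during training.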

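6. Summary

The steps above trace BEVFormer's inference path from end to end, but many details remain. I am still reading the source and have some unresolved questions. The follow-up detailed version (already being written, if all goes well) will annotate the variables in the code with their dimensions and purpose, both to deepen my understanding of BEVFormer and to improve my grasp of BEV models in general.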
