
[BEV] Study Notes on BEVDepth (Principles + Code)

Published: 2024/3/24 · 豆豆


Contents

- 1. Introduction
- 2. Model Overview
- 3. Code Walkthrough
- 4. Summary

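1. Introduction

Following lift-splat-shoot, camera-only BEV perception has seen further progress, for example BEVDepth, proposed by Megvii, Huazhong University of Science and Technology, and Xi'an Jiaotong University. This post first gives a brief description of the BEVDepth method, then walks through the whole pipeline in detail based on my reading of the code, with particular attention to the implementation of voxel_pooling.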

repo: https://github.com/Megvii-BaseDetection/BEVDepth

paper: https://arxiv.org/pdf/2206.10092

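You are welcome to join the BEV perception discussion group to work through problems that come up while studying: WeChat group contact Rex1586662742, QQ group 468713665.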

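2. Model Overview

Common bottom-up methods explicitly estimate a depth for every feature point, but that depth is learned implicitly, with no direct supervision. BEVDepth uses the LiDAR point cloud to supervise the predicted depth so that it comes closer to the true value. In addition, since the camera extrinsics may interfere with the result, the paper adds a network that learns from the camera parameters and applies its output as attention weights on the image and depth features. It also implements an efficient voxel pooling operation in CUDA.

(Figure: network architecture diagram from the paper.)

Going from the lower left of the diagram to the lower right, the pipeline divides roughly into four parts: surround-view image feature extraction, depth feature prediction, Voxel Pooling, and the Detection Head. The key parts of the BEVDepth paper are depth feature prediction and Voxel Pooling, so the code for these two parts is explained below.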
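The LiDAR depth supervision described above can be sketched for a single pixel as follows. This is a minimal illustration, not the repository's loss: it assumes depth is discretized into uniform bins (2 m to 58 m in 0.5 m steps, which yields the 112 depth channels seen later in the shapes) and that the loss is a cross-entropy against the one-hot LiDAR bin. The names `depth_to_bin` and `depth_loss` are hypothetical.

```python
import math

# Assumed depth range and bin width in metres (illustrative values).
D_MIN, D_MAX, D_STEP = 2.0, 58.0, 0.5
NUM_BINS = int((D_MAX - D_MIN) / D_STEP)  # 112 depth bins


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def depth_to_bin(depth_m):
    """Return the bin index of a LiDAR depth, or None if it is out of range."""
    if not (D_MIN <= depth_m < D_MAX):
        return None
    return int((depth_m - D_MIN) / D_STEP)


def depth_loss(logits, lidar_depth_m):
    """Cross-entropy between the predicted distribution and the LiDAR bin.

    Pixels without a valid LiDAR depth return None and add no loss.
    """
    target = depth_to_bin(lidar_depth_m)
    if target is None:
        return None
    probs = softmax(logits)
    return -math.log(probs[target])


uniform_logits = [0.0] * NUM_BINS
peaked_logits = list(uniform_logits)
peaked_logits[depth_to_bin(10.0)] = 5.0  # confident prediction near 10 m

loss_uniform = depth_loss(uniform_logits, 10.0)
loss_peaked = depth_loss(peaked_logits, 10.0)
```

A prediction that puts more mass on the bin hit by the LiDAR point gets a lower loss, which is what pushes the depth distribution toward the measured depth.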

3. Code Walkthrough

The code below roughly follows the order of forward; key lines are explained and annotated with tensor shapes.

1. bevdepth/models/base_bev_depth.py

class BaseBEVDepth(nn.Module):
    def forward(...):
        if self.is_train_depth and self.training:
            # During training, LiDAR depth is used to supervise depth_pred
            x, depth_pred = self.backbone(...)
            preds = self.head(x)
        else:
            # x: [1, 160, 128, 128], BEV features of the key frame + sweep frame
            x = self.backbone(x, mats_dict, timestamps)  # -> bevdepth/layers/backbones/base_lss_fpn.py
            # Decode
            preds = self.head(x)  # see CenterPoint

2. bevdepth/layers/backbones/base_lss_fpn.py

class BaseLSSFPN(nn.Module):
    def __init__(...):
        ...

    def forward(...):
        """
        Args:
            sweep_imgs: [1, 2, 6, 3, 256, 704], key-frame and sweep-frame images
            mats_dict (dict):
                sensor2ego_mats: camera coordinates -> ego (vehicle) coordinates
                intrin_mats: camera intrinsics
                ida_mats: image data augmentation matrices
                sensor2sensor_mats: key frame camera to sweep frame camera,
                    i.e. the transform from the key frame to the sweep frame
                bda_mat: BEV feature augmentation matrix
        """
        # Extract the BEV features of the key frame, key_frame_res: [1, 80, 128, 128]
        key_frame_res = self._forward_single_sweep(...)
        for sweep_index in range(1, num_sweeps):
            # Extract the BEV features of the sweep frame
            feature_map = self._forward_single_sweep(...)
            ret_feature_list.append(feature_map)
        if is_return_depth:
            return torch.cat(ret_feature_list, 1), key_frame_res[1]
        return torch.cat(ret_feature_list, 1)

    def _forward_single_sweep(...):
        # Extract surround-view image features
        # img_feats: [1, 1, 6, 512, 16, 44]
        img_feats = self.get_cam_feats(sweep_imgs)
        source_features = img_feats[:, 0, ...]
        # Predict depth and context
        depth_feature = self._forward_depth_net(...)
        # Predicted depth distribution, depth: [6, 112, 16, 44]
        depth = depth_feature[:, :self.depth_channels].softmax(1)
        # Corresponds to the Context Feature * Depth Distribution operation in the paper
        img_feat_with_depth = ...
        # Frustum points in ego coordinates, geom_xyz: [1, 6, 112, 16, 44, 3]
        geom_xyz = self.get_geometry(...)
        # Shift the origin of the ego frame to the lower-left corner of the
        # voxel grid and convert the coordinates to integer voxel indices
        geom_xyz = ((geom_xyz - (self.voxel_coord - self.voxel_size / 2.0)) /
                    self.voxel_size).int()
        # Final BEV feature: [1, 80, 128, 128]
        feature_map = voxel_pooling(...)  # -> bevdepth/ops/voxel_pooling/voxel_pooling.py
        if is_return_depth:
            # Training also returns the predicted depth, which is supervised by LiDAR
            return feature_map.contiguous(), depth
        return feature_map.contiguous()

    def _forward_depth_net(...):
        return self.depth_net(feat, mats_dict)

    def get_geometry(...):
        """Transfer points from camera coord to ego coord

        Args:
            rots(Tensor): Rotation matrix from camera to ego.
            trans(Tensor): Translation matrix from camera to ego.
            intrins(Tensor): Intrinsic matrix.
            post_rots_ida(Tensor): Rotation matrix for ida.
            post_trans_ida(Tensor): Translation matrix for ida
            post_rot_bda(Tensor): Rotation matrix for bda.
        """
        # self.frustum: [112, 16, 44, 4], the view frustum
        points = self.frustum
        # Multiply by the inverse of the image augmentation matrix
        points = ida_mat.inverse().matmul(points.unsqueeze(-1))
        # lamda * [x, y, 1] = [lamda*x, lamda*y, lamda]
        # Pixel coordinates -> camera coordinates
        points = torch.cat(...)
        # cam_to_ego
        combine = sensor2ego_mat.matmul(torch.inverse(intrin_mat))
        points = combine.view(...)
        return points


# Corresponds to the Depth Module. The paper gives no flow chart for this
# module, so one was drawn from the code logic (see the figure below).
class DepthNet(nn.Module):
    def __init__(...):
        ...

    def forward(...):
        # Camera parameters of the current frame
        mlp_input = ...
        # Norm
        mlp_input = self.bn(mlp_input.reshape(-1, mlp_input.shape[-1]))
        # Camera parameters act as the attention coefficients for context
        context_se = self.context_mlp(mlp_input)[..., None, None]
        # Attention
        context = self.context_se(x, context_se)
        # FC
        context = self.context_conv(context)
        # Camera parameters act as the attention coefficients for depth
        depth_se = self.depth_mlp(mlp_input)[..., None, None]
        # Attention
        depth = self.depth_se(x, depth_se)
        # FC
        depth = self.depth_conv(depth)
        return torch.cat([depth, context], dim=1)
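The `Context Feature * Depth Distribution` step in `_forward_single_sweep` can be illustrated for one pixel. This is a sketch under simplifying assumptions: pure Python lists stand in for tensors, the sizes are tiny (4 depth bins and 3 context channels instead of 112 and 80), and `lift_pixel` is a hypothetical name, not a function from the repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_pixel(depth_logits, context):
    """Outer product of one pixel's context vector and depth distribution.

    Returns a [C][D] nested list with out[c][d] = context[c] * prob[d], so
    the pixel's feature is spread along its camera ray in proportion to
    how likely each depth bin is.
    """
    probs = softmax(depth_logits)
    return [[c * p for p in probs] for c in context]


depth_logits = [0.0, 2.0, 0.0, 0.0]  # the network favours depth bin 1
context = [1.0, -2.0, 0.5]           # toy context feature with C = 3
lifted = lift_pixel(depth_logits, context)
```

Because the depth probabilities sum to one, summing `lifted[c]` over the depth axis recovers `context[c]`; the depth distribution only decides where along the ray the feature lands. In the real model the same product would be broadcast over every camera and pixel.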

Depth Module
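The camera-aware attention in `DepthNet` can be sketched as a per-channel gate. This is a simplified illustration, assuming a single linear layer followed by a sigmoid in place of the `context_mlp`/`depth_mlp` and SE layers used by the repository; `camera_gate` and `apply_gate` are hypothetical names, and the weights are hand-picked toy values.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def camera_gate(cam_params, weights, bias):
    """One linear layer + sigmoid: a gate in (0, 1) for each feature channel."""
    return [
        sigmoid(sum(w * p for w, p in zip(row, cam_params)) + b)
        for row, b in zip(weights, bias)
    ]


def apply_gate(feature, gate):
    """Scale every spatial location of channel c by gate[c].

    feature is a [C][H][W] nested list; this mirrors how the
    [..., None, None] indexing lets the gate broadcast over H and W.
    """
    return [
        [[v * g for v in row] for row in channel]
        for channel, g in zip(feature, gate)
    ]


cam_params = [1.0, 0.5]              # toy stand-in for flattened camera parameters
weights = [[0.0, 0.0], [10.0, 0.0]]  # channel 0 -> gate 0.5, channel 1 -> gate near 1
bias = [0.0, 0.0]
feature = [[[2.0, 4.0]], [[2.0, 4.0]]]  # C = 2, H = 1, W = 2

gate = camera_gate(cam_params, weights, bias)
gated = apply_gate(feature, gate)
```

The same gating pattern is applied twice in `DepthNet.forward`, once to produce the context branch and once to produce the depth branch, each with its own MLP.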

3. bevdepth/ops/voxel_pooling/voxel_pooling.py

class VoxelPooling(...):
    def forward(...):
        """
        Args:
            geom_xyz: frustum points in ego coordinates; x and y range over 0~127
            input_features: surround-view image features
            voxel_num: 128 * 128 * 80
        """
        # Assign one thread to each frustum point and sum the feature vectors
        # of all points that fall on the same BEV location; see the kernel below
        voxel_pooling_ext.voxel_pooling_forward_wrapper(...)  # -> bevdepth/ops/voxel_pooling/src/voxel_pooling_forward_cuda.cu
        # The result is the BEV feature output_features
        return output_features
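What the pooling computes can be written as a slow reference in plain Python: quantize each frustum point to a voxel index, then sum the features of all points that share a BEV cell. This is a sketch, not the repository's implementation. The grid values (0.8 m voxels, first voxel centre at -50.8 m, 128 x 128 cells) are assumed for illustration, the z axis is ignored, and `to_voxel_index` and `voxel_pooling_reference` are hypothetical names.

```python
VOXEL_SIZE = 0.8     # assumed voxel edge length in metres
VOXEL_COORD = -50.8  # assumed centre of the first voxel
NUM_X = NUM_Y = 128  # assumed BEV grid size


def to_voxel_index(coord):
    """Mirror ((x - (voxel_coord - voxel_size / 2)) / voxel_size).int() on one axis."""
    return int((coord - (VOXEL_COORD - VOXEL_SIZE / 2.0)) / VOXEL_SIZE)


def voxel_pooling_reference(points_xy, features, num_channels):
    """Sum the feature vectors of all points that land in the same BEV cell.

    points_xy: list of (x, y) ego coordinates in metres
    features:  list of per-point feature vectors of length num_channels
    Returns bev[y][x][c], the same layout as output_features in the kernel.
    """
    bev = [[[0.0] * num_channels for _ in range(NUM_X)] for _ in range(NUM_Y)]
    for (x, y), feat in zip(points_xy, features):
        ix, iy = to_voxel_index(x), to_voxel_index(y)
        if 0 <= ix < NUM_X and 0 <= iy < NUM_Y:  # points outside the grid are dropped
            for c in range(num_channels):
                bev[iy][ix][c] += feat[c]
    return bev


points = [(0.1, 0.1), (0.3, 0.2), (10.0, -20.1), (100.0, 0.0)]
feats = [[1.0, 2.0], [10.0, 20.0], [5.0, 5.0], [7.0, 7.0]]
bev = voxel_pooling_reference(points, feats, num_channels=2)
```

The first two points fall into the same cell, so their features add up; the last point lies outside the grid and contributes nothing. The CUDA kernel performs the same accumulation in parallel, which is why it needs an atomic add.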

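4. bevdepth/ops/voxel_pooling/src/voxel_pooling_forward_cuda.cu

Since little material explains the voxel_pooling code, I drew a schematic of voxel_pooling based on my understanding of the code below; the comments in the code refer to that schematic.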

void voxel_pooling_forward_kernel_launcher(...) {
  // 473088 / 128 = 3696 blocks, laid out as 3696 * 1
  dim3 blocks(DIVUP(batch_size * num_points, THREADS_PER_BLOCK));
  // Each block has 128 threads, laid out as 32 * 4
  dim3 threads(THREADS_BLOCK_X, THREADS_BLOCK_Y);
  voxel_pooling_forward_kernel<<<blocks, threads, 0, stream>>>(
      batch_size, num_points, num_channels, num_voxel_x, num_voxel_y,
      num_voxel_z, geom_xyz, input_features, output_features, pos_memo);
}

__global__ void voxel_pooling_forward_kernel(...) {
  /*
  Args:
    batch_size: batch size, assumed to be 1 here
    num_points: number of frustum points, 473088
    num_channels: feature dimension, 80
    num_voxel_x: size of the BEV feature along x
    num_voxel_y: size of the BEV feature along y
    geom_xyz: pointer to the frustum coordinates, [1, 473088, 3]
    input_features: pointer to the input feature map, [1, 473088, 80]
    output_features: pointer to the output feature map, [1, 128, 128, 80]
    pos_memo: records the batch index and the y, x coordinates, [1, 473088, 3]
  */
  // All threads execute this code concurrently
  const int bidx = blockIdx.x;   // index of this block along x in the grid
  const int tidx = threadIdx.x;  // index of this thread along x in the block
  const int tidy = threadIdx.y;  // index of this thread along y in the block
  const int sample_dim = THREADS_PER_BLOCK;  // 128, number of threads per block
  // Linear index of this thread within its block
  const int idx_in_block = tidy * THREADS_BLOCK_X + tidx;
  // Global index of the first sample handled by this block
  const int block_sample_idx = bidx * sample_dim;
  // Global index of this thread (and of its sample) in the grid
  const int thread_sample_idx = block_sample_idx + idx_in_block;
  // Total number of samples (frustum points)
  const int total_samples = batch_size * num_points;
  // 128 * 3 ints of shared memory holding the coordinates of every point in the block
  __shared__ int geom_xyz_shared[THREADS_PER_BLOCK * 3];

  if (thread_sample_idx < total_samples) {
    // Store the coordinates of all frustum points of this block in the
    // shared memory geom_xyz_shared (all blocks do this concurrently)
    const int sample_x = geom_xyz[thread_sample_idx * 3 + 0];
    const int sample_y = geom_xyz[thread_sample_idx * 3 + 1];
    const int sample_z = geom_xyz[thread_sample_idx * 3 + 2];
    geom_xyz_shared[idx_in_block * 3 + 0] = sample_x;
    geom_xyz_shared[idx_in_block * 3 + 1] = sample_y;
    geom_xyz_shared[idx_in_block * 3 + 2] = sample_z;
    if ((sample_x >= 0 && sample_x < num_voxel_x) &&
        (sample_y >= 0 && sample_y < num_voxel_y) &&
        (sample_z >= 0 && sample_z < num_voxel_z)) {
      // Batch index in place of z; it is 0 when batch_size == 1
      pos_memo[thread_sample_idx * 3 + 0] = thread_sample_idx / num_points;
      pos_memo[thread_sample_idx * 3 + 1] = sample_y;  // frustum y coordinate
      pos_memo[thread_sample_idx * 3 + 2] = sample_x;  // frustum x coordinate
    }
  }
  __syncthreads();

  // Two steps: (1) find the index of the current frustum point in
  // output_features, i.e. in the BEV feature; (2) find its index in
  // input_features, then add the input feature onto the output location.
  // Several indices of input_features may map to the same index of
  // output_features, so the addition must be an atomicAdd (see the schematic).
  for (int i = tidy;
       i < THREADS_PER_BLOCK && block_sample_idx + i < total_samples;
       i += THREADS_BLOCK_Y) {
    const int sample_x = geom_xyz_shared[i * 3 + 0];
    const int sample_y = geom_xyz_shared[i * 3 + 1];
    const int sample_z = geom_xyz_shared[i * 3 + 2];
    if (sample_x < 0 || sample_x >= num_voxel_x || sample_y < 0 ||
        sample_y >= num_voxel_y || sample_z < 0 || sample_z >= num_voxel_z) {
      continue;
    }
    const int batch_idx = (block_sample_idx + i) / num_points;
    for (int j = tidx; j < num_channels; j += THREADS_BLOCK_X) {
      atomicAdd(&output_features[(batch_idx * num_voxel_y * num_voxel_x +
                                  sample_y * num_voxel_x + sample_x) *
                                     num_channels +
                                 j],
                input_features[(block_sample_idx + i) * num_channels + j]);
    }
  }
}
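The work split in the second loop of the kernel can be checked with a small emulation: `tidy` strides over the samples of a block and `tidx` strides over the channels. The sketch below replays those loops sequentially in Python and counts how often each (sample, channel) pair is visited. The 32 * 4 block layout is taken from the comments above; `divup` and `visited_pairs` are hypothetical helper names.

```python
from collections import Counter

THREADS_BLOCK_X = 32
THREADS_BLOCK_Y = 4
THREADS_PER_BLOCK = THREADS_BLOCK_X * THREADS_BLOCK_Y  # 128


def divup(a, b):
    """Ceiling division, as DIVUP is used in the kernel launcher."""
    return (a + b - 1) // b


def visited_pairs(total_samples, num_channels):
    """Replay the kernel's accumulation loops and count visits per pair."""
    visits = Counter()
    for bidx in range(divup(total_samples, THREADS_PER_BLOCK)):
        block_sample_idx = bidx * THREADS_PER_BLOCK
        for tidy in range(THREADS_BLOCK_Y):
            for tidx in range(THREADS_BLOCK_X):
                # Outer loop of the kernel: this thread's share of the samples
                i = tidy
                while i < THREADS_PER_BLOCK and block_sample_idx + i < total_samples:
                    # Inner loop of the kernel: this thread's share of the channels
                    for j in range(tidx, num_channels, THREADS_BLOCK_X):
                        visits[(block_sample_idx + i, j)] += 1
                    i += THREADS_BLOCK_Y
    return visits


num_blocks = divup(473088, THREADS_PER_BLOCK)
visits = visited_pairs(total_samples=300, num_channels=80)
```

With 300 samples and 80 channels every pair is visited exactly once, including in the last, partially filled block. Each feature value is therefore added once, and `atomicAdd` is needed only because different samples can target the same BEV cell.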

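4. Summary

These notes studied the characteristics of BEVDepth, mainly analysing the depth prediction module and the Voxel_pooling module. With BEVDepth understood, one can move on to another Megvii paper, BEVStereo. I hope more people will join in to study and discuss together.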
