Understanding R(2+1)D and Its Implementation in the MindSpore Framework
1. Introduction to the R(2+1)D Algorithm
Paper: [1711.11248] A Closer Look at Spatiotemporal Convolutions for Action Recognition (arxiv.org)
Tran et al. proposed R(2+1)D in their CVPR 2018 paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition", showing that factorizing 3D convolutional kernels into separate spatial and temporal components can significantly improve accuracy. The (2+1)D convolutional module in R(2+1)D splits an $N \times t \times d \times d$ 3D convolution into an $M \times 1 \times d \times d$ 2D spatial convolution followed by an $N \times t \times 1 \times 1$ 1D temporal convolution, where $N$ and $M$ are the numbers of kernels. The hyperparameter $M$ determines the dimensionality of the intermediate subspace onto which the signal is projected between the spatial and temporal convolutions; the paper sets $M$ to:
$$M_{i}= \left\lfloor \frac{t d^{2} N_{i-1} N_{i}}{d^{2} N_{i-1} + t N_{i}} \right\rfloor$$
Here $i$ indexes the $i$-th convolutional block of the residual network; choosing $M_i$ this way keeps the number of parameters in the (2+1)D module approximately equal to that of the corresponding 3D convolution.
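To make the formula concrete, the short Python sketch below (the kernel size and channel counts are illustrative values, not taken from the paper) computes $M_i$ and compares the parameter count of the resulting (2+1)D module with that of the 3D convolution it replaces:

```python
import math

def mid_channels(t, d, n_prev, n_cur):
    """M_i = floor(t * d^2 * N_{i-1} * N_i / (d^2 * N_{i-1} + t * N_i))."""
    return math.floor(t * d ** 2 * n_prev * n_cur / (d ** 2 * n_prev + t * n_cur))

# Illustrative block: 3x3x3 kernels, 64 input channels, 64 output channels.
t, d, n_prev, n_cur = 3, 3, 64, 64
m = mid_channels(t, d, n_prev, n_cur)

params_3d = n_cur * n_prev * t * d * d               # N_i filters of size N_{i-1} x t x d x d
params_2plus1d = m * n_prev * d * d + n_cur * m * t  # M_i spatial filters + N_i temporal filters
print(m, params_3d, params_2plus1d)                  # 144 110592 110592
```

For these values the two parameter counts match exactly; in general, because of the floor, the (2+1)D module never uses more parameters than the 3D convolution (bias terms ignored).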
Compared with full 3D convolution, the (2+1)D factorization has two advantages. First, although the number of parameters is unchanged, the number of nonlinearities in the network is doubled thanks to the extra activation function between the 2D and 1D convolutions in each module, and more nonlinearities increase the complexity of the functions that can be represented. Second, forcing the 3D convolution into separate spatial and temporal components makes optimization easier: the (2+1)D network achieves a lower training error than a 3D network with the same number of parameters.
The table below shows the architectures of the 18-layer and 34-layer R3D networks; replacing the 3D convolutions in R3D with (2+1)D convolutions yields the R(2+1)D network of the corresponding depth.
The experimental section compares the action-recognition accuracy of different forms of convolution on Kinetics, as shown in the table below. All models are based on ResNet-18 and trained from scratch on 8-frame or 16-frame clip inputs; the results show that R(2+1)D is more accurate than all the other models.
The comparison with state-of-the-art methods on Kinetics is shown in the table below. When trained from scratch on RGB input, R(2+1)D outperforms I3D by 4.5%, and R(2+1)D pretrained on Sports-1M also outperforms I3D pretrained on ImageNet by 2.2%.
2. MindSpore Implementation of R(2+1)D
Description of the main functions
Data preprocessing
GeneratorDataset is used to read the video dataset files and output three-channel clips with a specified number of frames at batch_size = 16.
Preprocessing includes shuffling and normalization.
Data augmentation includes random cropping implemented by the video_random_crop class, resizing implemented by the video_resize class, and random horizontal flipping implemented by video_random_horizontal_flip.
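As a rough illustration of how such a pipeline can be assembled with GeneratorDataset (the repository wraps this in its own VideoDataset/Kinetic400 classes; the source class below is a hypothetical stand-in that returns random arrays instead of decoded video frames):

```python
import numpy as np
import mindspore.dataset as ds

class RandomClipSource:
    """Hypothetical source: returns (clip, label) pairs with clips of 16 RGB frames."""
    def __init__(self, size=64):
        self.size = size

    def __getitem__(self, index):
        clip = np.random.rand(16, 128, 171, 3).astype(np.float32)  # [frames, H, W, C]
        label = np.array(index % 400, dtype=np.int32)
        return clip, label

    def __len__(self):
        return self.size

# Wrap the source, shuffle the samples, and batch them with batch_size=16.
dataset = ds.GeneratorDataset(RandomClipSource(), column_names=["video", "label"], shuffle=True)
dataset = dataset.batch(16)

for video, label in dataset.create_tuple_iterator():
    print(video.shape, label.shape)  # (16, 16, 128, 171, 3) (16,)
    break
```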
Model backbone
In R2Plus1d18, the input first passes through a (2+1)D convolutional module and a max-pooling layer, then through four residual blocks built from (2+1)D convolutional modules, and finally through an average-pooling layer, a flatten layer and a fully connected layer.
The initial (2+1)D convolutional module is a Conv3d with kernel size (1, 7, 7) followed by a Conv3d with kernel size (3, 1, 1), with Batch Normalization and ReLU layers between the convolutions.
R2Plus1d18 contains four residual blocks, each stacked twice in the model. Each block consists of two (2+1)D convolutional modules, and each (2+1)D convolution is a Conv3d with kernel size (1, 3, 3) followed by a Conv3d with kernel size (3, 1, 1), again with Batch Normalization and ReLU layers in between; the input and output of each block are connected by a residual (skip) connection. A minimal sketch of such a (2+1)D module is shown below.
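The following is a minimal, self-contained sketch of such a (2+1)D convolutional module in MindSpore. It only illustrates the (1, 3, 3) spatial / (3, 1, 1) temporal factorization described above; the repository's Inflate3D/Unit3D classes additionally handle strides, padding options, pooling and so on. The value mid_channels = 144 follows the $M_i$ formula above for $t = 3$, $d = 3$, $N_{i-1} = N_i = 64$.

```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

class Conv2Plus1D(nn.Cell):
    """(2+1)D convolution: (1,3,3) spatial Conv3d -> BN -> ReLU -> (3,1,1) temporal Conv3d."""
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.spatial = nn.Conv3d(in_channels, mid_channels,
                                 kernel_size=(1, 3, 3), pad_mode='same', has_bias=False)
        self.bn = nn.BatchNorm3d(mid_channels)
        self.relu = nn.ReLU()
        self.temporal = nn.Conv3d(mid_channels, out_channels,
                                  kernel_size=(3, 1, 1), pad_mode='same', has_bias=False)

    def construct(self, x):
        x = self.spatial(x)        # 2D convolution over H and W only
        x = self.relu(self.bn(x))  # the extra nonlinearity between the two convolutions
        x = self.temporal(x)       # 1D convolution over the time dimension
        return x

# A batch of 2 clips with 64 channels, 8 frames and 56x56 spatial size ([N, C, T, H, W]).
x = Tensor(np.random.rand(2, 64, 8, 56, 56).astype(np.float32))
block = Conv2Plus1D(in_channels=64, mid_channels=144, out_channels=64)
print(block(x).shape)  # (2, 64, 8, 56, 56)
```

A residual block then stacks two such modules and adds the block input to the output.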
In the concrete model implementation, the roles of the individual classes are as follows:
- The Unit3D class implements a Conv3d → BN → ReLU → Pooling structure, where the BN, ReLU and Pooling layers are optional.
- The Inflate3D class uses Unit3D to implement the (2+1)D convolutional module.
- The Resnet3D class implements the structure in which the input passes through Unit3D and Max Pooling followed by four residual blocks; the number of times each residual block is stacked can be specified via a parameter.
- The R2Plus1dNet class inherits from Resnet3D and mainly reuses its four residual blocks, implementing the structure in which the input passes through the (2+1)D module and Max Pooling, then the four residual blocks, and finally the average-pooling, flatten and fully connected layers.
- The R2Plus1d18 class inherits from R2Plus1dNet; its main role is to specify the number of times each residual block is stacked, which in this class is two per block (see the sketch below).
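As a very rough, hypothetical sketch of the relationship between the last two classes (the names with a Sketch suffix and the bodies are illustrative only, not the repository's code), R2Plus1d18 essentially pins the stage depths of the more general network to two blocks per stage:

```python
import mindspore.nn as nn

class R2Plus1dNetSketch(nn.Cell):
    """Hypothetical sketch: the number of residual blocks per stage is passed in."""
    def __init__(self, stage_depths, num_classes=400):
        super().__init__()
        self.stage_depths = stage_depths        # e.g. [2, 2, 2, 2] for 18 layers, [3, 4, 6, 3] for 34
        self.head = nn.Dense(512, num_classes)  # placeholder classifier head

class R2Plus1d18Sketch(R2Plus1dNetSketch):
    """The 18-layer variant only fixes each of the four stages to two residual blocks."""
    def __init__(self, num_classes=400):
        super().__init__(stage_depths=[2, 2, 2, 2], num_classes=num_classes)

print(R2Plus1d18Sketch().stage_depths)  # [2, 2, 2, 2]
```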
3. Runnable Example
Notebook link
Dataset preparation
The code repository uses the Kinetics-400 dataset for training and validation.
Pretrained model
The pretrained model was trained on the Kinetics-400 dataset; download: r2plus1d18_kinetic400.ckpt
Environment setup
```shell
git clone https://gitee.com/yanlq46462828/zjut_mindvideo.git
cd zjut_mindvideo

# Please first install mindspore according to instructions on the official website: https://www.mindspore.cn/install

pip install -r requirements.txt
pip install -e .
```

Training workflow
```python
from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.context import ParallelMode
from mindspore.communication import init, get_rank, get_group_size
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits

from msvideo.utils.check_param import Validator, Rel
```

Dataset loading
The Kinetics-400 dataset is loaded through the Kinetic400 class, which is built on VideoDataset.
```python
from msvideo.data.kinetics400 import Kinetic400

# Data Pipeline.
dataset = Kinetic400(path='/home/publicfile/kinetics-400',
                     split="train",
                     seq=32,
                     num_parallel_workers=1,
                     shuffle=True,
                     batch_size=6,
                     repeat_num=1)
ckpt_save_dir = './r2plus1d'
```

```text
/home/publicfile/kinetics-400/cls2index.json
```

Data processing
The videos are rescaled with VideoRescale and resized with VideoResize; the resized videos are then randomly cropped with VideoRandomCrop and horizontally flipped with a given probability by VideoRandomHorizontalFlip; finally, VideoReOrder rearranges the dimensions and VideoNormalize normalizes the data.
```python
from msvideo.data.transforms import VideoRandomCrop, VideoRandomHorizontalFlip, VideoRescale
from msvideo.data.transforms import VideoNormalize, VideoResize, VideoReOrder

transforms = [VideoRescale(shift=0.0),
              VideoResize([128, 171]),
              VideoRandomCrop([112, 112]),
              VideoRandomHorizontalFlip(0.5),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset.transform = transforms
dataset_train = dataset.run()
Validator.check_int(dataset_train.get_dataset_size(), 0, Rel.GT)
step_size = dataset_train.get_dataset_size()
```

```text
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:30:59.929.412 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
```

Network construction
```python
from msvideo.models.r2plus1d import R2Plus1d18

# Create model
network = R2Plus1d18(num_classes=400)

from msvideo.schedule.lr_schedule import warmup_cosine_annealing_lr_v1

# Set learning rate scheduler.
learning_rate = warmup_cosine_annealing_lr_v1(lr=0.01,
                                              steps_per_epoch=step_size,
                                              warmup_epochs=4,
                                              max_epoch=100,
                                              t_max=100,
                                              eta_min=0)

# Define optimizer.
network_opt = nn.Momentum(network.trainable_params(),
                          learning_rate=learning_rate,
                          momentum=0.9,
                          weight_decay=0.00004)

# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

# Set the checkpoint config for the network.
ckpt_config = CheckpointConfig(save_checkpoint_steps=step_size,
                               keep_checkpoint_max=10)
ckpt_callback = ModelCheckpoint(prefix='r2plus1d_kinetics400',
                                directory=ckpt_save_dir,
                                config=ckpt_config)

# Init the model.
model = Model(network, loss_fn=network_loss, optimizer=network_opt, metrics={'acc'})

# Begin to train.
print('[Start training `{}`]'.format('r2plus1d_kinetics400'))
print("=" * 80)
model.train(1,
            dataset_train,
            callbacks=[ckpt_callback, LossMonitor()],
            dataset_sink_mode=False)
print('[End of training `{}`]'.format('r2plus1d_kinetics400'))
```

```text
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:41:43.490.637 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:41:43.498.663 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[Start training `r2plus1d_kinetics400`]
================================================================================
epoch: 1 step: 1, loss is 5.998835563659668
epoch: 1 step: 2, loss is 5.921803951263428
epoch: 1 step: 3, loss is 6.024421691894531
epoch: 1 step: 4, loss is 6.08278751373291
epoch: 1 step: 5, loss is 6.014780044555664
epoch: 1 step: 6, loss is 5.945815086364746
epoch: 1 step: 7, loss is 6.078174114227295
epoch: 1 step: 8, loss is 6.0565361976623535
epoch: 1 step: 9, loss is 5.952683448791504
epoch: 1 step: 10, loss is 6.033120632171631
epoch: 1 step: 11, loss is 6.05575704574585
epoch: 1 step: 12, loss is 5.9879350662231445
epoch: 1 step: 13, loss is 6.006839275360107
epoch: 1 step: 14, loss is 5.9968180656433105
epoch: 1 step: 15, loss is 5.971335411071777
epoch: 1 step: 16, loss is 6.0620856285095215
epoch: 1 step: 17, loss is 6.081112861633301
epoch: 1 step: 18, loss is 6.106649398803711
epoch: 1 step: 19, loss is 6.095144271850586
epoch: 1 step: 20, loss is 6.00246000289917
epoch: 1 step: 21, loss is 6.061524868011475
epoch: 1 step: 22, loss is 6.046009063720703
epoch: 1 step: 23, loss is 5.997835159301758
epoch: 1 step: 24, loss is 6.007784366607666
epoch: 1 step: 25, loss is 5.946590423583984
epoch: 1 step: 26, loss is 5.9461164474487305
epoch: 1 step: 27, loss is 5.9034929275512695
epoch: 1 step: 28, loss is 5.925591945648193
epoch: 1 step: 29, loss is 6.176599979400635
......
```

Evaluation workflow
```python
from mindspore import context
from msvideo.data.kinetics400 import Kinetic400

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

# Data Pipeline.
dataset_eval = Kinetic400("/home/publicfile/kinetics-400",
                          split="val",
                          seq=32,
                          seq_mode="interval",
                          num_parallel_workers=1,
                          shuffle=False,
                          batch_size=8,
                          repeat_num=1)
```

```text
/home/publicfile/kinetics-400/cls2index.json
```

```python
from msvideo.data.transforms import VideoCenterCrop, VideoRescale, VideoReOrder
from msvideo.data.transforms import VideoNormalize, VideoResize

transforms = [VideoResize([128, 171]),
              VideoRescale(shift=0.0),
              VideoCenterCrop([112, 112]),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset_eval.transform = transforms
dataset_eval = dataset_eval.run()
```

```python
from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.train import Model
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from msvideo.utils.callbacks import EvalLossMonitor
from msvideo.models.r2plus1d import R2Plus1d18

# Create model
network = R2Plus1d18(num_classes=400)

# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

param_dict = load_checkpoint('/home/zhengs/r2plus1d/r2plus1d18_kinetic400.ckpt')
load_param_into_net(network, param_dict)

# Define eval_metrics.
eval_metrics = {'Loss': nn.Loss(),
                'Top_1_Accuracy': nn.Top1CategoricalAccuracy(),
                'Top_5_Accuracy': nn.Top5CategoricalAccuracy()}

# Init the model.
model = Model(network, loss_fn=network_loss, metrics=eval_metrics)

print_cb = EvalLossMonitor(model)

# Begin to eval.
print('[Start eval `{}`]'.format('r2plus1d_kinetics400'))
result = model.eval(dataset_eval,
                    callbacks=[print_cb],
                    dataset_sink_mode=False)
print(result)
```

```text
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.745.627 [mindspore/train/model.py:1077] For EvalLossMonitor callback, {'epoch_end', 'step_end', 'epoch_begin', 'step_begin'} methods may not be supported in later version, Use methods prefixed with 'on_train' or 'on_eval' instead when using customized callbacks.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.747.418 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.749.293 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.751.452 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
[Start eval `r2plus1d_kinetics400`]
step:[    1/ 2484], metrics:[], loss:[3.070/3.070], time:1923.473 ms,
step:[    2/ 2484], metrics:['Loss: 3.0702', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.7500'], loss:[0.808/1.939], time:169.314 ms,
step:[    3/ 2484], metrics:['Loss: 1.9391', 'Top_1_Accuracy: 0.5625', 'Top_5_Accuracy: 0.8750'], loss:[2.645/2.175], time:192.965 ms,
step:[    4/ 2484], metrics:['Loss: 2.1745', 'Top_1_Accuracy: 0.5417', 'Top_5_Accuracy: 0.8750'], loss:[2.954/2.369], time:172.657 ms,
step:[    5/ 2484], metrics:['Loss: 2.3695', 'Top_1_Accuracy: 0.5000', 'Top_5_Accuracy: 0.8438'], loss:[2.489/2.393], time:176.803 ms,
step:[    6/ 2484], metrics:['Loss: 2.3934', 'Top_1_Accuracy: 0.4750', 'Top_5_Accuracy: 0.8250'], loss:[1.566/2.256], time:172.621 ms,
step:[    7/ 2484], metrics:['Loss: 2.2556', 'Top_1_Accuracy: 0.4792', 'Top_5_Accuracy: 0.8333'], loss:[0.761/2.042], time:172.149 ms,
step:[    8/ 2484], metrics:['Loss: 2.0420', 'Top_1_Accuracy: 0.5357', 'Top_5_Accuracy: 0.8571'], loss:[3.675/2.246], time:181.757 ms,
step:[    9/ 2484], metrics:['Loss: 2.2461', 'Top_1_Accuracy: 0.4688', 'Top_5_Accuracy: 0.7969'], loss:[3.909/2.431], time:186.722 ms,
step:[   10/ 2484], metrics:['Loss: 2.4309', 'Top_1_Accuracy: 0.4583', 'Top_5_Accuracy: 0.7639'], loss:[3.663/2.554], time:199.209 ms,
step:[   11/ 2484], metrics:['Loss: 2.5542', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7375'], loss:[3.438/2.635], time:173.766 ms,
step:[   12/ 2484], metrics:['Loss: 2.6345', 'Top_1_Accuracy: 0.4318', 'Top_5_Accuracy: 0.7159'], loss:[2.695/2.640], time:171.364 ms,
step:[   13/ 2484], metrics:['Loss: 2.6395', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7292'], loss:[3.542/2.709], time:172.889 ms,
step:[   14/ 2484], metrics:['Loss: 2.7090', 'Top_1_Accuracy: 0.4231', 'Top_5_Accuracy: 0.7308'], loss:[3.404/2.759], time:216.287 ms,
step:[   15/ 2484], metrics:['Loss: 2.7586', 'Top_1_Accuracy: 0.4018', 'Top_5_Accuracy: 0.7232'], loss:[4.012/2.842], time:171.686 ms,
step:[   16/ 2484], metrics:['Loss: 2.8422', 'Top_1_Accuracy: 0.3833', 'Top_5_Accuracy: 0.7167'], loss:[5.157/2.987], time:170.363 ms,
step:[   17/ 2484], metrics:['Loss: 2.9869', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.6875'], loss:[4.667/3.086], time:171.926 ms,
step:[   18/ 2484], metrics:['Loss: 3.0857', 'Top_1_Accuracy: 0.3603', 'Top_5_Accuracy: 0.6618'], loss:[5.044/3.194], time:197.028 ms,
step:[   19/ 2484], metrics:['Loss: 3.1945', 'Top_1_Accuracy: 0.3403', 'Top_5_Accuracy: 0.6458'], loss:[3.625/3.217], time:222.758 ms,
step:[   20/ 2484], metrics:['Loss: 3.2171', 'Top_1_Accuracy: 0.3355', 'Top_5_Accuracy: 0.6513'], loss:[1.909/3.152], time:207.416 ms,
step:[   21/ 2484], metrics:['Loss: 3.1517', 'Top_1_Accuracy: 0.3563', 'Top_5_Accuracy: 0.6625'], loss:[4.591/3.220], time:171.645 ms,
step:[   22/ 2484], metrics:['Loss: 3.2202', 'Top_1_Accuracy: 0.3631', 'Top_5_Accuracy: 0.6667'], loss:[3.545/3.235], time:209.975 ms,
step:[   23/ 2484], metrics:['Loss: 3.2350', 'Top_1_Accuracy: 0.3693', 'Top_5_Accuracy: 0.6591'], loss:[3.350/3.240], time:185.889 ms,
```

Code
The code repositories are available at:
Gitee repository
GitHub repository