Building a YOLOv4 Object Detection Platform with Keras

Published: 2025/3/21 | Category: Object Detection


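Preface
What is YOLOv4
Code download
What YOLOv4 improves (incomplete list)
YOLOv4 structure analysis
1. Backbone feature extraction network
2. Feature pyramid
3. YoloHead: making predictions from the extracted features
4. Decoding the predictions
5. Drawing on the original image
Training YOLOv4
1. Improved training tricks in YOLOv4
a) Mosaic data augmentation
b) Label smoothing
c) CIoU
d) Cosine annealing learning-rate decay
2. Components of the loss
a) Parameters needed to compute the loss
b) What y_pre is
c) What y_true is
d) How the loss is computed
Training your own YOLOv4 model
Preface
Haha, my favorite YOLO has been updated!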

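What is YOLOv4

YOLOv4 is an improved version of YOLOv3 that adds a large number of small tricks on top of it.
It brings no revolutionary change to object detection, but it still balances speed and accuracy very well.
As the comparison figure shows, YOLOv4 raises mAP to 44 over YOLOv3 without lowering FPS, a very noticeable improvement.

The overall detection approach of YOLOv4 differs little from YOLOv3: both use three feature layers for classification and regression prediction.

Please note!

I strongly recommend learning YOLOv3 before YOLOv4, because YOLOv4 really can be seen as YOLOv3 plus a series of improvements!

(Important things are worth saying three times!)

For YOLOv3, see this blog post:
https://blog.csdn.net/weixin_44791964/article/details/103276106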

Code download
https://github.com/bubbliiiing/yolov4-keras
If you like it, please give it a star!

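What YOLOv4 improves (incomplete list)
1. Backbone feature extraction network: DarkNet53 => CSPDarkNet53

2. Feature pyramid: SPP, PAN

3. Classification and regression layer: YOLOv3 (unchanged)

4. Training tricks: Mosaic data augmentation, label smoothing, CIoU, cosine annealing learning-rate decay

5. Activation function: Mish

These are not all of the improvements. YOLOv4 uses so many that it is hard to implement and list them all, so only the ones I find interesting and very effective are listed here.

The whole post explains YOLOv4 in terms of its differences from YOLOv3.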

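YOLOv4 structure analysis
1. Backbone feature extraction network
When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

The backbone has two improvements:
a) Backbone network: DarkNet53 => CSPDarkNet53
b) Activation function: Mish

If you are familiar with YOLOv3, you will know the structure of Darknet53: it is built from a series of residual structures. Darknet53 contains the resblock_body module below, which consists of one downsampling step followed by a stack of residual blocks. Darknet53 is assembled from these resblock_body modules.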

def resblock_body(x, num_filters, num_blocks):
    x = ZeroPadding2D(((1,0),(1,0)))(x)
    x = DarknetConv2D_BN_Leaky(num_filters, (3,3), strides=(2,2))(x)
    for i in range(num_blocks):
        y = DarknetConv2D_BN_Leaky(num_filters//2, (1,1))(x)
        y = DarknetConv2D_BN_Leaky(num_filters, (3,3))(y)
        x = Add()([x,y])
    return x
YOLOv4 modifies this part in two ways.
1. First, the activation function of DarknetConv2D is changed from LeakyReLU to Mish, so the convolution block DarknetConv2D_BN_Leaky becomes DarknetConv2D_BN_Mish.
The Mish function is defined as:
$$Mish = x \times \tanh(\ln(1+e^x))$$
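To get a feel for the formula, here is a minimal NumPy sketch of Mish. It is not part of the repository; the helper names `softplus` and `mish` are my own, and the softplus is written in a numerically stable form as an extra precaution.

```python
import numpy as np

def softplus(x):
    # Stable form of ln(1 + e^x): max(x, 0) + ln(1 + e^(-|x|))
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def mish(x):
    # Mish = x * tanh(ln(1 + e^x))
    x = np.asarray(x, dtype=np.float64)
    return x * np.tanh(softplus(x))

values = mish([-5.0, -1.0, 0.0, 1.0, 5.0])
print(values)
```

Unlike LeakyReLU, the curve is smooth everywhere, keeps small negative values for negative inputs, and approaches x for large positive inputs.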

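2. Second, the structure of resblock_body is changed to use the CSPnet structure, which turns Darknet53 into CSPDarknet53 in YOLOv4.

The CSPnet structure is not complicated. It splits the original stack of residual blocks into two parts:
the main branch keeps stacking the original residual blocks;
the other branch acts like a residual edge and connects to the end after only a little processing.
CSP can therefore be seen as containing one large residual edge.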

#---------------------------------------------------#
#   Building block of CSPdarknet
#   It contains one large residual edge
#   that bypasses many residual structures
#---------------------------------------------------#
def resblock_body(x, num_filters, num_blocks, all_narrow=True):
    # Compress height and width (downsample by 2)
    preconv1 = ZeroPadding2D(((1,0),(1,0)))(x)
    preconv1 = DarknetConv2D_BN_Mish(num_filters, (3,3), strides=(2,2))(preconv1)

    # Build the large residual edge
    shortconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1,1))(preconv1)

    # Convolution on the main branch
    mainconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1,1))(preconv1)
    # 1x1 conv to adjust the channels -> 3x3 conv to extract features, with a residual connection
    for i in range(num_blocks):
        y = compose(
                DarknetConv2D_BN_Mish(num_filters//2, (1,1)),
                DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (3,3)))(mainconv)
        mainconv = Add()([mainconv,y])
    # 1x1 conv, then stack with the residual edge
    postconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1,1))(mainconv)
    route = Concatenate()([postconv, shortconv])

    # Finally merge the channels
    return DarknetConv2D_BN_Mish(num_filters, (1,1))(route)
The full implementation is:

# Import paths below are those of standalone Keras 2.x, as used by the repository
from functools import wraps
from keras import backend as K
from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D, Layer
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.regularizers import l2
from utils.utils import compose

class Mish(Layer):
    def __init__(self, **kwargs):
        super(Mish, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs):
        # Mish = x * tanh(softplus(x))
        return inputs * K.tanh(K.softplus(inputs))

    def get_config(self):
        config = super(Mish, self).get_config()
        return config

    def compute_output_shape(self, input_shape):
        return input_shape

#--------------------------------------------------#
#   Single convolution
#--------------------------------------------------#
@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    # Stride-2 convolutions rely on the explicit ZeroPadding2D before them
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)

#---------------------------------------------------#
#   Convolution block
#   DarknetConv2D + BatchNormalization + Mish
#---------------------------------------------------#
def DarknetConv2D_BN_Mish(*args, **kwargs):
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        Mish())

#---------------------------------------------------#
#   Building block of CSPdarknet
#   It contains one large residual edge
#   that bypasses many residual structures
#---------------------------------------------------#
def resblock_body(x, num_filters, num_blocks, all_narrow=True):
    # Compress height and width (downsample by 2)
    preconv1 = ZeroPadding2D(((1,0),(1,0)))(x)
    preconv1 = DarknetConv2D_BN_Mish(num_filters, (3,3), strides=(2,2))(preconv1)

    # Build the large residual edge
    shortconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1,1))(preconv1)

    # Convolution on the main branch
    mainconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1,1))(preconv1)
    # 1x1 conv to adjust the channels -> 3x3 conv to extract features, with a residual connection
    for i in range(num_blocks):
        y = compose(
                DarknetConv2D_BN_Mish(num_filters//2, (1,1)),
                DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (3,3)))(mainconv)
        mainconv = Add()([mainconv,y])
    # 1x1 conv, then stack with the residual edge
    postconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1,1))(mainconv)
    route = Concatenate()([postconv, shortconv])

    # Finally merge the channels
    return DarknetConv2D_BN_Mish(num_filters, (1,1))(route)

#---------------------------------------------------#
#   Main body of CSPdarknet53
#---------------------------------------------------#
def darknet_body(x):
    x = DarknetConv2D_BN_Mish(32, (3,3))(x)
    x = resblock_body(x, 64, 1, False)
    x = resblock_body(x, 128, 2)
    x = resblock_body(x, 256, 8)
    feat1 = x
    x = resblock_body(x, 512, 8)
    feat2 = x
    x = resblock_body(x, 1024, 4)
    feat3 = x
    return feat1,feat2,feat3
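The three feature maps returned by darknet_body can be sanity-checked with plain arithmetic. The sketch below is my own helper (`cspdarknet_feature_shapes` is a hypothetical name, not a repository function); it assumes a square input and only models what each resblock_body does to the spatial size: a one-sided zero pad followed by a 3x3 'valid' convolution with stride 2.

```python
def cspdarknet_feature_shapes(input_size):
    """Return (height, width, channels) of feat1, feat2 and feat3."""
    # num_filters passed to the five resblock_body calls in darknet_body
    stage_filters = [64, 128, 256, 512, 1024]
    size = input_size
    shapes = []
    for filters in stage_filters:
        # ZeroPadding2D(((1,0),(1,0))) adds 1 pixel, then a 3x3 'valid'
        # convolution with stride 2 gives (size + 1 - 3) // 2 + 1
        size = (size + 1 - 3) // 2 + 1
        shapes.append((size, size, filters))
    # Only the last three stages are used as feature layers
    return shapes[-3:]

print(cspdarknet_feature_shapes(416))
print(cspdarknet_feature_shapes(608))
```

Each stage halves the resolution, so the three feature layers sit at strides 8, 16 and 32 relative to the input.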
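2. Feature pyramid
When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

In the feature pyramid, YOLOv4 combines two improvements:
a) The SPP structure.
b) The PANet structure.
As the figure above shows, everything apart from CSPDarknet53 and the Yolo Head belongs to the feature pyramid.
1. The SPP structure is embedded in the convolutions applied to the last feature layer of CSPDarknet53. After three DarknetConv2D_BN_Leaky convolutions on that layer, max pooling at four different scales is applied, with kernel sizes 13x13, 9x9, 5x5 and 1x1 (1x1 means no processing).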

# SPP structure: max pooling at different scales, then stacking the results.
maxpool1 = MaxPooling2D(pool_size=(13,13), strides=(1,1), padding='same')(P5)
maxpool2 = MaxPooling2D(pool_size=(9,9), strides=(1,1), padding='same')(P5)
maxpool3 = MaxPooling2D(pool_size=(5,5), strides=(1,1), padding='same')(P5)
P5 = Concatenate()([maxpool1, maxpool2, maxpool3, P5])
This greatly enlarges the receptive field and separates out the most significant context features.
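Because every pooling uses stride 1 with 'same' padding, the spatial size stays unchanged and only the channel count grows fourfold. Here is a small NumPy sketch of that behavior. It is a slow, loop-based illustration, not the Keras layer; `max_pool_same` and `spp` are names I made up.

```python
import numpy as np

def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on an (H, W, C) array, k odd."""
    pad = k // 2
    # Pad with -inf so the padding never wins the max
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x, kernel_sizes=(13, 9, 5)):
    # Pool at each scale and stack with the untouched input (the "1x1" branch)
    pooled = [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate(pooled + [x], axis=-1)

feature = np.random.rand(19, 19, 4)   # stand-in for the 19x19 feature map
out = spp(feature)
print(feature.shape, '->', out.shape)
```

In yolo_body the input to SPP has 512 channels, so the concatenated result has 2048 channels before the next 1x1 convolution reduces it again.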

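2. PANet is an instance segmentation algorithm from 2018. The key idea of its structure is repeated feature extraction.

The figure above shows the original PANet structure. Its most important characteristic is that features are extracted repeatedly.
Part (a) is the traditional feature pyramid. After the feature pyramid finishes its bottom-to-top feature extraction, the top-to-bottom feature extraction in (b) still has to be carried out (in the code below: upsampling from P5 to P3 first, then downsampling from P3 back to P5).

In YOLOv4, the PANet structure is applied mainly to the three effective feature layers.

The implementation is as follows: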

#---------------------------------------------------#
#   Feature layers -> final outputs
#   (Model comes from keras.models; DarknetConv2D_BN_Leaky and
#   make_five_convs are defined elsewhere in the repository)
#---------------------------------------------------#
def yolo_body(inputs, num_anchors, num_classes):
    # Build the CSPDarknet53 backbone
    feat1,feat2,feat3 = darknet_body(inputs)

    P5 = DarknetConv2D_BN_Leaky(512, (1,1))(feat3)
    P5 = DarknetConv2D_BN_Leaky(1024, (3,3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1,1))(P5)
    # SPP structure: max pooling at different scales, then stacking the results.
    maxpool1 = MaxPooling2D(pool_size=(13,13), strides=(1,1), padding='same')(P5)
    maxpool2 = MaxPooling2D(pool_size=(9,9), strides=(1,1), padding='same')(P5)
    maxpool3 = MaxPooling2D(pool_size=(5,5), strides=(1,1), padding='same')(P5)
    P5 = Concatenate()([maxpool1, maxpool2, maxpool3, P5])
    P5 = DarknetConv2D_BN_Leaky(512, (1,1))(P5)
    P5 = DarknetConv2D_BN_Leaky(1024, (3,3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1,1))(P5)

    P5_upsample = compose(DarknetConv2D_BN_Leaky(256, (1,1)), UpSampling2D(2))(P5)

    P4 = DarknetConv2D_BN_Leaky(256, (1,1))(feat2)
    P4 = Concatenate()([P4, P5_upsample])
    P4 = make_five_convs(P4,256)

    P4_upsample = compose(DarknetConv2D_BN_Leaky(128, (1,1)), UpSampling2D(2))(P4)

    P3 = DarknetConv2D_BN_Leaky(128, (1,1))(feat1)
    P3 = Concatenate()([P3, P4_upsample])
    P3 = make_five_convs(P3,128)

    # 76x76 output
    P3_output = DarknetConv2D_BN_Leaky(256, (3,3))(P3)
    P3_output = DarknetConv2D(num_anchors*(num_classes+5), (1,1))(P3_output)

    P3_downsample = ZeroPadding2D(((1,0),(1,0)))(P3)
    P3_downsample = DarknetConv2D_BN_Leaky(256, (3,3), strides=(2,2))(P3_downsample)
    P4 = Concatenate()([P3_downsample, P4])
    P4 = make_five_convs(P4,256)

    # 38x38 output
    P4_output = DarknetConv2D_BN_Leaky(512, (3,3))(P4)
    P4_output = DarknetConv2D(num_anchors*(num_classes+5), (1,1))(P4_output)

    P4_downsample = ZeroPadding2D(((1,0),(1,0)))(P4)
    P4_downsample = DarknetConv2D_BN_Leaky(512, (3,3), strides=(2,2))(P4_downsample)
    P5 = Concatenate()([P4_downsample, P5])
    P5 = make_five_convs(P5,512)

    # 19x19 output
    P5_output = DarknetConv2D_BN_Leaky(1024, (3,3))(P5)
    P5_output = DarknetConv2D(num_anchors*(num_classes+5), (1,1))(P5_output)

    return Model(inputs, [P5_output, P4_output, P3_output])
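3. YoloHead: making predictions from the extracted features
When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

1. For feature utilization, YOLOv4 extracts several feature layers for object detection. Three layers are taken, from the middle, lower-middle and bottom of the network, with shapes (76,76,256), (38,38,512) and (19,19,1024).

2. The output layers have shapes (19,19,75), (38,38,75) and (76,76,75). The last dimension is 75 because the figure is based on the VOC dataset, which has 20 classes. YOLOv4 has 3 prior boxes for each feature layer, and each prior box needs 20 class scores plus 4 box parameters plus 1 confidence, so the last dimension is 3x25.
With the COCO training set there are 80 classes, so the last dimension is 255 = 3x85 and the three feature layers have shapes (19,19,255), (38,38,255) and (76,76,255).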
The implementation is as follows:

#---------------------------------------------------#
#   Feature layers -> final outputs
#---------------------------------------------------#
def yolo_body(inputs, num_anchors, num_classes):
    # Part of the function is omitted; only the final head is shown
    P3_output = DarknetConv2D_BN_Leaky(256, (3,3))(P3)
    P3_output = DarknetConv2D(num_anchors*(num_classes+5), (1,1))(P3_output)

    P4_output = DarknetConv2D_BN_Leaky(512, (3,3))(P4)
    P4_output = DarknetConv2D(num_anchors*(num_classes+5), (1,1))(P4_output)

    P5_output = DarknetConv2D_BN_Leaky(1024, (3,3))(P5)
    P5_output = DarknetConv2D(num_anchors*(num_classes+5), (1,1))(P5_output)
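4. Decoding the predictions
From the previous step we obtain the predictions of the three feature layers, with shapes (N,19,19,255), (N,38,38,255) and (N,76,76,255). They correspond to the positions of 3 predicted boxes on every cell of the 19x19, 38x38 and 76x76 grids that each image is divided into.

These predictions do not yet correspond to the final box positions on the image, however. They must be decoded first.

Here is how YOLOv3 prediction works (YOLOv4 keeps the same head): the 3 feature layers divide the whole image into 19x19, 38x38 and 76x76 grids, and each grid point is responsible for detecting one region.

Since the prediction of each feature layer corresponds to the positions of three predicted boxes, we first reshape it to (N,19,19,3,85), (N,38,38,3,85) and (N,76,76,3,85).

The 85 in the last dimension is 4+1+80: x_offset, y_offset, h and w, then the confidence, then the classification results.

Decoding adds the corresponding x_offset and y_offset to each grid point, which gives the center of the predicted box. The prior box is then combined with h and w to compute the height and width of the predicted box. That gives the full position of the predicted box.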
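As a worked example of the decoding rule, the NumPy sketch below decodes a single raw prediction for one grid cell and one prior box. It mirrors the box_xy and box_wh expressions of yolo_head but is my own simplification: `decode_cell` is a hypothetical helper, the grid and input are assumed square, and the anchor (116, 90) is taken from the anchor list quoted later in the post.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(raw, col, row, grid_size, anchor_wh, input_size):
    """Decode (tx, ty, tw, th) into a normalized (cx, cy, w, h) box."""
    tx, ty, tw, th = raw
    # Center: sigmoid offset inside the cell plus the cell index, scaled to [0, 1]
    cx = (sigmoid(tx) + col) / grid_size
    cy = (sigmoid(ty) + row) / grid_size
    # Size: the prior box scaled by exp(t), relative to the input size
    w = np.exp(tw) * anchor_wh[0] / input_size
    h = np.exp(th) * anchor_wh[1] / input_size
    return cx, cy, w, h

# An all-zero prediction lands in the middle of its cell with the prior's size
cx, cy, w, h = decode_cell((0.0, 0.0, 0.0, 0.0), col=9, row=4,
                           grid_size=19, anchor_wh=(116, 90), input_size=608)
print(cx, cy, w, h)
```

The sigmoid keeps the center inside the cell that predicted it, which is why each grid point is said to be responsible for one region.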

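Of course, after obtaining the final predictions, score sorting and non-maximum suppression are still required.
This part is common to almost all object detectors. This project handles it a little differently from others, though: it works class by class.
1. Take the boxes and scores of each class whose score is greater than self.obj_threshold (score_threshold in yolo_eval below).
2. Run non-maximum suppression using the box positions and scores.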
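The two steps can be sketched for a single class in NumPy. This is a plain greedy NMS written for illustration, not the tf.image.non_max_suppression call the project uses; `iou_one_to_many` and `nms` are hypothetical names, boxes are assumed to be (y_min, x_min, y_max, x_max) with non-zero area, and the default thresholds copy those of yolo_eval.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    # Boxes are (y_min, x_min, y_max, x_max)
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, score_threshold=0.6, iou_threshold=0.5, max_boxes=20):
    # Step 1: keep only boxes whose score passes the threshold
    candidates = np.where(scores >= score_threshold)[0]
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    # Step 2: repeatedly keep the best box and drop those overlapping it too much
    while order.size > 0 and len(keep) < max_boxes:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [0, 0, 10, 10]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.3])
kept = nms(boxes, scores)
print(kept)
```

In the project this is run once per class, and the surviving boxes of all classes are concatenated afterwards.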

The implementation is as follows. When yolo_eval is called, each feature layer is decoded:

import tensorflow as tf
from keras import backend as K

#---------------------------------------------------#
#   Convert the raw predictions of each feature layer into real values
#---------------------------------------------------#
def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    num_anchors = len(anchors)
    # [1, 1, 1, num_anchors, 2]
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])

    # Build the x, y grid
    # (19, 19, 1, 2)
    grid_shape = K.shape(feats)[1:3] # height, width
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
        [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
        [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))

    # (batch_size,19,19,3,85)
    feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

    # Convert the predictions into real values
    # box_xy is the center of the box
    # box_wh is the width and height of the box
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])

    # When computing the loss, return the following values instead
    if calc_loss == True:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs

#---------------------------------------------------#
#   Adjust the boxes so that they match the real image
#---------------------------------------------------#
def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
    box_yx = box_xy[..., ::-1]
    box_hw = box_wh[..., ::-1]

    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))

    # Undo the letterbox: the image was resized to new_shape and padded to input_shape
    new_shape = K.round(image_shape * K.min(input_shape/image_shape))
    offset = (input_shape-new_shape)/2./input_shape
    scale = input_shape/new_shape

    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes =  K.concatenate([
        box_mins[..., 0:1],  # y_min
        box_mins[..., 1:2],  # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]  # x_max
    ])

    boxes *= K.concatenate([image_shape, image_shape])
    return boxes

#---------------------------------------------------#
#   Get every box and its score
#---------------------------------------------------#
def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    # Convert the predictions into real values
    # box_xy is the center of the box
    # box_wh is the width and height of the box
    # -1,19,19,3,2; -1,19,19,3,2; -1,19,19,3,1; -1,19,19,3,80
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats, anchors, num_classes, input_shape)
    # Convert box_xy and box_wh into y_min, x_min, y_max, x_max
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
    # Get the scores and boxes
    boxes = K.reshape(boxes, [-1, 4])
    box_scores = box_confidence * box_class_probs
    box_scores = K.reshape(box_scores, [-1, num_classes])
    return boxes, box_scores

#---------------------------------------------------#
#   Image prediction
#---------------------------------------------------#
def yolo_eval(yolo_outputs,
              anchors,
              num_classes,
              image_shape,
              max_boxes=20,
              score_threshold=.6,
              iou_threshold=.5):
    # Number of feature layers
    num_layers = len(yolo_outputs)
    # Feature layer 1 uses anchors 6, 7, 8
    # Feature layer 2 uses anchors 3, 4, 5
    # Feature layer 3 uses anchors 0, 1, 2
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]]

    input_shape = K.shape(yolo_outputs[0])[1:3] * 32
    boxes = []
    box_scores = []
    # Process each feature layer
    for l in range(num_layers):
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    # Stack the results of all feature layers
    boxes = K.concatenate(boxes, axis=0)
    box_scores = K.concatenate(box_scores, axis=0)

    mask = box_scores >= score_threshold
    max_boxes_tensor = K.constant(max_boxes, dtype='int32')
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(num_classes):
        # Take all boxes with box_scores >= score_threshold, and their scores
        class_boxes = tf.boolean_mask(boxes, mask[:, c])
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])

        # Non-maximum suppression: drop boxes that overlap heavily
        nms_index = tf.image.non_max_suppression(
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)

        # Results after non-maximum suppression:
        # box positions, scores and classes
        class_boxes = K.gather(class_boxes, nms_index)
        class_box_scores = K.gather(class_box_scores, nms_index)
        classes = K.ones_like(class_box_scores, 'int32') * c
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
    boxes_ = K.concatenate(boxes_, axis=0)
    scores_ = K.concatenate(scores_, axis=0)
    classes_ = K.concatenate(classes_, axis=0)

    return boxes_, scores_, classes_
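5. Drawing on the original image
Step 4 gives the positions of the predicted boxes on the original image, and these boxes have already been filtered. They can be drawn directly on the image to obtain the result.

Training YOLOv4
1. Improved training tricks in YOLOv4
a) Mosaic data augmentation
YOLOv4's Mosaic data augmentation is derived from the CutMix augmentation, and the two are similar in theory.
CutMix stitches two images together.

Mosaic, however, uses four images. According to the paper, its big advantage is that it enriches the backgrounds of the detected objects, and batch normalization computes statistics over four images at once.
It looks like the figure below:

The implementation idea is as follows:
1. Read four images at a time.

2. Flip, scale and change the color gamut of each of the four images, and place them in the four corner positions.

3. Combine the images and combine the boxes.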

import numpy as np
from PIL import Image, ImageDraw
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def rand(a=0, b=1):
    return np.random.rand()*(b-a) + a

def merge_bboxes(bboxes, cutx, cuty):
    # bboxes[i] holds the boxes of image i. Images 0..3 fill the top-left,
    # bottom-left, bottom-right and top-right quadrants. Boxes outside their
    # quadrant are dropped, boxes crossing a cut line are clipped to it, and
    # clipped boxes thinner than 5 pixels are dropped.
    merge_bbox = []
    for i in range(len(bboxes)):
        for box in bboxes[i]:
            tmp_box = []
            x1,y1,x2,y2 = box[0], box[1], box[2], box[3]

            if i == 0:
                if y1 > cuty or x1 > cutx:
                    continue
                if y2 >= cuty and y1 <= cuty:
                    y2 = cuty
                    if y2-y1 < 5:
                        continue
                if x2 >= cutx and x1 <= cutx:
                    x2 = cutx
                    if x2-x1 < 5:
                        continue

            if i == 1:
                if y2 < cuty or x1 > cutx:
                    continue

                if y2 >= cuty and y1 <= cuty:
                    y1 = cuty
                    if y2-y1 < 5:
                        continue

                if x2 >= cutx and x1 <= cutx:
                    x2 = cutx
                    if x2-x1 < 5:
                        continue

            if i == 2:
                if y2 < cuty or x2 < cutx:
                    continue

                if y2 >= cuty and y1 <= cuty:
                    y1 = cuty
                    if y2-y1 < 5:
                        continue

                if x2 >= cutx and x1 <= cutx:
                    x1 = cutx
                    if x2-x1 < 5:
                        continue

            if i == 3:
                if y1 > cuty or x2 < cutx:
                    continue

                if y2 >= cuty and y1 <= cuty:
                    y2 = cuty
                    if y2-y1 < 5:
                        continue

                if x2 >= cutx and x1 <= cutx:
                    x1 = cutx
                    if x2-x1 < 5:
                        continue

            tmp_box.append(x1)
            tmp_box.append(y1)
            tmp_box.append(x2)
            tmp_box.append(y2)
            tmp_box.append(box[-1])
            merge_bbox.append(tmp_box)
    return merge_bbox

def get_random_data(annotation_line, input_shape, random=True, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    h, w = input_shape
    min_offset_x = 0.4
    min_offset_y = 0.4
    scale_low = 1-min(min_offset_x,min_offset_y)
    scale_high = scale_low+0.2

    image_datas = []
    box_datas = []
    index = 0

    place_x = [0,0,int(w*min_offset_x),int(w*min_offset_x)]
    # The y offsets use the height h (the original used w for the third entry,
    # which only matters for non-square inputs)
    place_y = [0,int(h*min_offset_y),int(h*min_offset_y),0]
    for line in annotation_line:
        # Split each annotation line
        line_content = line.split()
        # Open the image
        image = Image.open(line_content[0])
        image = image.convert("RGB")
        # Image size
        iw, ih = image.size
        # Box positions
        box = np.array([np.array(list(map(int,box.split(',')))) for box in line_content[1:]])

        # image.save(str(index)+".jpg")
        # Randomly flip the image
        flip = rand()<.5
        if flip and len(box)>0:
            image = image.transpose(Image.FLIP_LEFT_RIGHT)
            box[:, [0,2]] = iw - box[:, [2,0]]

        # Scale the input image
        new_ar = w/h
        scale = rand(scale_low, scale_high)
        if new_ar < 1:
            nh = int(scale*h)
            nw = int(nh*new_ar)
        else:
            nw = int(scale*w)
            nh = int(nw/new_ar)
        image = image.resize((nw,nh), Image.BICUBIC)

        # Color-gamut transform. Local names are used so that the hue/sat/val
        # ranges passed in are not overwritten between the four images.
        hue_shift = rand(-hue, hue)
        sat_scale = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
        val_scale = rand(1, val) if rand()<.5 else 1/rand(1, val)
        x = rgb_to_hsv(np.array(image)/255.)
        x[..., 0] += hue_shift
        x[..., 0][x[..., 0]>1] -= 1
        x[..., 0][x[..., 0]<0] += 1
        x[..., 1] *= sat_scale
        x[..., 2] *= val_scale
        x[x>1] = 1
        x[x<0] = 0
        image = hsv_to_rgb(x)

        image = Image.fromarray((image*255).astype(np.uint8))
        # Place the image at the position of its quadrant
        dx = place_x[index]
        dy = place_y[index]
        new_image = Image.new('RGB', (w,h), (128,128,128))
        new_image.paste(image, (dx, dy))
        image_data = np.array(new_image)/255

        # Image.fromarray((image_data*255).astype(np.uint8)).save(str(index)+"distort.jpg")

        index = index + 1
        box_data = []
        # Re-process the boxes
        if len(box)>0:
            np.random.shuffle(box)
            box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
            box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
            box[:, 0:2][box[:, 0:2]<0] = 0
            box[:, 2][box[:, 2]>w] = w
            box[:, 3][box[:, 3]>h] = h
            box_w = box[:, 2] - box[:, 0]
            box_h = box[:, 3] - box[:, 1]
            box = box[np.logical_and(box_w>1, box_h>1)]
            box_data = np.zeros((len(box),5))
            box_data[:len(box)] = box

        image_datas.append(image_data)
        box_datas.append(box_data)

        # Debug visualization: draw the boxes and show each intermediate image.
        # Remove this block for real training.
        img = Image.fromarray((image_data*255).astype(np.uint8))
        for j in range(len(box_data)):
            thickness = 3
            left, top, right, bottom  = box_data[j][0:4]
            draw = ImageDraw.Draw(img)
            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i],outline=(255,255,255))
        img.show()

    # Cut the images and put the pieces together
    cutx = np.random.randint(int(w*min_offset_x), int(w*(1 - min_offset_x)))
    cuty = np.random.randint(int(h*min_offset_y), int(h*(1 - min_offset_y)))

    new_image = np.zeros([h,w,3])
    new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :]
    new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :]
    new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :]
    new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :]

    # Process the boxes further
    new_boxes = merge_bboxes(box_datas, cutx, cuty)

    return new_image, new_boxes
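The final stitching step and the box clipping are easier to see on toy data. The sketch below is my own reduction of the code above, with no file reading or color changes; `stitch_mosaic` and `clip_box_top_left` are hypothetical names, and the clipping helper copies only the i == 0 (top-left) branch of merge_bboxes.

```python
import numpy as np

def stitch_mosaic(images, cutx, cuty):
    """images: four (H, W, 3) arrays for top-left, bottom-left, bottom-right, top-right."""
    h, w, _ = images[0].shape
    out = np.zeros((h, w, 3), dtype=images[0].dtype)
    out[:cuty, :cutx] = images[0][:cuty, :cutx]
    out[cuty:, :cutx] = images[1][cuty:, :cutx]
    out[cuty:, cutx:] = images[2][cuty:, cutx:]
    out[:cuty, cutx:] = images[3][:cuty, cutx:]
    return out

def clip_box_top_left(box, cutx, cuty, min_size=5):
    """Clip an (x1, y1, x2, y2) box to the top-left quadrant, or return None."""
    x1, y1, x2, y2 = box
    if y1 > cuty or x1 > cutx:
        return None                  # the box starts outside the quadrant
    if y2 >= cuty:
        y2 = cuty                    # clip at the horizontal cut line
        if y2 - y1 < min_size:
            return None
    if x2 >= cutx:
        x2 = cutx                    # clip at the vertical cut line
        if x2 - x1 < min_size:
            return None
    return (x1, y1, x2, y2)

# Four flat "images" with values 1..4 make the quadrant layout visible
images = [np.full((8, 8, 3), k) for k in range(1, 5)]
mosaic = stitch_mosaic(images, cutx=3, cuty=5)
print(mosaic[..., 0])
print(clip_box_top_left((10, 10, 300, 120), cutx=200, cuty=150))
```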
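b) Label smoothing
The idea of label smoothing is simple. The formula is:

new_onehot_labels = onehot_labels * (1 - label_smoothing) + label_smoothing / num_classes
When label_smoothing is 0.01, the formula becomes:

new_onehot_labels = y * (1 - 0.01) + 0.01 / num_classes
Label smoothing simply softens the labels. The original labels 0 and 1 become 0.005 (for binary classification) and 0.995. In other words, it slightly penalizes over-confident classification so that the model cannot classify too precisely, since being too precise tends to cause overfitting.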
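The numbers above can be reproduced with a few lines of NumPy. This is just a check of the formula, not the Keras implementation; `smooth_labels` is a name chosen here.

```python
import numpy as np

def smooth_labels(onehot, label_smoothing):
    onehot = np.asarray(onehot, dtype=np.float64)
    num_classes = onehot.shape[-1]
    return onehot * (1.0 - label_smoothing) + label_smoothing / num_classes

binary = smooth_labels([0.0, 1.0], 0.01)    # two classes -> [0.005, 0.995]
voc = smooth_labels(np.eye(20)[3], 0.01)    # 20 classes, class 3 is the positive one
print(binary)
print(voc[3], voc[0])
```

With more classes the negative targets get smaller (0.01 / 20 = 0.0005 here), but the smoothed vector still sums to 1.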

The implementation is as follows:

#---------------------------------------------------#
#   Smooth the labels
#---------------------------------------------------#
def _smooth_labels(y_true, label_smoothing):
    # Cast the class count to float so it can divide the smoothing factor
    # (the original line ended with a stray comma, which produced a tuple)
    num_classes = K.cast(K.shape(y_true)[-1], dtype=K.floatx())
    label_smoothing = K.constant(label_smoothing, dtype=K.floatx())
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes
c) CIoU
IoU is a ratio, so it is insensitive to the scale of the target object. However, the regression losses commonly used for bounding boxes are not fully equivalent to optimizing IoU, and plain IoU cannot directly optimize boxes that do not overlap.

So it was proposed to use IoU directly as the regression loss, and CIoU is one of the best ideas along that line.

CIoU takes into account the distance between the target and the anchor, the overlap rate, the scale and a penalty term. This makes box regression more stable and avoids problems such as divergence during training that can occur with IoU and GIoU. The penalty factor accounts for how well the aspect ratio of the predicted box fits that of the target box.

The CIoU formula is:
$$CIOU = IOU - \frac{\rho^2(b,b^{gt})}{c^2} - \alpha v$$

Here $\rho^2(b,b^{gt})$ is the squared Euclidean distance between the centers of the predicted box and the ground-truth box, and $c$ is the diagonal length of the smallest enclosing region that contains both the predicted box and the ground-truth box.

$\alpha$ and $v$ are given by:
$$\alpha = \frac{v}{1-IOU+v}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^2$$

Taking 1 - CIOU gives the corresponding loss:
$$LOSS_{CIOU} = 1 - IOU + \frac{\rho^2(b,b^{gt})}{c^2} + \alpha v$$
import math
import tensorflow as tf
from keras import backend as K

def box_ciou(b1, b2):
    """
    Inputs:
    ----------
    b1: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    b2: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh

    Returns:
    -------
    ciou: tensor, shape=(batch, feat_w, feat_h, anchor_num, 1)
    """
    # Top-left and bottom-right corners of the predicted boxes
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh/2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half
    # Top-left and bottom-right corners of the ground-truth boxes
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh/2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half

    # IoU between the ground-truth boxes and the predicted boxes
    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    union_area = b1_area + b2_area - intersect_area
    iou = intersect_area / (union_area + K.epsilon())

    # Squared distance between the centers (rho^2)
    center_distance = K.sum(K.square(b1_xy - b2_xy), axis=-1)
    # Top-left and bottom-right corners of the smallest box enclosing both boxes
    enclose_mins = K.minimum(b1_mins, b2_mins)
    enclose_maxes = K.maximum(b1_maxes, b2_maxes)
    enclose_wh = K.maximum(enclose_maxes - enclose_mins, 0.0)
    # Squared diagonal length (c^2)
    enclose_diagonal = K.sum(K.square(enclose_wh), axis=-1)
    # calculate ciou, add epsilon in denominator to avoid dividing by 0
    ciou = iou - 1.0 * (center_distance) / (enclose_diagonal + K.epsilon())

    # calculate param v and alpha to extend to CIoU
    v = 4*K.square(tf.math.atan2(b1_wh[..., 0], b1_wh[..., 1]) - tf.math.atan2(b2_wh[..., 0], b2_wh[..., 1])) / (math.pi * math.pi)
    # Guard the denominator: it is 0 when iou == 1 and v == 0
    alpha = v / K.maximum(1.0 - iou + v, K.epsilon())
    ciou = ciou - alpha * v

    ciou = K.expand_dims(ciou, -1)
    return ciou
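For checking the formula by hand, here is a scalar version in pure Python that follows the same steps as box_ciou for a single pair of boxes. It is a sketch under my own assumptions: `ciou_xywh` is a hypothetical name, boxes are (center_x, center_y, w, h) with positive sizes, and a small eps guards the divisions.

```python
import math

def ciou_xywh(b1, b2, eps=1e-7):
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    # Corners of both boxes
    b1_min, b1_max = (x1 - w1 / 2, y1 - h1 / 2), (x1 + w1 / 2, y1 + h1 / 2)
    b2_min, b2_max = (x2 - w2 / 2, y2 - h2 / 2), (x2 + w2 / 2, y2 + h2 / 2)
    # Plain IoU
    inter_w = max(min(b1_max[0], b2_max[0]) - max(b1_min[0], b2_min[0]), 0.0)
    inter_h = max(min(b1_max[1], b2_max[1]) - max(b1_min[1], b2_min[1]), 0.0)
    inter = inter_w * inter_h
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # rho^2: squared center distance; c^2: squared diagonal of the enclosing box
    rho2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    enc_w = max(b1_max[0], b2_max[0]) - min(b1_min[0], b2_min[0])
    enc_h = max(b1_max[1], b2_max[1]) - min(b1_min[1], b2_min[1])
    c2 = enc_w ** 2 + enc_h ** 2
    # Aspect-ratio term v and its weight alpha
    v = 4 / math.pi ** 2 * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return iou - rho2 / (c2 + eps) - alpha * v

same = ciou_xywh((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2))
apart = ciou_xywh((0.2, 0.2, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1))
rotated = ciou_xywh((0.5, 0.5, 0.4, 0.2), (0.5, 0.5, 0.2, 0.4))
print(same, apart, rotated)
```

The second case shows why CIoU helps: the boxes do not overlap, so IoU is 0 and gives no gradient, while the center-distance term still tells the optimizer how far apart they are.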
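d) Cosine annealing learning-rate decay
With cosine annealing decay, the learning rate first rises and then falls, which follows the idea of annealing optimization. (You can search online for what the annealing algorithm is.)

The rise is linear, and the fall follows a cosine curve.

The effect is shown in the figure:

Cosine annealing decay has a few essential parameters:
1. learning_rate_base: the peak learning rate.
2. warmup_learning_rate: the initial learning rate.
3. warmup_steps: the number of steps after which the peak is reached.

It is implemented as a Callback and is used in much the same way as the ordinary ReduceLROnPlateau: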

import numpy as np
import matplotlib.pyplot as plt
import keras
from keras import backend as K
from keras.layers import Flatten,Conv2D,Dropout,Input,Dense,MaxPooling2D
from keras.models import Model

def cosine_decay_with_warmup(global_step,
                             learning_rate_base,
                             total_steps,
                             warmup_learning_rate=0.0,
                             warmup_steps=0,
                             hold_base_rate_steps=0,
                             min_learn_rate=0,
                             ):
    """
    Arguments:
            global_step: the current step number (Tcur).
            learning_rate_base: the preset learning rate; once the warm-up phase has raised the learning rate to learning_rate_base, the decay starts.
            total_steps: the total number of training steps, equal to epoch*sample_count/batch_size (sample_count is the number of samples, epoch is the number of epochs).
            warmup_learning_rate: the initial value of the linear increase during warm-up.
            warmup_steps: the total number of steps the warm-up lasts.
            hold_base_rate_steps: optional; after warm-up ends, keep the learning rate unchanged for hold_base_rate_steps steps before the decay starts.
            min_learn_rate: lower bound applied to the returned learning rate.
    """
    if total_steps < warmup_steps:
        raise ValueError('total_steps must be larger or equal to '
                            'warmup_steps.')
    # Cosine annealing; the minimum learning rate is taken as 0 here, which simplifies the expression
    learning_rate = 0.5 * learning_rate_base * (1 + np.cos(np.pi *
        (global_step - warmup_steps - hold_base_rate_steps) / float(total_steps - warmup_steps - hold_base_rate_steps)))
    # If hold_base_rate_steps > 0, the learning rate is held constant for that many steps after warm-up
    if hold_base_rate_steps > 0:
        learning_rate = np.where(global_step > warmup_steps + hold_base_rate_steps,
                                    learning_rate, learning_rate_base)
    if warmup_steps > 0:
        if learning_rate_base < warmup_learning_rate:
            raise ValueError('learning_rate_base must be larger or equal to '
                                'warmup_learning_rate.')
        # Linear increase
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        warmup_rate = slope * global_step + warmup_learning_rate
        # Use the linear warmup_rate only while global_step is still in the warm-up phase;
        # otherwise use the cosine-annealed learning_rate
        learning_rate = np.where(global_step < warmup_steps, warmup_rate,
                                    learning_rate)

    learning_rate = max(learning_rate,min_learn_rate)
    return learning_rate

class WarmUpCosineDecayScheduler(keras.callbacks.Callback):
    """
    Subclass of Callback that schedules the learning rate
    """
    def __init__(self,
                 learning_rate_base,
                 total_steps,
                 global_step_init=0,
                 warmup_learning_rate=0.0,
                 warmup_steps=0,
                 hold_base_rate_steps=0,
                 min_learn_rate=0,
                 verbose=0):
        super(WarmUpCosineDecayScheduler, self).__init__()
        # Base learning rate
        self.learning_rate_base = learning_rate_base
        # Total number of steps over all epochs: epochs * sample_count / batch_size
        self.total_steps = total_steps
        # Initial global step
        self.global_step = global_step_init
        # Warm-up starting learning rate
        self.warmup_learning_rate = warmup_learning_rate
        # Warm-up length: warmup_epoch * sample_count / batch_size
        self.warmup_steps = warmup_steps
        self.hold_base_rate_steps = hold_base_rate_steps
        # Verbosity
        self.verbose = verbose
        self.min_learn_rate = min_learn_rate
        # learning_rates records the learning rate after every update, for plotting
        self.learning_rates = []

    # Update global_step and record the current learning rate
    def on_batch_end(self, batch, logs=None):
        self.global_step = self.global_step + 1
        lr = K.get_value(self.model.optimizer.lr)
        self.learning_rates.append(lr)

    # Update the learning rate
    def on_batch_begin(self, batch, logs=None):
        lr = cosine_decay_with_warmup(global_step=self.global_step,
                                      learning_rate_base=self.learning_rate_base,
                                      total_steps=self.total_steps,
                                      warmup_learning_rate=self.warmup_learning_rate,
                                      warmup_steps=self.warmup_steps,
                                      hold_base_rate_steps=self.hold_base_rate_steps,
                                      min_learn_rate = self.min_learn_rate)
        K.set_value(self.model.optimizer.lr, lr)
        if self.verbose > 0:
            print('\nBatch %05d: setting learning '
                  'rate to %s.' % (self.global_step + 1, lr))
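To see the shape of the schedule without Keras, here is a simplified standalone version in pure Python. It is my own reduction, not the callback above: `warmup_cosine_lr` is a hypothetical name, hold_base_rate_steps is left out, and the example values (base 1e-3, 100 steps, 10 warm-up steps) are arbitrary.

```python
import math

def warmup_cosine_lr(step, base_lr, total_steps,
                     warmup_lr=0.0, warmup_steps=0, min_lr=0.0):
    if step < warmup_steps:
        # Linear rise from warmup_lr to base_lr
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    # Cosine fall from base_lr towards 0, clamped at min_lr
    progress = (step - warmup_steps) / float(total_steps - warmup_steps)
    lr = 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
    return max(lr, min_lr)

schedule = [warmup_cosine_lr(s, 1e-3, 100, warmup_lr=1e-4,
                             warmup_steps=10, min_lr=1e-6)
            for s in range(101)]
print(schedule[0], schedule[10], schedule[55], schedule[100])
```

The peak is reached exactly at warmup_steps, the midpoint of the decay sits at half the base rate, and the tail is held at min_lr instead of reaching 0.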
2、loss組成
a)、計算loss所需參數
在計算loss的時候，實際上是y_pre和y_true之間的對比：
y_pre就是一幅圖像經過網絡之后的輸出，內部含有三個特征層的內容；其需要解碼才能夠在圖上作畫
y_true就是一個真實圖像中，它的每個真實框對應的(19,19)、(38,38)、(76,76)網格上的偏移位置、長寬與種類。其仍需要編碼才能與y_pred的結構一致
實際上y_pre和y_true內容的shape都是
(batch_size,19,19,3,85)
(batch_size,38,38,3,85)
(batch_size,76,76,3,85)

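b) What is y_pre
The network outputs, for every grid point of the three feature layers, the predicted boxes and their classes. In other words, each feature layer corresponds to the image divided into a grid of a different size, and at every grid point each of the three prior boxes (anchors) gets a position, a confidence and class scores.
For the outputs y1, y2 and y3, [..., :2] is the offset relative to the grid point, [..., 2:4] is the width and height, [..., 4:5] is the confidence of the box, and [..., 5:] is the predicted probability of each class.
At this point y_pre has not been decoded; only after decoding does it describe boxes on the real image.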
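The slicing convention can be tried out on a dummy array. This is only an illustration with random numbers standing in for one reshaped feature layer; the variable names are made up for the example.

```python
import numpy as np

num_classes = 80
rng = np.random.default_rng(0)
# Stand-in for one reshaped output layer: (batch, grid_h, grid_w, anchors, 5 + classes)
y1 = rng.normal(size=(1, 19, 19, 3, 5 + num_classes)).astype("float32")

raw_xy = y1[..., :2]     # offsets relative to the grid point
raw_wh = y1[..., 2:4]    # width and height (still undecoded)
raw_conf = y1[..., 4:5]  # objectness confidence
raw_cls = y1[..., 5:]    # per-class scores

print(raw_xy.shape, raw_wh.shape, raw_conf.shape, raw_cls.shape)
```

Using 4:5 rather than 4 keeps the last axis, so the confidence can be multiplied with the other slices by broadcasting.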

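c) What is y_true
y_true encodes, for a real image, the offset, width and height, and class of every ground-truth box on the (19,19), (38,38) and (76,76) grids. The ground-truth boxes must be encoded so that y_true has the same structure as y_pre.
yolo4 uses a dedicated function to process the ground-truth boxes of the images that are read in.

def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
Its inputs are:
true_boxes: shape (m, T, 5), the x_min, y_min, x_max, y_max and class_id of T boxes for each of m images.
input_shape: the input size, here (608, 608).
anchors: the sizes of the 9 prior boxes.
num_classes: the number of classes.
Processing the ground-truth boxes amounts to converting them into xywh relative to the image grid. The steps are:
1. Take the ground-truth boxes, compute their centers and their widths and heights, and divide by input_shape to express them as ratios.
2. Create an all-zero y_true. y_true is a list of three feature layers with shapes (batch_size,19,19,3,85), (batch_size,38,38,3,85) and (batch_size,76,76,3,85).
3. For every image, compare the wh of each ground-truth box with the wh of the prior boxes by IOU, pick the prior box with the highest IOU, find the feature layer it belongs to and the grid cell the box center falls in, and store the box in the corresponding y_true.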

for t, n in enumerate(best_anchor):
    for l in range(num_layers):
        if n in anchor_mask[l]:

            # Grid cell of this target on feature layer l
            i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
            j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')

            # Position of the best anchor within this layer's anchor mask
            k = anchor_mask[l].index(n)
            c = true_boxes[b,t, 4].astype('int32')

            # Store in y_true
            y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
            y_true[l][b, j, i, k, 4] = 1
            y_true[l][b, j, i, k, 5+c] = 1
In the resulting y_true, only the position that best matches each box in each image holds data; everything else is 0.
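The matching steps can be followed by hand for a single box. The sketch below is a simplified illustration, not the article's function: match_box is a hypothetical helper, the anchor sizes are the ones listed in the comments of the article's code, a 608x608 input is assumed, and the center is computed with ordinary division.

```python
import numpy as np

# Anchor sizes in pixels, taken from the comments in the article's code
anchors = np.array([[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                    [59, 119], [116, 90], [156, 198], [373, 326]], dtype="float32")
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
input_size = 608
grid_sizes = [19, 38, 76]

def match_box(x_min, y_min, x_max, y_max):
    """Hypothetical helper: return (layer, j, i, k) for one ground-truth box."""
    w, h = x_max - x_min, y_max - y_min
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    # IOU of boxes sharing the same center: the intersection is min(w) * min(h)
    inter = np.minimum(w, anchors[:, 0]) * np.minimum(h, anchors[:, 1])
    iou = inter / (w * h + anchors[:, 0] * anchors[:, 1] - inter)
    n = int(np.argmax(iou))
    # Feature layer that owns the best anchor, and its index inside that layer
    layer = next(l for l, mask in enumerate(anchor_mask) if n in mask)
    k = anchor_mask[layer].index(n)
    # Grid cell that contains the box center
    i = int(np.floor(cx / input_size * grid_sizes[layer]))
    j = int(np.floor(cy / input_size * grid_sizes[layer]))
    return layer, j, i, k

result = match_box(100, 200, 230, 300)
print(result)  # (0, 7, 5, 0)
```

A 130x100 box is closest in shape to the 116x90 anchor, so it lands on the 19x19 layer, in row 7, column 5, anchor slot 0.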
The full code of preprocess_true_boxes is as follows:

import numpy as np

#---------------------------------------------------#
#   Encode the ground-truth boxes into y_true
#---------------------------------------------------#
def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    assert (true_boxes[..., 4]<num_classes).all(), 'class id must be less than num_classes'
    # Three feature layers in total
    num_layers = len(anchors)//3
    # Prior boxes
    # 678 are 116,90,  156,198,  373,326
    # 345 are 30,61,  62,45,  59,119
    # 012 are 10,13,  16,30,  33,23
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]

    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32') # e.g. 608,608
    # Box centers and box sizes, both (m,T,2)
    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
    # Convert to ratios of the input size
    true_boxes[..., 0:2] = boxes_xy/input_shape[:]
    true_boxes[..., 2:4] = boxes_wh/input_shape[:]

    # m images
    m = true_boxes.shape[0]
    # Grid shapes are 19,19; 38,38; 76,76
    grid_shapes = [input_shape//{0:32, 1:16, 2:8}[l] for l in range(num_layers)]
    # y_true has the format (m,19,19,3,85)(m,38,38,3,85)(m,76,76,3,85)
    y_true = [np.zeros((m,grid_shapes[l][0],grid_shapes[l][1],len(anchor_mask[l]),5+num_classes),
        dtype='float32') for l in range(num_layers)]
    # [1,9,2]
    anchors = np.expand_dims(anchors, 0)
    anchor_maxes = anchors / 2.
    anchor_mins = -anchor_maxes
    # A box is valid only if its width is greater than 0
    valid_mask = boxes_wh[..., 0]>0

    for b in range(m):
        # Process one image at a time
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh)==0: continue
        # [n,1,2]
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes

        # Find which prior box fits each ground-truth box best
        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)
        # Shape is (n,); thanks to 消盡不死鳥 for pointing this out
        best_anchor = np.argmax(iou, axis=-1)

        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    # floor rounds down to the containing grid cell
                    i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
                    j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')
                    # Index of the matched anchor within feature layer l
                    k = anchor_mask[l].index(n)
                    c = true_boxes[b,t, 4].astype('int32')
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5+c] = 1

    return y_true
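d) How the loss is computed
Once y_pre and y_true are available, how are they compared? It is not a simple subtraction!

The loss has to be computed for all three feature layers; the smallest feature layer is used as the example here.
1. Use y_true to get the positions in this feature layer where a target really exists, (m,19,19,3,1), and the corresponding classes, (m,19,19,3,80).
2. Process the predictions in yolo_outputs to obtain the reshaped prediction y_pre with shape (m,19,19,3,85), together with the decoded xy and wh.
3. For every image, compute the IOU between all ground-truth boxes and the predicted boxes. A predicted box whose highest IOU with any ground-truth box exceeds 0.5 is ignored, that is, it is not used as a negative sample.
4. Compute CIOU as the regression loss; only positive samples contribute to it.
5. Compute the confidence loss, which has two parts. The first covers positions where a target really exists: the predicted confidence is compared with 1. The second covers positions where no target exists and whose highest IOU from step 3 is below the threshold: the predicted confidence is compared with 0.
6. Compute the class loss, which measures the gap between the predicted class and the true class at positions where a target really exists.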
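Steps 3 and 5 can be illustrated with a small NumPy sketch. It is a simplified stand-in for the Keras code, not the article's implementation: the helper names bce_with_logits and confidence_loss are hypothetical, and the three cells below are made-up values (one positive, one ignored negative, one ordinary negative).

```python
import numpy as np

def bce_with_logits(target, logit):
    # Numerically stable binary cross-entropy computed on raw logits
    return np.maximum(logit, 0) - logit * target + np.log1p(np.exp(-np.abs(logit)))

def confidence_loss(object_mask, conf_logits, best_iou, ignore_thresh=0.5):
    # 1 where the prediction overlaps no ground-truth box well enough to be ignored
    ignore_mask = (best_iou < ignore_thresh).astype("float32")
    bce = bce_with_logits(object_mask, conf_logits)
    positive = object_mask * bce                      # target present: compare with 1
    negative = (1 - object_mask) * bce * ignore_mask  # no target: compare with 0
    return float(np.sum(positive + negative))

object_mask = np.array([1.0, 0.0, 0.0], dtype="float32")
conf_logits = np.array([2.0, 2.0, 2.0], dtype="float32")
best_iou = np.array([0.9, 0.7, 0.1], dtype="float32")

loss = confidence_loss(object_mask, conf_logits, best_iou)
```

The second cell has no target but overlaps a ground-truth box with IOU 0.7, so it contributes nothing; only the first and third cells add to the loss.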

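The total loss actually computed is the sum of three losses:

For boxes that really exist, the CIOU loss.
For boxes that really exist, the predicted confidence compared with 1; for positions without a box whose highest IOU from step 3 is below the threshold, the predicted confidence compared with 0.
For boxes that really exist, the predicted class compared with the true class.
The actual code is as follows; calling yolo_loss returns the loss value: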

import tensorflow as tf
from keras import backend as K

#---------------------------------------------------#
#   Label smoothing
#---------------------------------------------------#
def _smooth_labels(y_true, label_smoothing):
    # Cast the class count to float so that it can divide label_smoothing
    num_classes = K.cast(K.shape(y_true)[-1], dtype=K.floatx())
    label_smoothing = K.constant(label_smoothing, dtype=K.floatx())
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes
#---------------------------------------------------#
#   Decode each feature layer of the prediction into box values
#---------------------------------------------------#
def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    num_anchors = len(anchors)
    # [1, 1, 1, num_anchors, 2]
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])

    # Build the x, y grid
    # (19,19, 1, 2)
    grid_shape = K.shape(feats)[1:3] # height, width
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
        [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
        [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))

    # (batch_size,19,19,3,85)
    feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

    # Decode the raw outputs into xy, wh (as ratios of the input), confidence and class scores
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])

    # When computing the loss, return the following values
    if calc_loss == True:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs

#---------------------------------------------------#
#   IOU between every predicted box and every ground-truth box
#---------------------------------------------------#
def box_iou(b1, b2):
    # 19,19,3,1,4
    # Top-left and bottom-right corners of the predicted boxes
    b1 = K.expand_dims(b1, -2)
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh/2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half

    # 1,n,4
    # Top-left and bottom-right corners of the ground-truth boxes
    b2 = K.expand_dims(b2, 0)
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh/2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half

    # Intersection area
    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    iou = intersect_area / (b1_area + b2_area - intersect_area)

    return iou

#---------------------------------------------------#
#   Loss computation
#---------------------------------------------------#
def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, label_smoothing=0.1, print_loss=False):

    # Three feature layers in total
    num_layers = len(anchors)//3

    # Split predictions from ground truth; args is [*model_body.output, *y_true]
    # y_true is a list of three feature layers with shapes (m,19,19,3,85),(m,38,38,3,85),(m,76,76,3,85).
    # yolo_outputs is a list of three feature layers with shapes (m,19,19,3,85),(m,38,38,3,85),(m,76,76,3,85).
    y_true = args[num_layers:]
    yolo_outputs = args[:num_layers]

    # Prior boxes
    # 678 are 116,90,  156,198,  373,326
    # 345 are 30,61,  62,45,  59,119
    # 012 are 10,13,  16,30,  33,23
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]

    # input_shape is 608,608
    input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))

    loss = 0

    # m is the batch size
    m = K.shape(yolo_outputs[0])[0]
    mf = K.cast(m, K.dtype(yolo_outputs[0]))

    for l in range(num_layers):
        # Take the first feature layer (m,19,19,3,85) as the example
        # Positions in this layer where a target exists. (m,19,19,3,1)
        object_mask = y_true[l][..., 4:5]
        # The corresponding classes (m,19,19,3,80)
        true_class_probs = y_true[l][..., 5:]
        if label_smoothing:
            true_class_probs = _smooth_labels(true_class_probs, label_smoothing)

        # Process this feature layer of yolo_outputs
        # grid is the grid structure (19,19,1,2), raw_pred is the undecoded prediction (m,19,19,3,85)
        # pred_xy and pred_wh are the decoded xy and wh, (m,19,19,3,2)
        grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l],
             anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True)

        # Decoded position of the predicted boxes
        # (m,19,19,3,4)
        pred_box = K.concatenate([pred_xy, pred_wh])

        # Find the negative samples; first create an empty TensorArray
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')

        # Compute ignore_mask for one image
        def loop_body(b, ignore_mask):
            # All ground-truth boxes that exist in image b
            # n,4
            true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0])
            # IOU between the predictions and the ground truth
            # pred_box[b] is 19,19,3,4
            # The result is the IOU of every pred_box with every ground-truth box
            # 19,19,3,n
            iou = box_iou(pred_box[b], true_box)

            # 19,19,3
            best_iou = K.max(iou, axis=-1)

            # Predictions whose best IOU reaches ignore_thresh get mask 0 and are ignored as negatives
            ignore_mask = ignore_mask.write(b, K.cast(best_iou<ignore_thresh, K.dtype(true_box)))
            return b+1, ignore_mask

        # Loop over all images (tf.while_loop is the public form of the control-flow op)
        _, ignore_mask = tf.while_loop(lambda b,*args: b<m, loop_body, [0, ignore_mask])

        # Stack the per-image masks into one tensor, (m,19,19,3)
        ignore_mask = ignore_mask.stack()
        # (m,19,19,3,1)
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # Give smaller boxes a larger weight
        box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4]

        # Calculate ciou loss as location loss
        raw_true_box = y_true[l][...,0:4]
        ciou = box_ciou(pred_box, raw_true_box)
        ciou_loss = object_mask * box_loss_scale * (1 - ciou)
        ciou_loss = K.sum(ciou_loss) / mf
        location_loss = ciou_loss

        # If the position has a box, compute the cross-entropy between the confidence and 1
        # If the position has no box and best_iou<ignore_thresh, it counts as a negative sample
        # best_iou<ignore_thresh limits the number of negative samples
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)+ \
            (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask

        class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[...,5:], from_logits=True)

        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += location_loss + confidence_loss + class_loss
        if print_loss:
            loss = tf.Print(loss, [loss, location_loss, confidence_loss, class_loss, K.sum(ignore_mask)], message='loss: ')
    return loss
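The broadcasting inside box_iou is the part that is easiest to get wrong, so here is a NumPy sketch of the same computation that can be run without Keras. The name box_iou_np and the toy boxes are made up for the illustration; boxes are given as center x, center y, width, height, as in the code above.

```python
import numpy as np

def box_iou_np(pred_boxes, true_boxes):
    """pred_boxes: (..., 4), true_boxes: (n, 4), both as cx, cy, w, h. Returns (..., n)."""
    b1 = np.expand_dims(pred_boxes, -2)  # (..., 1, 4)
    b2 = np.expand_dims(true_boxes, 0)   # (1, n, 4)
    b1_mins, b1_maxes = b1[..., :2] - b1[..., 2:4] / 2, b1[..., :2] + b1[..., 2:4] / 2
    b2_mins, b2_maxes = b2[..., :2] - b2[..., 2:4] / 2, b2[..., :2] + b2[..., 2:4] / 2
    # Width and height of the overlap, clipped at zero when the boxes do not meet
    inter_wh = np.maximum(np.minimum(b1_maxes, b2_maxes) - np.maximum(b1_mins, b2_mins), 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    area1 = b1[..., 2] * b1[..., 3]
    area2 = b2[..., 2] * b2[..., 3]
    return inter / (area1 + area2 - inter)

# Every predicted box is a 0.1 x 0.1 box centered at (0.5, 0.5)
pred = np.zeros((19, 19, 3, 4), dtype="float32")
pred[..., 0:2] = 0.5
pred[..., 2:4] = 0.1
true = np.array([[0.5, 0.5, 0.1, 0.1], [0.9, 0.9, 0.05, 0.05]], dtype="float32")

iou = box_iou_np(pred, true)   # one IOU per prediction and ground-truth box
best_iou = iou.max(axis=-1)    # best overlap of each prediction
print(iou.shape, best_iou.shape)
```

Comparing best_iou with ignore_thresh then gives the per-image ignore mask used in yolo_loss.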
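Training your own YOLOV4 model
The overall folder structure of yolo4 is as follows:

This article uses the VOC format for training.
Before training, put the label files in the Annotation folder under VOC2007 inside the VOCdevkit folder.

Before training, put the image files in the JPEGImages folder under VOC2007 inside the VOCdevkit folder.

Before training, use voc2yolo3.py to generate the corresponding txt files.

Then run voc_annotation.py in the root directory; before running it, change classes to your own classes.

classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

This generates the corresponding 2007_train.txt, in which every line holds the path of an image and the positions of its ground-truth boxes.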
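A line of that file can be read back with a few lines of Python. This is a sketch under an assumption: the line layout used here, an image path followed by space-separated boxes written as x_min,y_min,x_max,y_max,class_id, is not spelled out in the article, so check it against your own generated file. The helper name parse_annotation_line and the sample line are made up, and paths containing spaces are not handled.

```python
import numpy as np

def parse_annotation_line(line):
    """Hypothetical helper: split one annotation line into a path and an (n, 5) box array."""
    parts = line.split()
    image_path = parts[0]
    boxes = np.array(
        [[int(float(v)) for v in box.split(",")] for box in parts[1:]],
        dtype="int32",
    ).reshape(-1, 5)
    return image_path, boxes

sample = "VOCdevkit/VOC2007/JPEGImages/000001.jpg 48,240,195,371,11 8,12,352,498,14"
path, boxes = parse_annotation_line(sample)
print(path, boxes.shape)
```

Each row of boxes matches the x_min, y_min, x_max, y_max, class_id layout that preprocess_true_boxes expects for true_boxes.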

Before training, also edit voc_classes.txt in model_data and change the classes to your own classes.

Run train.py to start training.
