The width and height terms in the first part of the loss function are put under a square root. Here is why: suppose a 100x100 target and a 10x10 target are both predicted 10 pixels too large in each dimension, giving predicted boxes of 110 x 110 and 20 x 20. The first case is still clearly acceptable, but the second amounts to doubling the box size. Without the square root the two errors produce the same loss, 200 in both cases; with the square root the loss reflects the difference between them.
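The effect of the square root can be checked numerically. Below is a minimal sketch; `wh_loss` is a hypothetical helper covering only the width/height terms, not YOLO's full loss:

```python
import math

def wh_loss(pred, gt, use_sqrt):
    # sum of squared errors over width and height, optionally on square roots
    if use_sqrt:
        return sum((math.sqrt(p) - math.sqrt(g)) ** 2 for p, g in zip(pred, gt))
    return sum((p - g) ** 2 for p, g in zip(pred, gt))

# both boxes are predicted 10 px too large in each dimension
print(wh_loss((110, 110), (100, 100), use_sqrt=False))  # 200
print(wh_loss((20, 20), (10, 10), use_sqrt=False))      # 200: same loss
print(wh_loss((110, 110), (100, 100), use_sqrt=True))   # ~0.48
print(wh_loss((20, 20), (10, 10), use_sqrt=True))       # ~3.43: the doubled box is penalized far more
```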
The raw data for the clustering is just a detection dataset's annotation boxes. For YOLOv2/v3, each image gets a TXT file containing box positions and classes; every line holds (x_j, y_j, w_j, h_j), the ground-truth box's coordinates relative to the original image, where (x_j, y_j) is the box center and (w_j, h_j) its width and height. N is the total number of annotation boxes.

1. Choose k initial cluster centers (W_i, H_i), i = 1, ..., k. These are the anchor-box widths and heights; since anchor boxes have no fixed position, there are no (x, y) coordinates, only a width and a height.
2. Compute the distance between every annotation box and every cluster center as d = 1 - IOU(box, centroid). For the IoU computation each box's center is placed on the cluster center, i.e. d = 1 - IOU[(x_j, y_j, w_j, h_j), (x_j, y_j, W_i, H_i)]; only then is the IoU well defined. Assign each annotation box to the "closest" cluster center.
3. Once every box has been assigned, recompute each cluster center as the mean width and height of the boxes in its cluster: W_i = (1/N_i) * sum of w_j, H_i = (1/N_i) * sum of h_j, where N_i is the number of annotation boxes in cluster i.
4. Repeat steps 2 and 3 until the cluster centers barely change.
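Because both boxes share a center during clustering, the IoU reduces to a width/height overlap. A minimal sketch of the d = 1 - IOU distance (`iou_wh` is an illustrative helper, not the script's function):

```python
import numpy as np

def iou_wh(box, anchors):
    # boxes aligned on a common center: the intersection uses the smaller w and h
    box, anchors = np.asarray(box), np.asarray(anchors)
    inter = np.minimum(box[0], anchors[:, 0]) * np.minimum(box[1], anchors[:, 1])
    union = box[0] * box[1] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

anchors = np.array([[0.1, 0.1], [0.4, 0.6], [0.8, 0.8]])
d = 1 - iou_wh((0.38, 0.55), anchors)  # distance to each cluster center
print(np.argmin(d))                    # the (0.4, 0.6) anchor is "closest" -> 1
```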
The author ran the k-means algorithm with a range of k values and plotted the resulting curve of average IoU against k.
In the end k = 5 was chosen, as a trade-off between model complexity and high recall.
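That curve can be reproduced in miniature on synthetic data. In the sketch below, the uniform (w, h) samples and the `avg_iou_for_k` helper are illustrative, not the paper's data; it shows the average best IoU rising with k, with diminishing returns:

```python
import numpy as np

rng = np.random.default_rng(0)
boxes = rng.uniform(0.05, 0.95, size=(200, 2))  # synthetic normalized (w, h) pairs

def iou_to_centroids(box, centroids):
    # center-aligned IoU of one box against every centroid
    inter = np.minimum(box[0], centroids[:, 0]) * np.minimum(box[1], centroids[:, 1])
    union = box[0] * box[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def avg_iou_for_k(k, iters=50):
    # k-means with d = 1 - IoU: assign to highest-IoU centroid, update by mean (w, h)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.array([np.argmax(iou_to_centroids(b, centroids)) for b in boxes])
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return float(np.mean([iou_to_centroids(b, centroids).max() for b in boxes]))

curve = [avg_iou_for_k(k) for k in range(1, 11)]
print([round(v, 3) for v in curve])  # rises with k, flattening out
```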
from os import listdir
from os.path import isfile, join
import argparse
#import cv2
import numpy as np
import sys
import os
import shutil
import random
import math

def IOU(x, centroids):
    '''
    :param x: (w, h) of one ground-truth box
    :param centroids: anchor (w, h) pairs [(w,h), (w,h), ...], k in total
    :return: IoU of this ground-truth box with each of the k anchor boxes
    '''
    IoUs = []
    w, h = x  # ground-truth w, h
    for centroid in centroids:
        c_w, c_h = centroid  # anchor w, h
        if c_w >= w and c_h >= h:    # anchor encloses the ground truth
            iou = w * h / (c_w * c_h)
        elif c_w >= w and c_h <= h:  # anchor wider and shorter
            iou = w * c_h / (w * h + (c_w - w) * c_h)
        elif c_w <= w and c_h >= h:  # anchor narrower and taller
            iou = c_w * h / (w * h + c_w * (c_h - h))
        else:                        # ground truth encloses the anchor
            iou = (c_w * c_h) / (w * h)
        IoUs.append(iou)
    return np.array(IoUs)  # shape (k,)

def avg_IOU(X, centroids):
    '''
    :param X: ground-truth (w, h) pairs [(w,h), (w,h), ...]
    :param centroids: anchor (w, h) pairs, k in total
    '''
    n, d = X.shape
    total = 0.
    for i in range(X.shape[0]):
        total += max(IOU(X[i], centroids))  # best IoU of this ground truth over all anchors
    return total / n  # averaged over all ground truths

def write_anchors_to_file(centroids, X, anchor_file, input_shape, yolo_version):
    '''
    :param centroids: anchor (w, h) pairs, k in total
    :param X: ground-truth (w, h) pairs
    :param anchor_file: output path for the anchors and the average IoU
    '''
    f = open(anchor_file, 'w')
    anchors = centroids.copy()
    print(anchors.shape)

    if yolo_version == 'yolov2':
        for i in range(anchors.shape[0]):
            # yolo downsamples the image by a factor of 32, hence the division by 32;
            # adjust the factor if your network architecture differs.
            # This yields anchor sizes relative to the 32x-downsampled feature map (yolov2).
            anchors[i][0] *= input_shape / 32.
            anchors[i][1] *= input_shape / 32.
    elif yolo_version == 'yolov3':
        for i in range(anchors.shape[0]):
            # yolov3 anchor sizes are relative to the original image
            anchors[i][0] *= input_shape
            anchors[i][1] *= input_shape
    else:
        print("the yolo version is not right!")
        exit(-1)

    widths = anchors[:, 0]
    sorted_indices = np.argsort(widths)
    print('Anchors = ', anchors[sorted_indices])

    for i in sorted_indices[:-1]:
        f.write('%0.2f,%0.2f, ' % (anchors[i, 0], anchors[i, 1]))
    # there should be no comma after the last anchor
    f.write('%0.2f,%0.2f\n' % (anchors[sorted_indices[-1], 0], anchors[sorted_indices[-1], 1]))
    f.write('%f\n' % (avg_IOU(X, centroids)))
    print()

def kmeans(X, centroids, eps, anchor_file, input_shape, yolo_version):
    N = X.shape[0]  # number of ground-truth boxes
    print("centroids.shape", centroids.shape)
    k, dim = centroids.shape  # k anchors; dim = 2 for (w, h)
    prev_assignments = np.ones(N) * (-1)  # initial cluster labels for the ground truths
    iter = 0
    old_D = np.zeros((N, k))  # distance of each ground truth to each anchor

    while True:
        D = []
        iter += 1
        for i in range(N):
            d = 1 - IOU(X[i], centroids)
            D.append(d)
        D = np.array(D)  # shape (N, k): distance of each ground truth to each anchor

        # change in the distances relative to the previous iteration
        print("iter {}: dists = {}".format(iter, np.sum(np.abs(old_D - D))))

        # assign each ground truth to the anchor with the smallest distance d
        assignments = np.argmin(D, axis=1)

        # if the assignment did not change, write out the anchors and the average IoU
        if (assignments == prev_assignments).all():
            print("Centroids = ", centroids)
            write_anchors_to_file(centroids, X, anchor_file, input_shape, yolo_version)
            return

        # calculate new centroids: the mean (w, h) over each cluster
        centroid_sums = np.zeros((k, dim), float)
        for i in range(N):
            centroid_sums[assignments[i]] += X[i]
        for j in range(k):
            # the +1 in the denominator avoids division by zero for an empty cluster
            centroids[j] = centroid_sums[j] / (np.sum(assignments == j) + 1)

        prev_assignments = assignments.copy()
        old_D = D.copy()

def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('-filelist',
                        default=r'E:\BaiduNetdiskDownload\darknetHG8245\scripts\train.txt',
                        help='path to filelist\n')
    parser.add_argument('-output_dir',
                        default=r'E:\BaiduNetdiskDownload\darknetHG8245', type=str,
                        help='Output anchor directory\n')
    parser.add_argument('-num_clusters', default=0, type=int,
                        help='number of clusters\n')
    # Note: yolov2's output values are small because they are relative to the
    # feature map, while yolov3's are larger because they are relative to the
    # original image, so the two versions produce differently scaled outputs.
    parser.add_argument('-yolo_version', default='yolov2', type=str,
                        help='yolov2 or yolov3\n')
    parser.add_argument('-yolo_input_shape', default=416, type=int,
                        help='input images shape, multiples of 32. e.g. 416*416\n')
    args = parser.parse_args()

    if not os.path.exists(args.output_dir):
        os.mkdir(args.output_dir)

    f = open(args.filelist)
    lines = [line.rstrip('\n') for line in f.readlines()]

    annotation_dims = []
    for line in lines:
        line = line.replace('JPEGImages', 'labels')
        line = line.replace('.jpg', '.txt')
        line = line.replace('.png', '.txt')
        print(line)
        f2 = open(line)
        for line in f2.readlines():
            line = line.rstrip('\n')
            w, h = line.split(' ')[3:]
            annotation_dims.append((float(w), float(h)))
    annotation_dims = np.array(annotation_dims)  # (w, h) of every ground-truth box

    eps = 0.005

    if args.num_clusters == 0:
        for num_clusters in range(1, 11):  # try 1 through 10 clusters
            anchor_file = join(args.output_dir, 'anchors%d.txt' % (num_clusters))
            indices = [random.randrange(annotation_dims.shape[0]) for i in range(num_clusters)]
            centroids = annotation_dims[indices]
            kmeans(annotation_dims, centroids, eps, anchor_file,
                   args.yolo_input_shape, args.yolo_version)
            print('centroids.shape', centroids.shape)
    else:
        anchor_file = join(args.output_dir, 'anchors%d.txt' % (args.num_clusters))
        indices = [random.randrange(annotation_dims.shape[0]) for i in range(args.num_clusters)]
        centroids = annotation_dims[indices]
        kmeans(annotation_dims, centroids, eps, anchor_file,
               args.yolo_input_shape, args.yolo_version)
        print('centroids.shape', centroids.shape)

if __name__ == "__main__":
    main(sys.argv)
A brief summary of the innovations
As for the effect of Batch Normalization: after adding BN layers, YOLOv2's mAP rises by 2%.
yolov1 is also fine-tuned from an ImageNet-pretrained model, but its pretraining uses a 224 x 224 network input while fine-tuning uses 448 x 448, so the pretrained network and the actual detection network see images at incompatible resolutions. yolov2 instead pretrains directly with a 448 x 448 network input and then trains on the detection task, which brings a 3.7% improvement.
To improve small-object detection, yolov2 reduces the number of pooling layers in the network so that the final feature map is larger. With a 416 x 416 input the output is 13 x 13 x 125, where 13 x 13 is the final feature map, i.e. the grid the original image is divided into, and the 125 channels encode each cell's bounding-box predictions: 5 x (classes + 5) = 5 x (20 + 5) for VOC's 20 classes. Note that the feature-map size depends on the input size, and it must be odd so that a single center cell exists to see a target at the center of the original image.
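The 13 x 13 x 125 figure follows directly from this arithmetic; a quick sanity check (assuming VOC's 20 classes, which is where 125 comes from):

```python
stride = 32        # total downsampling factor of the YOLOv2 backbone
input_size = 416
num_anchors = 5
num_classes = 20   # Pascal VOC

grid = input_size // stride                 # 416 / 32 = 13; odd, so a single center cell exists
channels = num_anchors * (num_classes + 5)  # the extra 5 = x, y, w, h, objectness -> 125
print(grid, grid, channels)                 # 13 13 125
```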