CV: Translation and Interpretation of the 2019 Survey "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 1-3
Overview: A recent survey of computer vision work in artificial intelligence, reviewing and commenting on the latest architectures of deep convolutional neural networks.
Original Authors
Asifullah Khan1, 2*, Anabia Sohail1, 2, Umme Zahoora1, and Aqsa Saeed Qureshi1
1 Pattern Recognition Lab, DCIS, PIEAS, Nilore, Islamabad 45650, Pakistan
2 Deep Learning Lab, Center for Mathematical Sciences, PIEAS, Nilore, Islamabad 45650, Pakistan
asif@pieas.edu.pk
Update in progress...
Related Articles
CV: Translation and Interpretation of the 2019 "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 1-3
CV: Translation and Interpretation of the 2019 "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapter 4
CV: Translation and Interpretation of the 2019 "A Survey of the Recent Architectures of Deep Convolutional Neural Networks", Chapters 5-8
Contents
Abstract
1 Introduction
2 Basic CNN Components
2.1 Convolutional Layer
2.2 Pooling Layer
2.3 Activation Function
2.4 Batch Normalization
2.5 Dropout
2.6 Fully Connected Layer
3 Architectural Evolution of Deep CNN
3.1 Late 1980s-1999: Origin of CNN
3.2 Early 2000: Stagnation of CNN
3.3 2006-2011: Revival of CNN
3.4 2012-2014: Rise of CNN
3.5 2015-Present: Rapid increase in Architectural Innovations and Applications of CNN
Original paper download: https://download.csdn.net/download/qq_41185868/15548439
Abstract
Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art performance on various competitive benchmarks. The powerful learning ability of deep CNN is largely due to the use of multiple feature extraction stages (hidden layers) that can automatically learn representations from the data. The availability of a large amount of data and improvements in hardware processing units have accelerated the research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in developing deep CNNs shows that innovative architectural ideas, as well as parameter optimization, can improve CNN performance. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of the processing units. However, the major improvement in the representational capacity of the deep CNN is achieved by restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is receiving substantial attention. This survey thus focuses on the intrinsic taxonomy present in the recently reported deep CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting, and attention. Additionally, this survey also covers the elementary understanding of CNN components and sheds light on its current challenges and applications.
Keywords: Deep Learning, Convolutional Neural Networks, Architecture, Representational Capacity, Residual Learning, and Channel Boosted CNN.
1 Introduction
Machine Learning (ML) algorithms belong to a specialized area in Artificial Intelligence (AI), which endows intelligence to computers by learning the underlying relationships among the data and making decisions without being explicitly programmed. Different ML algorithms have been developed since the late 1990s for the emulation of human sensory responses such as speech and vision, but they have generally failed to achieve human-level satisfaction [1]–[6]. The challenging nature of Machine Vision (MV) tasks gives rise to a specialized class of Neural Networks (NN), known as the Convolutional Neural Network (CNN) [7].
CNNs are considered one of the best techniques for learning image content and have shown state-of-the-art results on image recognition, segmentation, detection, and retrieval related tasks [8], [9]. The success of CNN has captured attention beyond academia. In industry, companies such as Google, Microsoft, AT&T, NEC, and Facebook have developed active research groups for exploring new architectures of CNN [10]. At present, most of the frontrunners of image processing competitions are employing deep CNN based models.
The topology of CNN is divided into multiple learning stages composed of a combination of the convolutional layer, non-linear processing units, and subsampling layers [11]. Each layer performs multiple transformations using a bank of convolutional kernels (filters) [12]. The convolution operation extracts locally correlated features by dividing the image into small slices (similar to the retina of the human eye), making it capable of learning suitable features. The output of the convolutional kernels is assigned to non-linear processing units, which not only helps in learning abstractions but also embeds non-linearity in the feature space. This non-linearity generates different patterns of activations for different responses and thus facilitates the learning of semantic differences in images. The output of the non-linear function is usually followed by subsampling, which helps in summarizing the results and also makes the input invariant to geometrical distortions [12], [13].
The architectural design of CNN was inspired by Hubel and Wiesel's work and thus largely follows the basic structure of the primate's visual cortex [14], [15]. CNN first came to the limelight through the work of LeCun in 1989 for the processing of grid-like topological data (images and time series data) [7], [16]. The popularity of CNN is largely due to its hierarchical feature extraction ability. The hierarchical organization of CNN emulates the deep and layered learning process of the Neocortex in the human brain, which automatically extracts features from the underlying data [17]. The staging of the learning process in CNN shows quite a resemblance with the primate's ventral pathway of the visual cortex (V1-V2-V4-IT/VTC) [18]. The visual cortex of primates first receives input from the retinotopic area, where multi-scale highpass filtering and contrast normalization are performed by the lateral geniculate nucleus. After this, detection is performed by different regions of the visual cortex categorized as V1, V2, V3, and V4. In fact, the V1 and V2 portions of the visual cortex are similar to the convolutional and subsampling layers, whereas the inferior temporal region resembles the higher layers of CNN, which makes inferences about the image [19]. During training, CNN learns through the backpropagation algorithm, by regulating the change in weights with respect to the input. Minimization of a cost function by CNN using the backpropagation algorithm is similar to the response based learning of the human brain. CNN has the ability to extract low, mid, and high-level features. High-level features (more abstract features) are a combination of lower and mid-level features. With its automatic feature extraction ability, CNN reduces the need for synthesizing a separate feature extractor [20]. Thus, CNN can learn good internal representations from raw pixels with diminutive processing.
The main boom in the use of CNN for image classification and segmentation occurred after it was observed that the representational capacity of a CNN can be enhanced by increasing its depth [21]. Deep architectures have an advantage over shallow architectures when dealing with complex learning problems. Stacking of multiple linear and non-linear processing units in a layer-wise fashion provides deep networks the ability to learn complex representations at different levels of abstraction. In addition, advancements in hardware and thus the availability of high computing resources is also one of the main reasons for the recent success of deep CNNs. Deep CNN architectures have shown significant performance improvements over shallow and conventional vision based models. Apart from its use in supervised learning, deep CNNs have the potential to learn useful representations from large scale unlabeled data. The use of multiple mapping functions by CNN enables it to improve the extraction of invariant representations and, consequently, makes it capable of handling recognition tasks of hundreds of categories. Recently, it has been shown that different levels of features, including both low and high-level, can be transferred to a generic recognition task by exploiting the concept of Transfer Learning (TL) [22]–[24]. Important attributes of CNN are hierarchical learning, automatic feature extraction, multi-tasking, and weight sharing [25]–[27].
Various improvements in CNN learning strategy and architecture were performed to make CNN scalable to large and complex problems. These innovations can be categorized as parameter optimization, regularization, structural reformulation, etc. However, it is observed that CNN based applications became prevalent after the exemplary performance of AlexNet on the ImageNet dataset [21]. Thus, major innovations in CNN have been proposed since 2012 and were mainly due to the restructuring of processing units and the designing of new blocks. Similarly, Zeiler and Fergus [28] introduced the concept of layer-wise visualization of features, which shifted the trend towards extraction of features at low spatial resolution in deep architectures such as VGG [29]. Nowadays, most of the new architectures are built upon the principle of simple and homogenous topology introduced by VGG. On the other hand, the Google group introduced an interesting idea of split, transform, and merge, and the corresponding block is known as the inception block. The inception block for the very first time gave the concept of branching within a layer, which allows abstraction of features at different spatial scales [30]. In 2015, the concept of skip connections introduced by ResNet [31] for the training of deep CNNs became famous, and afterwards, this concept was used by most of the succeeding Nets, such as Inception-ResNet, WideResNet, ResNeXt, etc. [32]–[34].
In order to improve the learning capacity of a CNN, different architectural designs such as WideResNet, Pyramidal Net, Xception, etc. explored the effect of multilevel transformations in terms of an additional cardinality and an increase in width [32], [34], [35]. Therefore, the focus of research shifted from parameter optimization and connection readjustment towards improved architectural design (layer structure) of the network. This shift resulted in many new architectural ideas such as channel boosting, spatial and channel wise exploitation, and attention based information processing [36]–[38].
In the past few years, different interesting surveys have been conducted on deep CNNs that elaborate the basic components of CNN and their alternatives. The survey reported by [39] reviewed the famous architectures from 2012-2015 along with their components. Similarly, in the literature, there are prominent surveys that discuss different algorithms of CNN and focus on applications of CNN [20], [26], [27], [40], [41]. Likewise, the survey presented in [42] discussed the taxonomy of CNNs based on acceleration techniques. On the other hand, in this survey, we discuss the intrinsic taxonomy present in the recent and prominent CNN architectures. The various CNN architectures discussed in this survey can be broadly classified into seven main categories, namely: spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting, and attention based CNNs. The rest of the paper is organized in the following order (shown in Fig. 1): Section 1 summarizes the underlying basics of CNN, its resemblance with the primate's visual cortex, as well as its contribution to MV. In this regard, Section 2 provides an overview of basic CNN components and Section 3 discusses the architectural evolution of deep CNNs. Section 4 discusses the recent innovations in CNN architectures and categorizes CNNs into seven broad classes. Sections 5 and 6 shed light on applications of CNNs and current challenges, whereas Section 7 discusses future work and the last section draws the conclusion.
Fig. 1: Organization of the survey paper.
2 Basic CNN Components
Nowadays, CNN is considered the most widely used ML technique, especially in vision related applications. CNNs have recently shown state-of-the-art results in various ML applications. A typical block diagram of an ML system is shown in Fig. 2. Since CNN possesses both good feature extraction and strong discrimination ability, in an ML system it is mostly used for feature extraction and classification.
A typical CNN architecture generally comprises alternating layers of convolution and pooling, followed by one or more fully connected layers at the end. In some cases, the fully connected layer is replaced with a global average pooling layer. In addition to the various learning stages, different regulatory units such as batch normalization and dropout are also incorporated to optimize CNN performance [43]. The arrangement of CNN components plays a fundamental role in designing new architectures and thus achieving enhanced performance. This section briefly discusses the role of these components in CNN architecture; a short code sketch of such a layout is given below.
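To make the layout concrete, the following is a minimal PyTorch sketch of such an architecture: alternating convolution and pooling stages with batch normalization, a dropout unit, and a fully connected classifier at the end. The layer sizes, the 32x32 input resolution, and the 10-class output are illustrative assumptions rather than values taken from the survey.

```python
import torch
import torch.nn as nn

# Illustrative CNN layout (all sizes are assumptions): conv/pool stages + classifier.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(64 * 8 * 8, 10),          # fully connected layer for a 10-class problem
)

x = torch.randn(4, 3, 32, 32)           # a dummy mini-batch of 32x32 RGB images
print(model(x).shape)                   # torch.Size([4, 10])
```

A global average pooling variant would replace `nn.Flatten` and the large linear layer with `nn.AdaptiveAvgPool2d(1)` followed by a much smaller `nn.Linear(64, 10)`, removing most of the fully connected parameters.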
2.1 Convolutional Layer
The convolutional layer is composed of a set of convolutional kernels (each neuron acts as a kernel). These kernels are associated with a small area of the image known as a receptive field. It works by dividing the image into small blocks (receptive fields) and convolving them with a specific set of weights (multiplying elements of the filter with the corresponding receptive field elements) [43]. The convolution operation can be expressed as follows:
$$F_l^k = I_{x,y} * K_l^k \qquad (1)$$

where the input image is represented by $I_{x,y}$, $(x, y)$ shows the spatial locality, and $K_l^k$ represents the $l$-th convolutional kernel of the $k$-th layer. Dividing the image into small blocks helps in extracting locally correlated pixel values. This locally aggregated information is also known as a feature motif. Different sets of features within the image are extracted by sliding the convolutional kernel over the image with the same set of weights. This weight-sharing attribute of the convolution operation makes CNN parameters more efficient compared to fully connected networks. Convolution operations may further be categorized into different types based on the type and size of the filter, the type of padding, and the direction of convolution [44]. Additionally, if the kernel is symmetric, the convolution operation becomes a correlation operation [16].
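As a hedged illustration of equation (1), the NumPy sketch below slides a single kernel over every receptive field of a single-channel image. The input and kernel values are arbitrary assumptions, and, as in most deep learning libraries, the kernel is applied without flipping (i.e., as correlation, which coincides with convolution for symmetric kernels as noted above).

```python
import numpy as np

def conv2d(I, K):
    """Valid 2D convolution of input image I with kernel K (Eq. 1), no padding or stride."""
    kh, kw = K.shape
    H, W = I.shape
    F = np.zeros((H - kh + 1, W - kw + 1))
    for x in range(F.shape[0]):
        for y in range(F.shape[1]):
            # Each output element is the weighted sum over one receptive field, using the
            # same kernel weights at every spatial location (weight sharing).
            F[x, y] = np.sum(I[x:x + kh, y:y + kw] * K)
    return F

I = np.random.rand(6, 6)                 # toy single-channel input image
K = np.array([[1., 0.], [0., -1.]])      # a 2x2 convolutional kernel
print(conv2d(I, K).shape)                # (5, 5)
```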
2.2 Pooling Layer
Feature motifs, which result as an output of the convolution operation, can occur at different locations in the image. Once features are extracted, their exact location becomes less important as long as their approximate position relative to others is preserved. Pooling, or downsampling, like convolution, is an interesting local operation. It sums up similar information in the neighborhood of the receptive field and outputs the dominant response within this local region [45].
$$Z_l = f_p(F_{x,y}^l) \qquad (2)$$

Equation (2) shows the pooling operation, in which $Z_l$ represents the $l$-th output feature map, $F_{x,y}^l$ shows the $l$-th input feature map, whereas $f_p(\cdot)$ defines the type of pooling operation. The use of the pooling operation helps to extract a combination of features, which are invariant to translational shifts and small distortions [13], [46]. Reduction in the size of the feature map to an invariant feature set not only regulates the complexity of the network but also helps in increasing generalization by reducing overfitting. Different types of pooling formulations such as max, average, L2, overlapping, spatial pyramid pooling, etc. are used in CNN [47]–[49].
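A minimal max pooling sketch of equation (2) in NumPy is given below; the window size and the toy feature map are assumptions. Each non-overlapping p x p window is summarized by its dominant response.

```python
import numpy as np

def max_pool(F, p=2):
    """Max pooling (one choice of f_p in Eq. 2): keep the dominant response in each p x p window."""
    H, W = F.shape
    Z = F[:H - H % p, :W - W % p].reshape(H // p, p, W // p, p)
    return Z.max(axis=(1, 3))

F = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 feature map
print(max_pool(F))                             # [[ 5.  7.] [13. 15.]]
```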
2.3 Activation Function
The activation function serves as a decision function and helps in learning complex patterns. Selection of an appropriate activation function can accelerate the learning process. The activation function for a convolved feature map is defined in equation (3):

$$T_l^k = f_A(F_l^k) \qquad (3)$$
In the above equation, $F_l^k$ is the output of a convolution operation, which is assigned to the activation function $f_A(\cdot)$ that adds non-linearity and returns a transformed output $T_l^k$ for the $k$-th layer. In the literature, different activation functions such as sigmoid, tanh, maxout, ReLU, and variants of ReLU such as leaky ReLU, ELU, and PReLU [39], [48], [50], [51] are used to inculcate a non-linear combination of features. However, ReLU and its variants are preferred over other activations as they help in overcoming the vanishing gradient problem [52], [53].
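The small NumPy sketch below illustrates equation (3) with ReLU and its leaky variant as two possible choices of $f_A(\cdot)$; the input values are arbitrary assumptions.

```python
import numpy as np

def relu(F):
    """ReLU activation, one choice of f_A in Eq. 3: T = max(0, F)."""
    return np.maximum(0.0, F)

def leaky_relu(F, alpha=0.01):
    """Leaky ReLU variant: keeps a small slope for negative inputs, easing vanishing gradients."""
    return np.where(F > 0, F, alpha * F)

F = np.array([-2.0, -0.5, 0.0, 1.5])   # toy convolved responses
print(relu(F))                          # [0.  0.  0.  1.5]
print(leaky_relu(F))                    # [-0.02  -0.005  0.  1.5]
```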
Fig. 2: Basic layout of a typical ML system. In ML related tasks, initially data is preprocessed and then assigned to a classification system. A typical ML problem follows three steps: stage 1 is related to data gathering and generation, stage 2 performs preprocessing and feature selection, whereas stage 3 is based on model selection, parameter tuning, and analysis. CNN has good feature extraction and strong discrimination ability; therefore, in an ML system it can be used for feature extraction and classification.
2.4 Batch Normalization
Note: in the blogger's experience, batch normalization is a frequent exam and interview topic!
Batch normalization is used to address the issues related to internal covariance shift within feature maps. The internal covariance shift is a change in the distribution of hidden units' values, which slows down convergence (by forcing the learning rate to a small value) and requires careful initialization of parameters. Batch normalization for a transformed feature map $T_l^k$ is shown in equation (4).
$$N_l^k = \frac{T_l^k - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \qquad (4)$$

In equation (4), $N_l^k$ represents the normalized feature map, $T_l^k$ is the input feature map, and $\mu_B$ and $\sigma_B^2$ depict the mean and variance of a feature map for a mini-batch, respectively. Batch normalization unifies the distribution of feature map values by bringing them to zero mean and unit variance [54]. Furthermore, it smoothens the flow of the gradient and acts as a regulating factor, which thus helps in improving the generalization of the network.
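A simplified NumPy sketch of equation (4) is shown below for a single channel; the trainable scale and shift parameters that usually follow the normalization are omitted, and the tensor shape is an assumption.

```python
import numpy as np

def batch_norm(T, eps=1e-5):
    """Normalize a mini-batch of feature maps to zero mean and unit variance (Eq. 4).
    T has shape (batch, height, width) for one channel; mu_B and sigma2_B are the
    mini-batch statistics. Learnable scale/shift parameters are omitted for brevity."""
    mu_B = T.mean()
    sigma2_B = T.var()
    return (T - mu_B) / np.sqrt(sigma2_B + eps)

T = np.random.randn(8, 4, 4) * 3.0 + 5.0   # feature maps with non-zero mean and large variance
N = batch_norm(T)
print(round(float(N.mean()), 3), round(float(N.std()), 3))   # approximately 0.0 and 1.0
```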
2.5 Dropout
Dropout introduces regularization within the network, which ultimately improves generalization by randomly skipping some units or connections with a certain probability. In NNs, multiple connections that learn a non-linear relation are sometimes co-adapted, which causes overfitting [55]. This random dropping of some connections or units produces several thinned network architectures, and finally, one representative network with small weights is selected. This selected architecture is then considered as an approximation of all of the proposed networks [56].
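The following NumPy sketch shows the standard inverted dropout formulation as one way to realize this random skipping of units; the drop probability and array shape are assumptions.

```python
import numpy as np

def dropout(a, p=0.5, training=True, seed=None):
    """Randomly zero units with probability p (inverted dropout).
    Scaling by 1/(1-p) keeps the expected activation unchanged, so at test time the
    full network approximates the ensemble of thinned networks."""
    if not training:
        return a
    rng = np.random.default_rng(seed)
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

a = np.ones((2, 5))          # toy activations
print(dropout(a, p=0.5))     # roughly half the entries are 0, the rest are 2.0
```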
2.6 Fully Connected Layer
The fully connected layer is mostly used at the end of the network for classification purposes. Unlike pooling and convolution, it is a global operation. It takes input from the previous layer and globally analyses the output of all the preceding layers [57]. This makes a non-linear combination of selected features, which is used for the classification of data [58].
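As a minimal illustration, the NumPy snippet below shows the fully connected layer as a global linear map from the flattened features of the preceding layers to class scores; the feature and class dimensions are assumptions.

```python
import numpy as np

def fully_connected(features, W, b):
    """Global (dense) operation: every output unit sees every input feature."""
    return features @ W + b

features = np.random.rand(1, 64)          # flattened output of the preceding layers
W = np.random.randn(64, 10) * 0.01        # weight matrix mapping 64 features to 10 classes
b = np.zeros(10)
scores = fully_connected(features, W, b)  # class scores, typically fed to a softmax
print(scores.shape)                       # (1, 10)
```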
Fig. 3: Evolutionary history of deep CNNs.
3 Architectural Evolution of Deep CNN
Nowadays, CNNs are considered the most widely used algorithms among biologically inspired AI techniques. CNN history begins with the neurobiological experiments conducted by Hubel and Wiesel (1959, 1962) [14], [59]. Their work provided a platform for many cognitive models, almost all of which were later replaced by CNN. Over the decades, different efforts have been carried out to improve the performance of CNNs. This history is pictorially represented in Fig. 3. These improvements can be categorized into five different eras and are discussed below.
3.1 Late 1980s-1999: Origin of CNN
CNNs have been applied to visual tasks since the late 1980s. In 1989, LeCun et al. proposed the first multilayered CNN named ConvNet, whose origin is rooted in Fukushima's Neocognitron [60], [61]. LeCun proposed supervised training of ConvNet using the backpropagation algorithm [7], [62], in comparison to the unsupervised reinforcement learning scheme used by its predecessor Neocognitron. LeCun's work thus laid a foundation for the modern 2D CNNs. Supervised training in CNN provides automatic feature learning ability from the raw input, rather than the designing of handcrafted features used by traditional ML methods. This ConvNet showed successful results for handwritten digit and zip code recognition related problems [63]. In 1998, ConvNet was improved by LeCun and used for classifying characters in a document recognition application [64]. This modified architecture was named LeNet-5, which was an improvement over the initial CNN as it can extract feature representations in a hierarchical way from raw pixels [65]. Reliance of LeNet-5 on fewer parameters, along with consideration of the spatial topology of images, enabled CNN to recognize rotational variants of the image [65]. Due to the good performance of CNN in optical character recognition, its commercial use in ATMs and banks started in 1993 and 1996, respectively. Though many successful milestones were achieved by LeNet-5, the main concern associated with it was that its discrimination power did not scale to classification tasks beyond handwritten character recognition.
3.2 Early 2000: Stagnation of CNN
In the late 1990s and early 2000s, interest in NNs reduced and less attention was given to exploring the role of CNNs in different applications such as object detection, video surveillance, etc. The use of CNN in ML related tasks became dormant due to the insignificant improvement in performance at the cost of high computational time. At that time, other statistical methods and, in particular, SVM became more popular than CNN due to their relatively high performance [66]–[68]. It was widely presumed in the early 2000s that the backpropagation algorithm used for training of CNN was not effective in converging to optimal points and therefore unable to learn useful features in a supervised fashion as compared to handcrafted features [69]. Meanwhile, different researchers kept working on CNN and tried to optimize its performance. In 2003, Simard et al. improved the CNN architecture and showed good results as compared to SVM on a handwritten digit benchmark dataset, MNIST [64], [68], [70]–[72]. This performance improvement expedited the research in CNN by extending its application in optical character recognition (OCR) to other scripts' character recognition [72]–[74], deployment in image sensors for face detection in video conferencing, regulation of street crimes, etc. Likewise, CNN based systems were industrialized in markets for tracking customers [75]–[77]. Moreover, CNN's potential in other applications such as medical image segmentation, anomaly detection, and robot vision was also explored [78]–[80].
3.3 2006-2011: Revival of CNN
Deep NNs generally have complex architectures and a time-intensive training phase that sometimes spanned over weeks and even months. In the early 2000s, there were only a few techniques for the training of deep networks. Additionally, it was considered that CNN is not able to scale for complex problems. These challenges halted the use of CNN in ML related tasks.
To address these problems, in 2006 many interesting methods were reported to overcome the difficulties encountered in the training of deep CNNs and the learning of invariant features. Hinton proposed a greedy layer-wise pre-training approach in 2006 for deep architectures, which revived and reinstated the importance of deep learning [81], [82]. The revival of deep learning [83], [84] was one of the factors which brought deep CNNs into the limelight. Huang et al. (2006) used max pooling instead of subsampling, which showed good results by learning of invariant features [46], [85].
In late 2006, researchers started using graphics processing units (GPUs) [86], [87] to accelerate the training of deep NN and CNN architectures [88], [89]. In 2007, NVIDIA launched the CUDA programming platform [90], [91], which allows the exploitation of the parallel processing capabilities of GPUs to a much greater degree [92]. In essence, the use of GPUs for NN training [88], [93] and other hardware improvements were the main factors which revived the research in CNN. In 2010, Fei-Fei Li's group at Stanford established a large database of images known as ImageNet, containing millions of labeled images [94]. This database was coupled with the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions, where the performances of various models have been evaluated and scored [95]. Consequently, ILSVRC and NIPS have been very active in strengthening research and increasing the use of CNN and thus making it popular. This was a turning point in improving the performance and increasing the use of CNN.
3.4 2012-2014: Rise of CNN
Availability of big training data, hardware advancements, and computational resources contributed to the advancement in CNN algorithms. A renaissance of CNN in object detection, image classification, and segmentation related tasks was observed in this period [9], [96]. However, the success of CNN in image classification tasks was not only due to the aforementioned factors but was largely contributed by architectural modifications, parameter optimization, incorporation of regulatory units, and reformulation and readjustment of connections within the network [39], [42], [97].
The main breakthrough in CNN performance was brought by AlexNet [21]. AlexNet won the 2012-ILSVRC competition, which has been one of the most difficult challenges in image detection and classification. AlexNet improved performance by exploiting depth (incorporating multiple levels of transformation) and introduced a regularization term in CNN. The exemplary performance of AlexNet [21] compared to conventional ML techniques in 2012-ILSVRC (AlexNet reduced the error rate from 25.8% to 16.4%) suggested that the main reason for the saturation in CNN performance before 2006 was largely the unavailability of enough training data and computational resources. In summary, before 2006, these resource deficiencies made it hard to train a high-capacity CNN without deterioration of performance [98].
With CNN becoming more of a commodity in the computer vision (CV) field, a number of attempts have been made to improve the performance of CNN with reduced computational cost. Therefore, each new architecture tries to overcome the shortcomings of previously proposed architectures in combination with new structural reformulations. In the years 2013 and 2014, researchers mainly focused on parameter optimization to accelerate CNN performance in a range of applications with a small increase in computational complexity. In 2013, Zeiler and Fergus [28] defined a mechanism to visualize the learned filters of each CNN layer. The visualization approach was used to improve the feature extraction stage by reducing the size of the filters. Similarly, the VGG architecture [29] proposed by the Oxford group, which was runner-up at the 2014-ILSVRC competition, made the receptive field much smaller in comparison to that of AlexNet, but with increased volume. In VGG, depth was increased from 9 layers to 16 by making the volume of feature maps double at each layer. In the same year, GoogleNet [99], which won the 2014-ILSVRC competition, not only exerted its efforts to reduce computational cost by changing the layer design, but also widened the width in compliance with depth to improve CNN performance. GoogleNet introduced the concept of split, transform, and merge based blocks, within which multiscale and multilevel transformation is incorporated to capture both local and global information [33], [99], [100]. The use of multilevel transformations helps CNN in tackling details of images at various levels. In the years 2012-14, the main improvement in the learning capacity of CNN was achieved by increasing its depth and by parameter optimization strategies. This suggested that the depth of a CNN helps in improving the performance of a classifier.
3.5 2015-Present: Rapid increase in Architectural Innovations and Applications of CNN
It is generally observed that the major improvements in CNN performance occurred from 2015-2019. The research in CNN is still ongoing and has a significant potential for improvement. The representational capacity of CNN depends on its depth and, in a sense, can help in learning complex problems by defining diverse levels of features ranging from simple to complex. Multiple levels of transformation make learning easy by chopping complex problems into smaller modules. However, the main challenge faced by deep architectures is the problem of negative learning, which occurs due to the diminishing gradient at lower layers of the network. To handle this problem, different research groups worked on the readjustment of layer connections and the design of new modules. In early 2015, Srivastava et al. used the concept of cross-channel connectivity and an information gating mechanism to solve the vanishing gradient problem and to improve the network representational capacity [101]–[103]. This idea became famous in late 2015, and a similar concept of residual blocks or skip connections was coined [31]. Residual blocks are a variant of cross-channel connectivity, which smoothen learning by regularizing the flow of information across blocks [104]–[106]. This idea was used in the ResNet architecture for the training of a 150-layer deep network [31]. The idea of cross-channel connectivity is further extended to multilayer connectivity by Deluge, DenseNet, etc. to improve representation [107], [108].
In the year 2016, the width of the network was also explored in connection with depth to improve feature learning [34], [35]. Apart from this, no new architectural modification became prominent; instead, different researchers used hybrids of the already proposed architectures to improve deep CNN performance [33], [104]–[106], [109], [110]. This fact gave the intuition that there might be other factors, more important than the appropriate assembly of network units, that can effectively regulate CNN performance. In this regard, Hu et al. (2017) identified that the network representation has a role in the learning of deep CNNs [111]. Hu et al. introduced the idea of feature map exploitation and pinpointed that less informative and domain extraneous features may affect the performance of the network to a larger extent. They exploited the aforementioned idea and proposed a new architecture named Squeeze-and-Excitation Network (SE-Network) [111]. It exploits feature map (commonly known as channel in the literature) information by designing a specialized SE-block. This block assigns a weight to each feature map depending upon its contribution to class discrimination. This idea was further investigated by different researchers, who assign attention to important regions by exploiting both spatial and feature map (channel) information [37], [38], [112]. In 2018, a new idea of channel boosting was introduced by Khan et al. [36]. The motivation behind the training of a network with boosted channel representation was to use an enriched representation. This idea effectively boosts the performance of a CNN by learning diverse features as well as exploiting the already learnt features through the concept of TL.
From 2012 up till now, a lot of improvements have been reported in CNN architecture. As regards the architectural advancement of CNNs, recently the focus of research has been on designing new blocks that can boost network representation by exploiting both feature maps and spatial information or by adding artificial channels.