Xception: Meet the Xtreme Inception
Intro to Xception
Xception - the Extreme Inception! Sounds cool and Xtreme! But "why such a name?", one might wonder. One obvious thing is that the author, Francois Chollet (creator of Keras), was inspired by the Inception architecture. He explains how he views the Inception architecture in his abstract, which I've quoted below.
We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation
-Francois Chollet in the Xception paper
Another new deep learning term for today: "depthwise separable convolution". Now, I know that most of you would've guessed that it's some kind of layer. But for those who have never heard of it, fret not. I've got you covered in this post. We'll go deeper into what a depthwise separable convolution is, how it's used to build the Xception model, and how it builds upon the Inception hypothesis. As always, I'll try to throw in more illustrations to make the details clear. Let's dive in!
The Inception Hypothesis
The Inception-style architecture was introduced in the "Going Deeper with Convolutions" paper. The authors called the model introduced in the paper GoogLeNet, and it used Inception blocks. It was a novel and innovative architecture, and it still is. It also got much attention because many architectures at the time simply stacked more and more layers to increase network capacity. Inception, on the other hand, was more creative and slick!
Rather than going deeper by simply adding more layers, it also went wide. We'll see what I mean by "wide" shortly. The Inception blocks take in an input tensor and perform a combination of convolution and pooling in parallel.
Image by author

For people who have seen or read the Inception papers, you might find that this is not exactly like an Inception block. Yeah, you're right! I've just illustrated it this way so that everybody gets a rough idea of what it does. You can call it "the naive version of Inception," as the authors did.
Now, the actual Inception block is a little bit different in terms of the number of convolutions, their size, and how they're layered. But this naive illustration conveys what I meant by "wide" before. The block performs convolutions with different filter sizes in parallel, and the output tensors are concatenated along the channel dimension, i.e., stacked one behind the other.
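To make the parallel structure concrete, here is a minimal sketch of such a naive block in Keras. The filter counts and the pooling branch are illustrative choices, not the exact ones from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

def naive_inception_block(x):
    """Naive Inception-style block: parallel convolutions of different sizes
    (plus a pooling branch) whose outputs are concatenated channel-wise."""
    tower_1x1 = layers.Conv2D(32, 1, padding="same", activation="relu")(x)
    tower_3x3 = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    tower_5x5 = layers.Conv2D(32, 5, padding="same", activation="relu")(x)
    pool = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    return layers.Concatenate(axis=-1)([tower_1x1, tower_3x3, tower_5x5, pool])

inputs = tf.keras.Input(shape=(32, 32, 64))
print(naive_inception_block(inputs).shape)  # (None, 32, 32, 160): 32 + 32 + 32 + 64 channels
```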
Going a Bit Deeper
Now that you’ve seen the parallel convolution block, I shall go a bit deeper about the block above. The input tensor which is processed by the different convolution would be of shape (BatchSize, Height, Width, Channels). In the Inception block, the input tensor’s channel dimension is reduced using 1x1 convolution before applying 3x3 or 5x5 convolutions. This reduction in channel size is done to reduce the computation when feeding this tensor to the subsequent layers. You can find this concept explained in great detail in the 1x1 convolution article.
Image by author

Again, this is just another depiction to aid understanding and is not drawn to match the original Inception block. You can give or take some layers, maybe go even wider, and make your own version. The input tensor is processed individually by the three convolution towers, and the three separate output tensors are concatenated along the channel dimension. GoogLeNet uses multiple Inception blocks and a few other tricks and tweaks to achieve its performance. I believe you now get the idea of what an Inception block does.
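A rough sketch of that idea follows, with the 1x1 reduction in front of each larger convolution. The filter counts here are made up purely for illustration:

```python
from tensorflow.keras import layers

def inception_block(x, filters=32, bottleneck=16):
    """Inception-style block where each larger tower first shrinks the channel
    dimension with a cheap 1x1 convolution before the 3x3 or 5x5 convolution."""
    tower_1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)

    tower_3 = layers.Conv2D(bottleneck, 1, padding="same", activation="relu")(x)
    tower_3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(tower_3)

    tower_5 = layers.Conv2D(bottleneck, 1, padding="same", activation="relu")(x)
    tower_5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(tower_5)

    return layers.Concatenate(axis=-1)([tower_1, tower_3, tower_5])
```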
Next, we shall remodel the above block to make it “Xtreme”!
Making it Xtreme
We’ll replace the three separate 1x1 layers in each of the parallel towers, with a single layer. It’ll look something like this.
Inception block with a common 1x1 layer - Image by author

Rather than just passing the output of the 1x1 layer on to the following 3x3 layers, we'll slice it and pass each channel separately. Let me illustrate with an example. Say the output of the 1x1 layer has shape (1x5x5x5). Let's not consider the batch dimension and just see it as a (5x5x5) tensor. This is sliced along the channel dimension as shown below and fed separately to the following layers.
Slicing the output tensor of the 1x1 convolution along the channel dimension - Image by author

Now, each of the slices is passed on to a separate 3x3 layer. This means that each 3x3 block will have to process a tensor of shape (5x5x1). Therefore, there'll be 5 separate convolution blocks, one for each slice. And each of the convolution blocks will have just a single filter.
Image by author

This same process scales up to bigger input tensors. If the output of the pointwise convolution is (5x5x100), there'd be 100 convolution blocks, each with one filter, and all of their outputs would be concatenated at the end. Processing each channel of the input tensor separately like this is why it's named EXTREME.
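Here is a sketch of that per-channel slicing, written deliberately naively with one single-filter Conv2D per channel slice (sizes follow the (5x5x5) example above; in practice this is exactly the job a depthwise convolution layer does in one step):

```python
import tensorflow as tf
from tensorflow.keras import layers

def xtreme_block(x, pointwise_filters=5):
    """'Extreme' version: one shared 1x1 convolution, then a separate
    single-filter 3x3 convolution for every output channel."""
    x = layers.Conv2D(pointwise_filters, 1, padding="same")(x)  # shared 1x1 layer
    per_channel_outputs = []
    for c in range(pointwise_filters):
        channel = x[:, :, :, c:c + 1]                           # slice out one channel
        per_channel_outputs.append(layers.Conv2D(1, 3)(channel))  # its own 3x3 filter
    return layers.Concatenate(axis=-1)(per_channel_outputs)

inputs = tf.keras.Input(shape=(5, 5, 5))
print(xtreme_block(inputs).shape)  # (None, 3, 3, 5)
```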
We have seen enough about the idea of Inception and an Xtreme version of it too. Now, let’s venture further to see what powers the Xception architecture.
Depthwise Separable Convolution: What Powers Xception
The depthwise separable convolution layer is what powers Xception, and the architecture uses it heavily. This type of convolution is similar to the extreme version of the Inception block that we saw above, but it differs slightly in how it works. Let's see how!
Consider a typical convolution layer with ten 5x5 filters operating on a (1x10x10x100) tensor. Each of the ten filters is of shape (5x5x100) and slides over the input tensor to produce the output. Each of the 5x5 filters covers the whole channel dimension (the entire 100) as it slides over the input. This means a typical convolution operation encompasses both the spatial (height, width) and the channel dimensions.
Image by author

If you're not familiar with convolutions, I suggest you go through How Convolution Works?
A depthwise separable layer has two functional parts that split the job of a conventional convolution layer. The two parts are depthwise convolution and pointwise convolution. We’ll go through them one by one.
Depthwise Convolution
Let’s take an example of a depthwise convolution layer with 3x3 filters that operate on an input tensor of shape (1x5x5x5). Again, let’s lose the batch dimension for simplicity as it doesn’t change anything and consider it as a (5x5x5) tensor. Our depthwise convolution will have five 3x3 filters one for each channel of the input tensor. And each filter will slide spatially through a single channel and generate the output feature map for that channel.
As the number of filters equals the number of channels of the input, the output tensor will also have the same number of channels. Let's not use any zero padding in the convolution operation and keep the stride at 1.
Image by author

Going by the formula for the output size after convolution, our (5x5x5) tensor will become a (3x3x5) tensor. The illustration below will make the idea clear!
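For reference, the usual output-size formula, with n the input size, k the kernel size, p the padding, and s the stride, applied to our numbers (no padding, stride 1):

$$o = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1 = \frac{5 + 2 \cdot 0 - 3}{1} + 1 = 3$$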
Illustration of the depthwise convolution operation - Image by author

That's depthwise convolution for you! You can see that it's almost the same as the Xtreme convolution we did in the Inception block.
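Keras has a built-in DepthwiseConv2D layer that does exactly this. A minimal sketch matching the example:

```python
import tensorflow as tf
from tensorflow.keras import layers

# One 3x3 filter per input channel, no padding, stride 1:
# a (5, 5, 5) input becomes a (3, 3, 5) output.
inputs = tf.keras.Input(shape=(5, 5, 5))
depthwise_out = layers.DepthwiseConv2D(kernel_size=3, strides=1, padding="valid")(inputs)
print(depthwise_out.shape)  # (None, 3, 3, 5)
```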
Next up, we have to feed this output tensor to a pointwise convolution, which performs cross-channel correlation. That simply means it operates across all the channels of the tensor.
Pointwise Convolution
The pointwise convolution is just another name for a 1x1 convolution. If we ever want to increase or decrease the depth (channel dimension) of a tensor, we can use a pointwise convolution. That’s why it was used in the Inception block to reduce the depth before the 3x3 or 5x5 layers. Here, we’re gonna use it to increase the depth. But how?
The pointwise convolution is just a normal convolution layer with a filter size of one (1x1 filters). Therefore, it doesn't change the spatial output size after convolution. In our example, the output tensor of the depthwise convolution has a size of (3x3x5). If we apply 50 1x1 filters, we'll get an output of (3x3x50). A ReLU activation is applied in the pointwise convolution layer.
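Continuing with those numbers, a pointwise convolution is just Conv2D with kernel_size=1 (the filter count of 50 is only for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Picking up the depthwise output from the example above: fifty 1x1 filters
# change only the channel dimension, (3, 3, 5) -> (3, 3, 50).
depthwise_out = tf.keras.Input(shape=(3, 3, 5))
pointwise_out = layers.Conv2D(filters=50, kernel_size=1, activation="relu")(depthwise_out)
print(pointwise_out.shape)  # (None, 3, 3, 50)
```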
See Pointwise Convolution for more detailed illustrations and its advantages.
Combining the depthwise convolution and pointwise convolution, we get the Depthwise Separable Convolution. Let’s just call it DSC from here.
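Keras bundles the depthwise and pointwise steps into a single SeparableConv2D layer, so a DSC can be written in one line. A minimal sketch, reusing the sizes from the running example:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Depthwise 3x3 (one filter per channel) followed by a 1x1 pointwise
# convolution to 50 output channels, all in a single layer.
inputs = tf.keras.Input(shape=(5, 5, 5))
dsc_out = layers.SeparableConv2D(filters=50, kernel_size=3, padding="valid")(inputs)
print(dsc_out.shape)  # (None, 3, 3, 50)
```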
Differences between DSC and the Xtreme Inception
In the Inception block, the pointwise convolution comes first, followed by the 3x3 or 5x5 layer; in the DSC, the order is reversed. Since we'd be stacking the DSC blocks one above the other, the order doesn't matter much. The Inception block applies an activation function after both the pointwise and the following convolution layers, but in the DSC it's applied just once, after the pointwise convolution.
Image by author

The Xception author discusses the effect of having an activation on both the depthwise and pointwise steps in the DSC, and observed that learning is faster when there's no intermediate activation.
Illustration of DSC with and without an intermediate activation - Image by author

Xception Architecture
The author splits the entire Xception architecture into 14 modules, where each module is just a bunch of DSC and pooling layers. The 14 modules are grouped into three groups: the entry flow, the middle flow, and the exit flow, with four, eight, and two modules respectively. The final group, i.e. the exit flow, can optionally have fully connected layers at the end.
Note: All the DSC layers in the architecture use a filter size of 3x3, stride 1, and “same” padding. And all the MaxPooling layers use a 3x3 kernel and a stride of 2.
Entry Flow of Xception
Image by author

The above illustration is a detailed version of the one given in the Xception paper. It might seem intimidating at first, but look again; it's very simple.
The very first module contains conventional convolution layers and doesn't have any DSC ones. It takes input tensors of size (-1, 299, 299, 3). The -1 in the first dimension represents the batch size and just denotes that the batch size can be anything.
Every convolution layer, both conventional and DSC, is followed by a Batch Normalization layer. The convolutions with a stride of 2 reduce the spatial size by almost half. The output shape is shown at the side, calculated using the convolution formula we saw before.
Image by author

Excluding the first module, all the others in the entry flow have residual skip connections. Each parallel skip connection has a pointwise convolution layer whose output is added to the output of the main path.
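Below is a rough sketch of one such entry-flow module, assuming 128 filters; the exact filter counts, and whether a ReLU precedes the first separable convolution, vary per module in the paper:

```python
from tensorflow.keras import layers

def entry_flow_module(x, filters=128):
    """One entry-flow-style module: two separable convolutions and max pooling
    on the main path, a strided 1x1 convolution on the residual skip path."""
    skip = layers.Conv2D(filters, 1, strides=2, padding="same")(x)
    skip = layers.BatchNormalization()(skip)

    x = layers.SeparableConv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

    return layers.Add()([x, skip])
```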
Middle Flow of Xception
Illustration of the middle flow - Image by author

The module above is repeated eight times, one after the other, to form the middle flow. All eight modules use a stride of 1 and don't have any pooling layers, so the spatial size of the tensor passed in from the entry flow stays the same. The channel depth stays the same too, as all the middle flow modules have 728 filters, which matches the input's depth.
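A sketch of one middle-flow module (728 filters, identity skip connection); repeating it eight times gives the middle flow. This assumes the incoming tensor already has 728 channels, as described above:

```python
from tensorflow.keras import layers

def middle_flow_module(x, filters=728):
    """One middle-flow module: three separable convolutions with an identity
    skip connection; spatial size and channel depth stay unchanged."""
    skip = x
    for _ in range(3):
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
    return layers.Add()([x, skip])
```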
Exit Flow of Xception
Image by author

The exit flow has just two convolution modules, and the second one doesn't have any skip connection. The second module uses global average pooling, unlike the earlier modules which used MaxPooling. The output vector of the average pooling layer can be fed directly to a logistic regression layer, but we can optionally use intermediate fully connected layers too.
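And a sketch of the tail end of the exit flow; the 2048-unit fully connected layer is the optional part (its size is illustrative), and the output size assumes ImageNet's 1000 classes:

```python
from tensorflow.keras import layers

def exit_flow_head(x, num_classes=1000):
    """Tail of the exit flow: global average pooling, an optional fully
    connected layer, then the classification layer."""
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(2048, activation="relu")(x)  # the optional FC layer
    return layers.Dense(num_classes, activation="softmax")(x)
```

For a complete reference implementation, Keras also ships the full model as tf.keras.applications.Xception.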
To Sum Up
The Xception model contains almost the same number of parameters as Inception V3 but outperforms it by a small margin on the ImageNet dataset, and by a larger margin on the JFT image classification dataset (Google's internal dataset). Performing better with almost the same number of parameters can be attributed to its architecture engineering.
Translated from: https://towardsdatascience.com/xception-meet-the-xtreme-inception-db569755f4d6