The CVPR 2019 acceptance results are out, but how did the CVPR 2018 papers fare?
CVPR, one of the three top conferences in computer vision, has always attracted wide attention, and the papers it accepts represent the latest directions and the state of the art in the field. This year, CVPR 2019 will be held in Long Beach, California. After the acceptance results were announced last month, they set off another small wave of excitement in the CV community, and interpretation articles on CVPR papers appeared in quick succession.
According to statistics from the paper list on the CVPR website, 1,300 papers were accepted this year, compared with 643 (2016), 783 (2017), and 979 (2018) over the previous three years. This growth is one sign of how vigorously computer vision is developing: as the foundation of machine perception of the world and one of the most important AI technologies, it is receiving more and more attention.
Researchers around the world are currently immersed in the flood of CVPR 2019 papers, hoping to get to the newest results as early as possible. In this article, however, we set CVPR 2019 aside for a moment and look back at the papers of CVPR 2018.
Based on Google Scholar data, we identified the five most-cited papers among the 979 accepted at CVPR 2018, hoping that citation counts can tell us which of these papers have drawn the most attention from researchers worldwide.
Because citation counts differ across search engines, we list only the Google Scholar numbers here. Google Scholar draws its references from multiple databases, including a large number of books, so its citation counts should generally be treated only as a rough indicator of a paper's importance.
Searching Google Scholar against the CVPR 2018 paper list (http://openaccess.thecvf.com/CVPR2018.py) gives the following results (based on data retrieved on March 19, 2019; because the counts for second and third place are very close, we do not rank them explicitly):
A "citation" is a reference in a paper to earlier work and is the standard way for authors to indicate the sources of their methods, viewpoints, and findings. Beyond whether a paper is accepted at a top conference, its citation count is another indispensable dimension for judging its importance. Although citation numbers differ greatly across disciplines, within the single field of computer vision a paper's citation count is an important quantitative indicator of how highly it is regarded.
The most-cited CVPR 2018 papers have all received considerable attention and recognition from the research community, largely because of their originality. For example, the top-ranked Squeeze-and-Excitation Networks (SENet) has a very simple construction: it is easy to deploy, introduces no new functions or layers, and has favorable properties in terms of model size and computational complexity.
With SENet, the authors reduced the top-5 error on ImageNet to 2.251% (the previous best was 2.991%), winning first place in the image-classification track of the ImageNet 2017 competition. Over the past year, SENet has not only been widely used in industry as a high-performing deep-learning building block, but has also served as a reference point for other researchers' work.
For a detailed introduction to SENet, see the original authors' write-up:
In addition, Learning Transferable Architectures for Scalable Image Recognition from Google Brain, which proposes using one neural network to learn the architecture of another, has also attracted considerable attention from researchers.
The abstracts of the five papers are reproduced below for readers to review:
Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, several recent approaches have shown the benefit of enhancing spatial encoding.
In this work, we focus on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets.
Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at a minimal additional computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ~25% relative improvement over the winning entry of 2016. Code and models are available at https://github.com/hujie-frank/SENet.
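To make the channel-recalibration idea in the Squeeze-and-Excitation abstract concrete, here is a minimal PyTorch-style sketch of an SE block. The reduction ratio of 16 and the exact layer choices are common defaults assumed for illustration, not details taken from this article.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: globally pool each channel ("squeeze"),
    pass the result through a small bottleneck MLP ("excitation"), and
    rescale the input's channels by the resulting weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average per channel
        self.fc = nn.Sequential(             # excitation: channel-wise gating weights
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # recalibrate channel responses

# Example: insert after any convolutional stage, e.g.
# y = SEBlock(64)(torch.randn(2, 64, 56, 56))
```

Because the block only adds a global pooling step and two small fully connected layers, dropping it into an existing residual or inception branch is straightforward, which is consistent with the "easy to deploy" point made earlier.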
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13× actual speedup over AlexNet while maintaining comparable accuracy.
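The ShuffleNet abstract above hinges on the channel-shuffle operation, which lets information cross group boundaries between grouped pointwise convolutions. Below is a minimal sketch of that operation, assuming a PyTorch tensor in (batch, channels, height, width) layout; the function name is ours, not from the paper's released code.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so that the next grouped
    convolution sees features from every group."""
    b, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by the group count"
    x = x.view(b, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # swap group and per-group axes
    return x.view(b, c, h, w)                  # flatten back to (B, C, H, W)

# Example: shuffle the output of a grouped pointwise convolution
# y = channel_shuffle(torch.randn(1, 12, 8, 8), groups=3)
```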
Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset.
The key contribution of this work is the design of a new search space (which we call the “NASNet search space”) which enables transferability. In our experiments, we search for the best convolutional layer (or “cell”) on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, which we name a “NASNet architecture”.
We also introduce a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models. On CIFAR-10 itself, a NASNet found by our method achieves 2.4% error rate, which is state-of-the-art. Although the cell is not searched for directly on ImageNet, a NASNet constructed from the best cell achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5 on ImageNet. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS – a reduction of 28% in computational demand from the previous state-of-the-art model.
When evaluated at different levels of computational cost, accuracies of NASNets exceed those of the state-of-the-art human-designed models. For instance, a small version of NASNet also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. Finally, the image features learned from image classification are generically useful and can be transferred to other computer vision problems. On the task of object detection, the learned features by NASNet used with the Faster-RCNN framework surpass state-of-the-art by 4.0% achieving 43.1% mAP on the COCO dataset.
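The NASNet abstract mentions ScheduledDropPath. As a rough, hedged illustration of the idea only, the sketch below zeroes out a cell branch with a probability that ramps up linearly over training; the exact schedule, the per-branch wiring inside a NASNet cell, and the `progress` hook updated by the training loop are assumptions made for this example, not details from the paper reproduced here.

```python
import torch
import torch.nn as nn

class ScheduledDropPath(nn.Module):
    """Drop an entire branch (per sample) with a probability that grows
    linearly from 0 to `final_drop_prob` as training progresses."""
    def __init__(self, final_drop_prob: float = 0.3):
        super().__init__()
        self.final_drop_prob = final_drop_prob
        self.progress = 0.0  # fraction of training completed; set by the training loop

    def forward(self, x):
        if not self.training:
            return x
        keep_prob = 1.0 - self.final_drop_prob * self.progress  # linear schedule
        # One Bernoulli draw per sample; surviving branches are rescaled.
        mask = x.new_empty(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
        return x * mask / keep_prob
```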
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3.
The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design.
Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, VOC image segmentation. We evaluate the trade-offs between accuracy, and number of operations measured by multiply-adds (MAdd), as well as actual latency, and the number of parameters.
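To make the "inverted residual with linear bottleneck" described in the MobileNetV2 abstract concrete, here is a minimal PyTorch-style sketch: a pointwise expansion, a depthwise 3×3 convolution, and a linear projection back to a thin bottleneck, with a shortcut between the thin ends when shapes allow. The expansion factor of 6 and the use of ReLU6 follow the paper's usual settings and are assumptions here rather than details stated in this article.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block: expand -> depthwise -> linear projection."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 6):
        super().__init__()
        hidden = in_ch * expand
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),               # pointwise expansion
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),                  # lightweight depthwise conv
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),              # linear bottleneck projection
            nn.BatchNorm2d(out_ch),                                # note: no non-linearity here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out  # shortcut joins the thin layers
```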
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered.
Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
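The abstract above pairs bottom-up region proposals with a learned top-down weighting over them. The sketch below shows only that top-down step over precomputed region features; the layer sizes, the simple additive tanh scoring, and the class name are illustrative assumptions (the published model uses gated activations and task-specific captioning/VQA heads).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    """Weight a set of bottom-up region features by their relevance to a
    task context vector (e.g. a partial-caption state or question embedding)."""
    def __init__(self, feat_dim: int = 2048, ctx_dim: int = 512, hidden: int = 512):
        super().__init__()
        self.proj_v = nn.Linear(feat_dim, hidden)  # project region features
        self.proj_h = nn.Linear(ctx_dim, hidden)   # project the top-down context
        self.score = nn.Linear(hidden, 1)          # scalar attention logit per region

    def forward(self, regions, context):
        # regions: (B, k, feat_dim) from the bottom-up detector; context: (B, ctx_dim)
        joint = torch.tanh(self.proj_v(regions) + self.proj_h(context).unsqueeze(1))
        alpha = F.softmax(self.score(joint), dim=1)   # (B, k, 1) attention weights
        return (alpha * regions).sum(dim=1)           # attended image feature, (B, feat_dim)
```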
Click the titles below to browse more of our past articles:
CVPR 2019 | Unsupervised Domain-Specific Single-Image Deblurring
A Survey of Graph Neural Networks: Models and Applications
10 Recent GAN Papers Worth Reading
F-Principle: A First Look at What Deep Learning Cannot Do
A Comprehensive Survey of Generative Adversarial Networks (GANs)
Making Keras Cooler: Layer-wise Learning Rates and Flexible Gradients
Xiaomi's Camera Technology: NAS-Based Image Super-Resolution
AAAI 2019 | Object Detection Based on Region Decomposition and Assembly
# Submission Channel #
Get your paper seen by more people
How can more high-quality content reach readers by a shorter path, lowering the cost of finding it? The answer: people you don't know.
There are always people you don't know who know what you want to know. PaperWeekly can serve as a bridge, bringing together researchers and ideas from different backgrounds and directions and sparking new possibilities.
PaperWeekly encourages university labs and individuals to share high-quality content on our platform, whether interpretations of recent papers, study notes, or technical write-ups. Our only goal is to let knowledge truly flow.
Submission guidelines:
• Submissions must be your own original work; please include the author's information (name + school/affiliation + degree/position + research direction)
• If the article has been published elsewhere, please say so when submitting and include links to all previous postings
• PaperWeekly assumes every article is a first publication and will add an "original" tag
• Submission email: hr@paperweekly.site
• Please send all figures separately as attachments
• Please leave an instant contact method (WeChat or phone number) so we can reach you during editing and publication
You can now also find us on Zhihu
Go to the Zhihu homepage and search for "PaperWeekly"
Click "Follow" to subscribe to our column
About PaperWeekly
PaperWeekly is an academic platform for recommending, interpreting, discussing, and reporting on cutting-edge AI research. If you study or work in AI, tap "交流群" in the official account's menu and our assistant will add you to the PaperWeekly discussion group.