Selected Computer Vision Papers from arXiv, 2019 (2019/1/23-2019/1/28)
Author: Zhu Zheng
Original: CV arXiv Daily: Daily Picks of Computer Vision Papers (2019/1/23-2019/1/28)
This series is reposted, with permission, from CV arXiv Daily, the WeChat public account of computer vision researcher Zhu Zheng. Its goal is to help you screen each day's computer vision papers on arXiv, focusing on: object detection, image segmentation, single/multi-object tracking, action recognition, human pose estimation and tracking, person re-identification, GANs, architecture search, and more. Feel free to follow; the digest goes out daily at a fixed time. Keep learning!
2019/1/28
[1] Self-supervised representation learning paper from Google
Revisiting Self-Supervised Visual Representation Learning
Paper link: https://arxiv.org/abs/1901.09005
Code: https://github.com/google/revisiting-self-supervised
Abstract: Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a big body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of the pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), have not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large-scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of our study, we drastically boost the performance of previously proposed techniques and outperform previously published state-of-the-art results by a large margin.
[2] GAN paper from ICLR 2019
Diversity-Sensitive Conditional Generative Adversarial Networks
Paper link: https://arxiv.org/abs/1901.09024
Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in latent code. To address this issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on the generator allows our method to control the balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simply adding our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed for each individual task.
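The regularizer described in the abstract is simple enough to sketch. Below is a minimal, hypothetical PyTorch rendition of the idea (penalize the generator when two different latent codes map to similar outputs), based only on the abstract; the exact form, bound, and weighting in the paper may differ, and `generator(x, z)` is an assumed conditional-generator interface.

```python
import torch

def diversity_loss(generator, x, z_dim=128, tau=1.0):
    """Hedged sketch of a diversity-sensitive regularizer for a cGAN.

    Encourages outputs to change when the latent code changes, by
    maximizing the ratio of output distance to latent distance.
    """
    z1 = torch.randn(x.size(0), z_dim, device=x.device)
    z2 = torch.randn(x.size(0), z_dim, device=x.device)
    y1, y2 = generator(x, z1), generator(x, z2)
    num = (y1 - y2).flatten(1).norm(dim=1)   # distance between outputs
    den = (z1 - z2).norm(dim=1) + 1e-8       # distance between latents
    # Negative ratio: minimizing this loss maximizes output diversity.
    # tau caps the ratio so the generator is not pushed toward noise.
    return -torch.clamp(num / den, max=tau).mean()
```

The full generator objective would then be the usual cGAN loss plus a weighted term, e.g. `loss_g = gan_loss + lam * diversity_loss(G, x)`.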
[3] Q-learning for Dou Di Zhu, from Cewu Lu's group at Shanghai Jiao Tong University
Combinational Q-Learning for Dou Di Zhu
Paper link: https://arxiv.org/abs/1901.08925
Code: https://github.com/qq456cvb/doudizhu-C
Abstract: Deep reinforcement learning (DRL) has gained a lot of attention in recent years, and has been proven to be able to play Atari games and Go at or above human levels. However, those games are assumed to have a small fixed number of actions and could be trained with a simple CNN network. In this paper, we study a special class of Asian popular card games called Dou Di Zhu, in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to a huge number of actions. We propose a novel method to handle combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and also leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method prevails over state-of-the-art methods like naive Q-learning and A3C. We develop an easy-to-use card game environment, train all agents adversarially from scratch with only knowledge of the game rules, and verify that our agents are comparable to humans. Our code to reproduce all reported results will be available online.
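As a rough illustration of the order-invariant pooling idea mentioned in the abstract (not the authors' code), here is a hypothetical sketch of encoding a variable-sized set of primitive card actions with a shared MLP followed by max-pooling, so the combination encoding does not depend on the order of the cards; `card_dim` and `hidden` are assumptions.

```python
import torch
import torch.nn as nn

class ComboEncoder(nn.Module):
    """Hypothetical sketch: embed each primitive action (card) with a
    shared MLP, then max-pool across the set so the encoding is
    invariant to the ordering of primitive actions in a combination."""

    def __init__(self, card_dim=15, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(card_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, cards):          # cards: (batch, n_cards, card_dim)
        h = self.embed(cards)          # (batch, n_cards, hidden)
        return h.max(dim=1).values     # order-invariant pooled encoding
```

A downstream Q-head could then score each candidate combination from this pooled encoding.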
[4] 3D point cloud paper from WACV 2019
Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network
Paper link: https://arxiv.org/abs/1901.08906
Abstract: Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for generating dense, high-resolution outputs. In this work, we introduce DensePCR, a deep pyramidal network for point cloud reconstruction that hierarchically predicts point clouds of increasing resolution. Towards this end, we propose an architecture that first predicts a low-resolution point cloud, and then hierarchically increases the resolution by aggregating local and global point features to deform a grid. Our method generates point clouds that are accurate, uniform and dense. Through extensive quantitative and qualitative evaluation on synthetic and real datasets, we demonstrate that DensePCR outperforms the existing state-of-the-art point cloud reconstruction works, while also providing a light-weight and scalable architecture for predicting high-resolution outputs.
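To make the "aggregate local and global features to deform a grid" step concrete, here is a speculative sketch, based only on the abstract, of one pyramid stage that upsamples a point cloud by predicting per-point offsets for k children from concatenated per-point and globally pooled features. All names, sizes, and the exact grid parameterization are assumptions; the paper's actual architecture is more involved.

```python
import torch
import torch.nn as nn

class PyramidUpsample(nn.Module):
    """Speculative sketch of one DensePCR-style stage: each input point
    is replicated k times and deformed by offsets predicted from its
    local feature concatenated with a global (max-pooled) feature."""

    def __init__(self, feat_dim=64, k=4):
        super().__init__()
        self.k = k
        self.feat = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU())
        self.offset = nn.Linear(2 * feat_dim, 3 * k)

    def forward(self, pts):                        # pts: (B, N, 3)
        local = self.feat(pts)                     # (B, N, F) per point
        global_ = local.max(dim=1, keepdim=True).values.expand_as(local)
        off = self.offset(torch.cat([local, global_], dim=-1))
        off = off.view(pts.size(0), pts.size(1), self.k, 3)
        children = pts.unsqueeze(2) + off          # (B, N, k, 3)
        return children.reshape(pts.size(0), -1, 3)  # (B, N*k, 3)
```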
[5] Multi-target multi-camera tracking paper
Multiple Hypothesis Tracking Algorithm for Multi-Target Multi-Camera Tracking with Disjoint Views
Paper link: https://arxiv.org/abs/1901.08787
Abstract: In this study, a multiple hypothesis tracking (MHT) algorithm for multi-target multi-camera tracking (MCT) with disjoint views is proposed. Our method forms track-hypothesis trees, and each branch of them represents a multi-camera track of a target that may move within a camera as well as across cameras. Furthermore, multi-target tracking within a camera is performed simultaneously with the tree formation by manipulating the status of each track hypothesis. Each status represents one of three stages of a multi-camera track: tracking, searching, and end-of-track. The tracking status means targets are tracked by a single-camera tracker. In the searching status, disappeared targets are examined to see whether they reappear in other cameras. The end-of-track status indicates that the target has exited the camera network due to its lengthy invisibility. These three statuses assist MHT in forming the track-hypothesis trees for multi-camera tracking. Furthermore, we present a gating technique for eliminating unlikely observation-to-track associations. In the experiments, we evaluate the proposed method on two datasets, DukeMTMC and NLPR-MCT, and demonstrate that it outperforms the state-of-the-art method in terms of accuracy. In addition, we show that the proposed method can operate in real time and online.
[6] One-class CNN paper
One-Class Convolutional Neural Network
Paper link: https://arxiv.org/abs/1901.08688
Code: github.com/otkupjnoz/oc-cnn
Abstract: We present a novel Convolutional Neural Network (CNN) based approach for one-class classification. The idea is to use zero-centered Gaussian noise in the latent space as the pseudo-negative class and train the network using the cross-entropy loss to learn a good representation as well as the decision boundary for the given class. A key feature of the proposed approach is that any pre-trained CNN can be used as the base network for one-class classification. The proposed One-Class CNN (OC-CNN) is evaluated on the UMDAA-02 Face, Abnormality-1001, and FounderType-200 datasets. These datasets are related to a variety of one-class application problems such as user authentication, abnormality detection and novelty detection. Extensive experiments demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art methods. The source code is available at github.com/otkupjnoz/oc-cnn.
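The core trick (zero-centered Gaussian noise in the latent space as a pseudo-negative class) is easy to sketch. The following is a minimal, hypothetical rendition based on the abstract; `backbone`, `classifier`, and the noise scale `sigma` are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def oc_cnn_step(backbone, classifier, x, sigma=0.1):
    """Hedged sketch of one OC-CNN-style training step: features of the
    target class are labeled 1, while zero-centered Gaussian noise in
    the same latent space acts as the pseudo-negative class (label 0)."""
    feats = backbone(x)                         # (B, D) latent features
    noise = sigma * torch.randn_like(feats)     # pseudo-negative samples
    inputs = torch.cat([feats, noise], dim=0)   # real + pseudo-negative
    labels = torch.cat([
        torch.ones(feats.size(0), device=feats.device),
        torch.zeros(feats.size(0), device=feats.device),
    ]).long()
    # classifier maps latent features to 2 logits (target vs. pseudo-neg).
    return F.cross_entropy(classifier(inputs), labels)
```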
[7] In Defense of the Triplet Loss paper
In Defense of the Triplet Loss for Visual Recognition
Paper link: https://arxiv.org/abs/1901.08616
Abstract: We employ triplet loss as a space embedding regularizer to boost classification performance. Standard architectures, like ResNet and DenseNet, are extended to support both losses with minimal hyper-parameter tuning. This promotes generality while fine-tuning pretrained networks. Triplet loss is a powerful surrogate for recently proposed embedding regularizers. Yet, it is often avoided due to its large batch-size requirement and high computational cost. Through our experiments, we re-assess these assumptions. During inference, our network supports both classification and embedding tasks without any computational overhead. Quantitative evaluation highlights how our approach compares favorably to the existing state of the art on multiple fine-grained recognition datasets. Further evaluation on an imbalanced video dataset achieves significant improvement (>7%). Beyond boosting efficiency, triplet loss brings retrieval and interpretability to classification models.
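To illustrate the setup the abstract describes (triplet loss as an embedding regularizer alongside classification), here is a hedged sketch; the loss weight `lam`, the margin, and PyTorch's built-in `TripletMarginLoss` stand in for whatever weighting and mining strategy the paper actually uses.

```python
import torch.nn as nn

cls_loss = nn.CrossEntropyLoss()
tri_loss = nn.TripletMarginLoss(margin=0.2)

def joint_loss(logits, labels, anchor, positive, negative, lam=0.1):
    """Classification loss plus a triplet-loss embedding regularizer.
    anchor/positive/negative are embeddings from the same network, so
    at inference the network serves both classification and retrieval."""
    return cls_loss(logits, labels) + lam * tri_loss(anchor, positive, negative)
```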
2019/1/26
A summary of the SiamRPN family of papers
[0] The SiamFC paper: an improvement over SINT (Siamese Instance Search for Tracking, CVPR 2016) and the first paper to tackle tracking with a fully-convolutional Siamese network. It can be viewed as a SiamRPN with a single anchor (a minimal sketch of the matching step follows this entry's links).
Title: Fully-convolutional siamese networks for object tracking
Paper link: https://arxiv.org/abs/1606.09549
Project page: https://www.robots.ox.ac.uk/~luca/siamese-fc.html
TensorFlow implementation: https://github.com/torrvision/siamfc-tf
PyTorch implementation: https://github.com/rafellerc/Pytorch-SiamFC
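A minimal sketch of the SiamFC matching step, assuming PyTorch: the exemplar's embedding is used as a correlation kernel over the search-image embedding, producing a response map whose peak locates the target. The shared feature extractor `embed` is an assumption here; response normalization and the training loss are omitted.

```python
import torch.nn.functional as F

def siamfc_response(embed, exemplar, search):
    """Sketch of fully-convolutional Siamese matching: cross-correlate
    the exemplar embedding over the search embedding.

    embed: any shared conv feature extractor (assumed).
    exemplar: (B, 3, 127, 127) target patch; search: (B, 3, 255, 255).
    """
    z = embed(exemplar)                      # (B, C, hz, wz) kernel
    x = embed(search)                        # (B, C, hx, wx) search map
    B, C, hx, wx = x.shape
    # Batched cross-correlation via grouped convolution: each exemplar
    # embedding acts as the conv kernel for its own search image.
    out = F.conv2d(x.reshape(1, B * C, hx, wx), z, groups=B)
    return out.reshape(B, 1, out.size(-2), out.size(-1))  # response map
```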
[0.1] The later v2 version is CFNet, which replaces the correlation operation with a correlation-filter (CF) operation.
Title: End-To-End Representation Learning for Correlation Filter Based Tracking
Paper link: http://openaccess.thecvf.com/content_cvpr_2017/html/Valmadre_End-To-End_Representation_Learning_CVPR_2017_paper.html
Project page: http://www.robots.ox.ac.uk/~luca/cfnet.html
MatConvNet implementation: https://github.com/bertinetto/cfnet
SiamFC was followed by many improvements, for example:
[0.2] StructSiam, which considers local structures in tracking
Title: Structured Siamese Network for Real-Time Visual Tracking
Paper link: http://openaccess.thecvf.com/content_ECCV_2018/papers/Yunhua_Zhang_Structured_Siamese_Network_ECCV_2018_paper.pdf
[0.3] SiamFC-tri, which introduces a triplet loss into the Siamese tracking network
Title: Triplet Loss in Siamese Network for Object Tracking
Paper link: http://openaccess.thecvf.com/content_ECCV_2018/papers/Xingping_Dong_Triplet_Loss_with_ECCV_2018_paper.pdf
[0.4] DSiam, a dynamic Siamese network
Title: Learning Dynamic Siamese Network for Visual Object Tracking
Paper link: http://openaccess.thecvf.com/content_ICCV_2017/papers/Guo_Learning_Dynamic_Siamese_ICCV_2017_paper.pdf
Code: https://github.com/tsingqguo/DSiam
[0.5] SA-Siam, a twofold Siamese network
Title: A Twofold Siamese Network for Real-Time Object Tracking
Paper link: http://openaccess.thecvf.com/content_cvpr_2018/papers/He_A_Twofold_Siamese_CVPR_2018_paper.pdf
[1] The SiamRPN paper: applies anchors at every position of the candidate region, performing classification and regression simultaneously as one-shot local detection (see the sketch after this entry's links).
Title: High Performance Visual Tracking with Siamese Region Proposal Network
Paper link: http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf
Project page: http://bo-li.info/SiamRPN/
[2] DaSiamRPN, a follow-up to SiamRPN that highlights the sample-imbalance problem during training, adding more varied positive samples and semantically meaningful negative samples.
Title: Distractor-aware Siamese Networks for Visual Object Tracking
Paper link: https://arxiv.org/abs/1808.06048
Project page: http://bo-li.info/DaSiamRPN/
Test code: https://github.com/foolwood/DaSiamRPN
[3] Cascaded SiamRPN, which cascades several RPN modules and exploits features from different layers.
Title: Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking
Paper link: https://arxiv.org/abs/1812.06148
[4] SiamMask, which adds a mask branch to the SiamRPN architecture to perform tracking and video segmentation simultaneously.
Title: Fast Online Object Tracking and Segmentation: A Unifying Approach
Paper link: https://arxiv.org/abs/1812.05050
Project page: http://www.robots.ox.ac.uk/~qwang/SiamMask/
[5] SiamRPN++, a follow-up to SiamRPN that makes modern backbones such as ResNet work for tracking; it is state of the art on essentially all benchmarks.
Title: SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
Paper link: https://arxiv.org/abs/1812.11703
Project page: http://bo-li.info/SiamRPN++/
[6] Deeper and Wider SiamRPN, which deepens and widens the network to improve performance, with particular attention to the effects of receptive field and padding.
Title: Deeper and Wider Siamese Networks for Real-Time Visual Tracking
Paper link: https://arxiv.org/abs/1901.01660
Test code: https://gitlab.com/MSRA_NLPR/deeper_wider_siamese_trackers
2019/1/25
[1] Salient object detection paper
Deep Reasoning with Multi-scale Context for Salient Object Detection
Paper link: https://arxiv.org/abs/1901.08362
[2] A survey of anomaly detection in road traffic
Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey
Paper link: https://arxiv.org/abs/1901.08292
[3] 3D object detection
3D Backbone Network for 3D Object Detection
Paper link: https://arxiv.org/abs/1901.08373
[4] Semantic segmentation paper
Application of Decision Rules for Handling Class Imbalance in Semantic Segmentation
Paper link: https://arxiv.org/abs/1901.08394
[5] Object detection paper
Object Detection based on Region Decomposition and Assembly
Paper link: https://arxiv.org/abs/1901.08225
[6] Graph convolutional network paper from Oxford
Hypergraph Convolution and Hypergraph Attention
Paper link: https://arxiv.org/abs/1901.08150
2019/1/24
[1] Technical report on JD's runner-up entry in PoseTrack 2018
A Top-down Approach to Articulated Human Pose Estimation and Tracking
Paper link: https://arxiv.org/abs/1901.07680
[2] Network compression paper submitted to TNNLS
Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
Paper link: https://arxiv.org/abs/1901.07827
Code: https://github.com/ShaohuiLin/SSR
[3] DeepFashion2 dataset from CUHK & SenseTime
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
Paper link: https://arxiv.org/abs/1901.07973
Code: https://github.com/switchablenorms/DeepFashion2
[4] Object detection paper
Bottom-up Object Detection by Grouping Extreme and Center Points
Paper link: https://arxiv.org/abs/1901.08043
Code: https://github.com/xingyizhou/ExtremeNet
2019/1/23
[1] SenseTime's winning entry for the COCO 2018 detection task
Hybrid Task Cascade for Instance Segmentation (winning entry of the COCO 2018 Challenge, object detection task)
Paper link: https://arxiv.org/abs/1901.07518
[2] Xiaomi's technical report on super-resolution with NAS
Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search
Paper link: https://arxiv.org/abs/1901.07261
[3] Object detection paper
Consistent Optimization for Single-Shot Object Detection
Paper link: https://arxiv.org/abs/1901.06563
[4] SenseTime's paper on imbalanced-data classification
Dynamic Curriculum Learning for Imbalanced Data Classification
Paper link: https://arxiv.org/abs/1901.06783
[5] Face detection paper
Improved Selective Refinement Network for Face Detection
Paper link: https://arxiv.org/abs/1901.06651
[6] Megvii's retail product dataset
RPC: A Large-Scale Retail Product Checkout Dataset
Paper link: https://arxiv.org/abs/1901.07249
[7] A survey of pedestrian attribute recognition
Pedestrian Attribute Recognition: A Survey
Paper link: https://arxiv.org/abs/1901.07474
Project page: https://sites.google.com/view/ahu-pedestrianattributes/
Recommended reading
- What's left to do in object detection? 19 directions to consider
- ECCV 2018 | CornerNet: a new approach to object detection