Notes from the ICML 2018 Paper Reading Seminar
P.S.: brief notes on the papers discussed at our ICML 2018 reading seminar.
2018.7.23
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations. ICML 2018. http://proceedings.mlr.press/v80/wang18d/wang18d.pdf
- Combines a zero-sum game (the formulation that inspired GANs) with inverse reinforcement learning
Learning to Explore via Meta-Policy Gradient. ICML 2018. http://proceedings.mlr.press/v80/xu18d/xu18d.pdf
- A meta-policy gradient method for learning exploration
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/rashid18a/rashid18a.pdf
- Value function factorisation under a monotonicity constraint (see the sketch below)
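The core of QMIX is a mixing network whose weights are forced to be non-negative, so the joint value is monotone in each per-agent value. A minimal sketch of that constraint, assuming a simplified single-hidden-layer mixer (the paper's mixer also produces a state-dependent bias through a second hypernetwork; sizes and names here are illustrative):

```python
# Sketch of QMIX-style monotonic mixing: per-agent Q-values are combined with
# weights generated from the global state and made non-negative via abs(),
# which guarantees dQ_tot/dQ_i >= 0. Layer sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b, n = agent_qs.shape
        w1 = self.hyper_w1(state).abs().view(b, n, -1)   # non-negative weights
        b1 = self.hyper_b1(state).view(b, 1, -1)
        hidden = F.elu(torch.bmm(agent_qs.view(b, 1, n), w1) + b1)
        w2 = self.hyper_w2(state).abs().view(b, -1, 1)   # non-negative weights
        return torch.bmm(hidden, w2).view(b)             # Q_tot per sample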
2018.7.24
Ray: A Distributed Framework for Emerging AI Applications. arXiv. https://arxiv.org/pdf/1712.05889.pdf
- A distributed computing framework from Berkeley; the presenter did not explain clearly how to deploy it across machines
Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. ICML 2018. http://proceedings.mlr.press/v80/allen-zhu18a/allen-zhu18a.pdf
- Non-convex optimization
Self-Imitation Learning. ICML 2018. http://proceedings.mlr.press/v80/oh18b/oh18b.pdf
- Self-imitation learning
2018.7.27
Mix & Match - Agent Curricula for Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/czarnecki18a/czarnecki18a.pdf
- Transfer learning applied to reinforcement learning
- The larger k is, the more the model absorbs from the earlier models, and the higher the training complexity (see the sketch after this list)
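The transfer happens by mixing agents of increasing capacity. A minimal sketch of the mixing idea, assuming a discrete action space with illustrative probabilities (the paper additionally tunes the mixing coefficient with population-based training):

```python
# Sketch of the Mix & Match mixture policy: act with
# pi_mm = (1 - alpha) * pi_simple + alpha * pi_complex, annealing alpha from
# 0 to 1 over training, while a distillation term pulls the complex agent
# toward the simple one so knowledge transfers. Numbers are illustrative.
import numpy as np

def mixture_policy(pi_simple, pi_complex, alpha):
    return (1.0 - alpha) * pi_simple + alpha * pi_complex

def distillation_loss(pi_simple, pi_complex, eps=1e-8):
    # KL(pi_simple || pi_complex), summed over actions
    return float(np.sum(pi_simple * np.log((pi_simple + eps) / (pi_complex + eps))))

pi_1 = np.array([0.7, 0.3])   # simple agent's action distribution
pi_2 = np.array([0.4, 0.6])   # complex agent's action distribution
for alpha in (0.0, 0.5, 1.0):
    print(alpha, mixture_policy(pi_1, pi_2, alpha), distillation_loss(pi_1, pi_2))
```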
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings. ICML 2018. http://proceedings.mlr.press/v80/co-reyes18a/co-reyes18a.pdf
- Roughly a VAE applied to hierarchical reinforcement learning
State Abstractions for Lifelong Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/abel18a/abel18a.pdf
- Lifelong reinforcement learning; in effect, tasks become transferable
2018.7.28
Efficient Neural Architecture Search via Parameter Sharing. ICML 2018. http://proceedings.mlr.press/v80/pham18a/pham18a.pdf
- Improves on NAS: for the given neural network modules it builds a DAG and shares parameters across child models; the concrete algorithm deserves further study
- From Google Brain; the insight is good, but the method still feels weak
Implicit Quantile Networks for Distributional Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/dabney18a/dabney18a.pdf
- The paper builds on two threads, quantile regression and distributional RL; worth reading the authors' two earlier papers (see the sketch below)
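A minimal sketch of the two implicit-quantile ingredients, assuming the paper's cosine embedding of sampled quantile fractions and the asymmetric quantile Huber loss (dimension choices are illustrative):

```python
# Sketch of IQN's ingredients: embed sampled quantile fractions tau ~ U(0,1)
# with cosine features, and train with the quantile Huber loss.
import torch
import torch.nn.functional as F

def tau_cosine_features(tau, n=64):
    # phi(tau) before the learned linear layer: cos(pi * i * tau), i = 1..n
    i = torch.arange(1, n + 1, dtype=torch.float32)
    return torch.cos(torch.pi * i * tau.unsqueeze(-1))   # (batch, n)

def quantile_huber_loss(pred, target, tau, kappa=1.0):
    # Penalize under-/over-estimation asymmetrically according to tau.
    u = target - pred
    huber = F.huber_loss(pred, target, reduction="none", delta=kappa)
    weight = torch.abs(tau - (u.detach() < 0).float())
    return (weight * huber / kappa).mean()

tau = torch.rand(8)                       # sampled quantile fractions
print(tau_cosine_features(tau).shape)     # torch.Size([8, 64])
```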
2018.7.29
Bayesian Optimization of Combinatorial Structures. ICML 2018. http://proceedings.mlr.press/v80/baptista18a/baptista18a.pdf
- Did not follow this one; the work itself reports only partial progress
Visualizing and Understanding Atari Agents. ICML 2018. http://proceedings.mlr.press/v80/greydanus18a/greydanus18a.pdf
- Gaussian-blurs a patch of the input and measures how that region affects the Q-values (see the sketch below)
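A minimal sketch of that perturbation-based saliency, assuming a 2-D grayscale frame and a generic `q_values_fn`; the hard circular mask here is a simplification of the paper's Gaussian mask:

```python
# Blur a small region of the frame, re-evaluate the Q-values, and score the
# region by how much the Q-values change (the paper's squared-error score).
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_at(frame, q_values_fn, cx, cy, radius=5, sigma=3.0):
    """Saliency of the patch centred at (cx, cy): change in Q when blurred."""
    blurred = gaussian_filter(frame, sigma=sigma)
    mask = np.zeros_like(frame)
    ys, xs = np.ogrid[:frame.shape[0], :frame.shape[1]]
    mask[(ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2] = 1.0
    perturbed = frame * (1 - mask) + blurred * mask
    q_orig, q_pert = q_values_fn(frame), q_values_fn(perturbed)
    return 0.5 * np.sum((q_orig - q_pert) ** 2)
```

Sliding the patch over the whole frame yields a saliency map of which pixels the agent's value estimate depends on.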
Policy Optimization with Demonstrations. ICML 2018. http://proceedings.mlr.press/v80/kang18a/kang18a.pdf
- Did not pay much attention to this one
2018.7.30
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents. ICML 2018. http://proceedings.mlr.press/v80/zhang18n/zhang18n.pdf
- Communication between agents over a network, with a randomly selected subgraph; the theory was worked out first, and the experiments are simply designed
Structured Evolution with Compact Architectures for Scalable Policy Optimization. ICML 2018. http://proceedings.mlr.press/v80/choromanski18a/choromanski18a.pdf
- From Google Brain; the talk presented a pile of matrix machinery and the theory was not explained clearly, but the experiments are thorough
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/icarte18a/icarte18a.pdf
- Uses a finite state machine so that a single interaction with the environment suffices to compute the rewards of multiple tasks (see the sketch below)
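A minimal sketch of a reward machine: a finite state machine whose transitions are labelled with rewards, so one trajectory of environment events can be replayed through several machines (the event names and transition table below are illustrative):

```python
# A reward machine maps (machine state, environment event) -> (next state,
# reward); unmatched events leave the machine state unchanged with reward 0.
class RewardMachine:
    def __init__(self, transitions, initial_state):
        self.transitions = transitions   # {(u, event): (u_next, reward)}
        self.state = initial_state

    def step(self, event):
        """Advance the machine on an environment event; return the reward."""
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        return reward

# "Get coffee, then deliver it": reward only on completing the sequence.
rm = RewardMachine({("u0", "coffee"): ("u1", 0.0),
                    ("u1", "office"): ("u2", 1.0)}, "u0")
print([rm.step(e) for e in ["office", "coffee", "office"]])  # [0.0, 0.0, 1.0]
```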
2018.7.31
Essentially No Barriers in Neural Network Energy Landscape. ICML 2018. http://proceedings.mlr.press/v80/draxler18a/draxler18a.pdf
- Connects local optima with low-loss paths (see the sketch below)
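A minimal sketch of the quantity the paper studies: the loss barrier along a path between two trained minima. A straight line usually hits a high-loss barrier, while the paper finds (via an AutoNEB-style search) curved paths with essentially no barrier; here we only probe the straight line for contrast, on a toy loss:

```python
import numpy as np

def barrier_along_line(theta_a, theta_b, loss_fn, n_points=11):
    """Height of the loss barrier on the straight line between two minima."""
    losses = [loss_fn((1 - t) * theta_a + t * theta_b)
              for t in np.linspace(0.0, 1.0, n_points)]
    return max(losses) - max(losses[0], losses[-1])

# Toy example: a loss with two minima at -1 and +1 and a bump between them.
loss = lambda w: float((w ** 2 - 1.0) ** 2)
print(barrier_along_line(np.array(-1.0), np.array(1.0), loss))  # 1.0
```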
Time Limits in Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/pardo18a/pardo18a.pdf
- Considers the effect of finite episode lengths on value targets (see the sketch below)
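A minimal sketch of the paper's "partial-episode bootstrapping" idea: when an episode ends only because the time limit was hit, the TD target should still bootstrap from the next state's value, since the timeout is not part of the environment dynamics:

```python
# Distinguish a true terminal state from a timeout when building TD targets.
def td_target(reward, next_value, done, timeout, gamma=0.99):
    if done and not timeout:
        return reward                    # true terminal state: no bootstrap
    return reward + gamma * next_value   # timeout (or non-terminal): bootstrap

# Same transition, different targets depending on *why* the episode ended.
print(td_target(1.0, 10.0, done=True, timeout=True))   # 10.9 -> bootstrapped
print(td_target(1.0, 10.0, done=True, timeout=False))  # 1.0  -> terminal
```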
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018. http://proceedings.mlr.press/v80/athalye18a/athalye18a.pdf
- The ICML 2018 best paper; the discussion covered which of the seven attacked adversarial defenses were not broken (covered again in more detail on 2018.8.4)
2018.8.1
Learning with Abandonment. ICML 2018. http://proceedings.mlr.press/v80/schmit18a/schmit18a.pdf
- Applies reinforcement learning to recommender systems, designing a user tolerance threshold theta beyond which the user abandons the platform (see the sketch below)
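A minimal sketch of the abandonment setup as described in the talk: the user has a hidden tolerance theta, and leaves as soon as a recommendation's dissatisfaction exceeds it, ending the reward stream. The user model and all numbers below are illustrative, not the paper's:

```python
# Toy simulation: rewards accumulate only until the first recommendation whose
# dissatisfaction exceeds the user's tolerance theta (then the user abandons).
import random

def simulate_user(recommend, theta, horizon=100):
    """Run recommendations until the user abandons; return total reward."""
    total = 0.0
    for t in range(horizon):
        action = recommend(t)
        dissatisfaction = abs(action - 0.5)   # toy user model
        if dissatisfaction > theta:
            break                             # user abandons: no more reward
        total += 1.0 - dissatisfaction
    return total

random.seed(0)
print(simulate_user(lambda t: random.random(), theta=0.3))
```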
Latent Space Policies for Hierarchical Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/haarnoja18a/haarnoja18a.pdf
- Hierarchical reinforcement learning mainly targets sparse rewards or otherwise complex settings
- The paper's content does not really live up to the "hierarchical reinforcement learning" in its title
Coordinated Exploration in Concurrent Reinforcement Learning. ICML 2018. http://proceedings.mlr.press/v80/dimakopoulou18a/dimakopoulou18a.pdf
- Proposes a seed-sampling algorithm and compares it with earlier UCB and Thompson sampling baselines; how the concurrent agents actually coordinate was not explained clearly (see the sketch below)
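A minimal sketch of the seed-sampling intuition in a bandit-style setting (the Gaussian prior and the bandit framing here are illustrative assumptions): each agent draws one random seed up front and commits to the greedy policy under the model perturbed by that seed, a Thompson-sampling-like scheme:

```python
# Different seeds make concurrent agents explore *different* arms, while each
# agent stays self-consistent over time because its seed never changes.
import numpy as np

def seed_sample_agent(seed, prior_means, prior_std=1.0):
    """One agent's arm choice: greedy w.r.t. a seed-fixed posterior sample."""
    rng = np.random.default_rng(seed)
    sampled_means = rng.normal(prior_means, prior_std)
    return int(np.argmax(sampled_means))

prior = np.array([0.2, 0.5, 0.3])
print([seed_sample_agent(s, prior) for s in range(5)])
```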
2018.8.2
Clipped Action Policy Gradient. ICML 2018. http://proceedings.mlr.press/v80/fujita18a/fujita18a.pdf
- Clips actions to [alpha, beta] when computing the policy gradient; the resulting estimator is still unbiased (see the sketch below)
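A minimal sketch of the clipped-action log-likelihood for a 1-D Gaussian policy: since the environment clips actions to [alpha, beta], an action at a boundary should contribute the whole tail probability mass (CDF), not the density, which keeps the estimator unbiased while reducing variance:

```python
# Log-likelihood of a clipped action under a Gaussian policy: boundary actions
# absorb the corresponding tail mass of the distribution.
import torch
from torch.distributions import Normal

def clipped_log_prob(mu, sigma, action, alpha, beta):
    dist = Normal(mu, sigma)
    if action <= alpha:                                   # left tail mass
        return torch.log(dist.cdf(torch.as_tensor(alpha)))
    if action >= beta:                                    # right tail mass
        return torch.log(1.0 - dist.cdf(torch.as_tensor(beta)))
    return dist.log_prob(torch.as_tensor(action))

mu = torch.tensor(0.0, requires_grad=True)
print(clipped_log_prob(mu, torch.tensor(1.0), 1.0, -1.0, 1.0))  # boundary case
```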
An Inference-Based Policy Gradient Method for Learning Options. ICML 2018. http://proceedings.mlr.press/v80/smith18a/smith18a.pdf
- A paper in the hierarchical reinforcement learning area
- The algorithm is similar to "A Laplacian Framework for Option Discovery in Reinforcement Learning" (ICML 2017), and the experiments include a comparison with it
2018.8.3
Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control. ICML 2018. http://proceedings.mlr.press/v80/srinivas18b/srinivas18b.pdf
- Motivates a state abstraction; builds a model-based component and combines model-based with model-free learning
Investigating Human Priors for Playing Video Games. ICML 2018. http://proceedings.mlr.press/v80/dubey18a/dubey18a.pdf
2018.8.4
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018. http://proceedings.mlr.press/v80/athalye18a/athalye18a.pdf
- The ICML 2018 best paper
- The earlier gradient-based defenses from ICLR fall into three categories: shattered gradients, stochastic gradients, and gradients that explode or vanish over many iterations. The first and third are attacked by substituting a differentiable function for the defense; the second is attacked by taking an expectation over the randomness (see the sketch below)
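A minimal sketch of the paper's BPDA trick (Backward Pass Differentiable Approximation) against shattered gradients: apply the non-differentiable defense g(x) in the forward pass, but treat it as the identity in the backward pass, so the attacker still receives useful gradients:

```python
# BPDA with the identity approximation: forward runs the real defence,
# backward pretends g(x) = x so gradients flow through to the input.
import torch

class BPDAIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, defence):
        return defence(x)          # e.g. quantization, JPEG, input denoising

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None   # identity approximation of the defence

def through_defence(x, defence):
    return BPDAIdentity.apply(x, defence)
```

For the stochastic-gradient case, the corresponding attack is to average gradients over the defence's randomness (expectation over transformation) rather than to approximate it.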
Addressing Function Approximation Error in Actor-Critic Methods. ICML 2018. http://proceedings.mlr.press/v80/fujimoto18a/fujimoto18a.pdf
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation. ICML 2018. http://proceedings.mlr.press/v80/corneil18a/corneil18a.pdf
- State abstraction, similar to a VAE: abstract the state, reconstruct it, and minimize the loss the authors propose (see the sketch below)
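A minimal sketch of a VAE-style state abstraction, using a generic continuous VAE for illustration (the paper tabulates a discrete latent state, which this sketch does not reproduce):

```python
# Encode the observation to a compact latent, decode it back, and train on
# reconstruction plus a KL regularizer toward the standard normal prior.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateVAE(nn.Module):
    def __init__(self, obs_dim, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * latent_dim)   # -> (mu, log_var)
        self.dec = nn.Linear(latent_dim, obs_dim)

    def forward(self, obs):
        mu, log_var = self.enc(obs).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        recon = self.dec(z)
        recon_loss = F.mse_loss(recon, obs)
        kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
        return recon_loss + kl
```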
To be continued...