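A Performance Comparison of Popular Deep Learning Frameworks
A comparison of deep learning frameworks from Zhihu:
How do you choose among the many neural network frameworks such as chainer, caffe, torch, and mxnet?
Someone asked for an updated comparison four months ago, but as far as I can see it has not been updated since.
Evaluation of Deep Learning Toolkits
Original text: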
Abstract. In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: Caffe, CNTK, TensorFlow, Theano, and Torch. This is a dynamic document and the evaluation, to the best of my knowledge, is based on the current state of their code.
I also provide ratings in some areas because for a lot of people, ratings are useful. However, keep in mind that ratings are inherently subjective [1].
If you find something wrong or inadequate, please help improve by filing an issue.
This article compares Caffe, CNTK, TensorFlow, Theano, and Torch. If you find any errors, please point them out.
1. Modeling Capability
In this section, we evaluate each toolkit's ability to train common and state-of-the-art networks without writing too much code. Some of these networks are:
- ConvNets: AlexNet, OxfordNet, GoogleNet
- RecurrentNets: plain RNN, LSTM/GRU, bidirectional RNN
- Sequential modeling with attention.
In addition, we also evaluate the flexibility to create a new type of model.
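Modeling capability: this section evaluates each toolkit's ability to train common and state-of-the-art networks without writing much extra code.
Some of these networks are:
- Convolutional networks: AlexNet, OxfordNet, GoogleNet
- Recurrent networks: plain RNN, LSTM/GRU, bidirectional RNN
- Sequence models with attention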
Caffe
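Caffe is the most popular deep learning toolkit in the community and in industry, with strong scalability, extensibility, and compatibility; its support for recurrent networks, however, is weak.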
Caffe is perhaps the first mainstream industry-grade deep learning toolkit, started in late 2013, due to its excellent convnet implementation (at the time). It is still the most popular toolkit within the computer vision community, with many extensions being actively added.
However, its support for recurrent networks and language modeling in general is poor, due to its legacy architecture, whose limitations are detailed in the architecture section.
CNTK
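CNTK is better known in the speech community. In CNTK (as in TensorFlow and Theano), a network is specified as a graph of vector operations, for example matrix addition and multiplication. A layer is a composition of these operations. The fine granularity of the building blocks lets users create more complex layers without implementing them in a low-level language.
CNTK is a deep learning system started by the speech people who started the deep learning craze, and it has grown into a more general, platform-independent deep learning system. It is better known in the speech community than in the general deep learning community.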
In CNTK (as in TensorFlow and Theano), a network is specified as a symbolic graph of vector operations, such as matrix add/multiply or convolution. A layer is just a composition of those operations. The fine granularity of the building blocks (operations) allows users to invent new complex layer types without implementing them in a low-level language (as in Caffe).
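To make the "a layer is just a composition of operations" idea concrete, here is a minimal pure-Python sketch. The helpers `matmul`, `add`, `relu`, and `dense_layer` are hypothetical names chosen for illustration; they are not CNTK, TensorFlow, or Theano APIs, and the code runs eagerly on plain lists rather than building a symbolic graph.

```python
# Illustrative only: fine-grained "operations" on plain Python lists.
# None of these names come from CNTK, TensorFlow, or Theano.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, bias):
    """Add a bias vector to every row of a matrix."""
    return [[x + b for x, b in zip(row, bias)] for row in a]

def relu(a):
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in a]

def dense_layer(x, weights, bias):
    """A 'layer' defined purely by composing the operations above."""
    return relu(add(matmul(x, weights), bias))

x = [[1.0, 2.0]]                 # one input row with two features
w = [[1.0, -1.0], [0.5, 0.5]]    # 2 x 2 weight matrix
b = [0.0, -1.0]
out = dense_layer(x, w, b)       # [[2.0, 0.0]]
```

A new layer type in this style is just another function over the same primitives, which is why no low-level code is needed.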
As of today, CNTK is not usable for a variety of tasks such as sequence-to-sequence.
TensorFlow
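TensorFlow is a newer framework in which RNNs can be expressed easily and efficiently (using the bucketing trick). Notes: the RNN API and implementation are suboptimal; bidirectional RNNs are not yet available; and there is no 3D convolution for video yet.
Every computational flow is built as a static graph, which makes some computations difficult, such as beam search (a method commonly used in sequence prediction tasks).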
State-of-the-art models
- RNN API and implementation are suboptimal. The team also commented about it here and here.
- Bidirectional RNN not available yet
- No 3D convolution, which is useful for video recognition
New models. Since TF uses the symbolic graph of vector operations approach, specifying a new network is fairly easy. Although it doesn't support symbolic loops yet (at least not well tested/documented, as of 05/2016), RNNs can be made easy and efficient using the bucketing trick.
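The bucketing trick mentioned above can be sketched roughly as follows: sequences are grouped into a few fixed lengths and padded to the bucket size, so that one fixed-length graph per bucket suffices. This is a pure-Python illustration; `bucket_and_pad` and the bucket sizes are assumptions for the example, not TensorFlow functions or defaults.

```python
def bucket_and_pad(sequences, bucket_sizes, pad=0):
    """Put each sequence in the smallest bucket that fits it, padded to that size."""
    buckets = {size: [] for size in bucket_sizes}
    for seq in sequences:
        for size in sorted(bucket_sizes):
            if len(seq) <= size:
                padding = [pad] * (size - len(seq))
                buckets[size].append(list(seq) + padding)
                break
        else:
            # No bucket was large enough for this sequence.
            raise ValueError("sequence longer than the largest bucket")
    return buckets

sentences = [[4, 7], [1, 2, 3, 4, 5], [9], [3, 3, 3]]
buckets = bucket_and_pad(sentences, bucket_sizes=[3, 6])
# buckets[3] holds three sequences of length 3, buckets[6] holds one of length 6
```

Fewer buckets mean fewer graphs to build but more wasted computation on padding, so the bucket sizes are a trade-off.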
However, TF has a major weakness in terms of modeling flexibility. Every computational flow has to be constructed as a static graph. That makes some computations difficult, such as beam search (which is used frequently in sequence prediction tasks).
Theano
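Theano: state-of-the-art networks are available either through a higher-level framework or in pure Theano.
New models: Theano pioneered the use of symbolic graphs for programming a network. Its symbolic API supports loop control, called scan, which makes implementing RNNs easy and efficient.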
State-of-the-art models. Theano has implementations of most state-of-the-art networks, either in the form of a higher-level framework (e.g. Blocks, Keras, etc.) or in pure Theano.
New models. Theano pioneered the trend of using a symbolic graph for programming a network. Theano's symbolic API supports looping control, the so-called scan, which makes implementing RNNs easy and efficient. Users don't always have to define a new model at the tensor operations level. There are a few higher-level frameworks, mentioned above, which make model definition and training simpler.
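As a rough illustration of what a scan-style loop computes, the sketch below threads a hidden state through a sequence in plain Python. The `scan` helper here is a hypothetical stand-in written for this example; Theano's own scan builds a symbolic loop inside the graph rather than running eagerly like this, and the scalar weights are made up.

```python
import math

def scan(step, sequence, initial_state):
    """Apply step(x, state) to each element, threading the state through."""
    state = initial_state
    outputs = []
    for x in sequence:
        state = step(x, state)
        outputs.append(state)
    return outputs

def rnn_step(x, h, w_in=0.5, w_rec=0.1):
    """One step of a scalar RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    return math.tanh(w_in * x + w_rec * h)

# One hidden state per time step of the input sequence.
hidden_states = scan(rnn_step, [1.0, 0.0, -1.0], initial_state=0.0)
```

The point of a loop primitive is that the same step function handles sequences of any length, without unrolling the network by hand.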
Torch
State-of-the-art models
- Excellent for conv nets. It's worth noting that temporal convolution can be done in TensorFlow/Theano via conv2d but that's a trick. The native interface for temporal convolution in Torch makes it slightly more intuitive to use.
- Rich set of RNNs available through a non-official extension [2]
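The conv2d trick for temporal convolution amounts to treating a 1D signal as a 2D input of height 1. The sketch below shows the idea with a naive pure-Python `conv2d`; both functions are hypothetical helpers for illustration and do not mirror the signatures of the TensorFlow or Theano operations.

```python
def conv2d(image, kernel):
    """Naive 'valid' 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def temporal_conv(signal, kernel):
    """1D convolution expressed as a 2D convolution over a height-1 'image'."""
    return conv2d([signal], [kernel])[0]

signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
result = temporal_conv(signal, kernel)   # [-2, -2, -2]
```

The extra wrapping and unwrapping of the singleton dimension is the part a native temporal interface spares the user.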
New models. In Torch, there are multiple ways (stack of layers or graph of layers) to define a network but essentially, a network is defined as a graph of layers. Because of this coarser granularity, Torch is sometimes considered less flexible because for new layer types, users have to implement the full forward, backward, and gradient input update.
However, unlike Caffe, defining a new layer in Torch is much easier because you don't have to program in C++. Plus, in Torch, the difference between new layer definition and network definition is minimal. In Caffe, layers are defined in C++ while networks are defined via Protobuf.
Torch is more flexible than TensorFlow and Theano in that it is imperative while TF/Theano are declarative (i.e. one has to declare a computational graph). That makes some operations, e.g. beam search, much easier to do in Torch.
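To show why beam search is natural in an imperative style, here is a minimal Python sketch: the loop sorts and prunes hypotheses at every step, which is data-dependent control flow that is awkward to express as a static graph. `toy_model` is a made-up bigram table standing in for a trained network, and all names are assumptions for this example.

```python
import math

def beam_search(next_log_probs, start, beam_width, steps):
    """Keep the beam_width best partial sequences at every step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, logp in next_log_probs(tokens).items():
                candidates.append((tokens + [token], score + logp))
        # Data-dependent pruning: sort and keep only the best hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

def toy_model(tokens):
    """Hypothetical stand-in for a trained model: a fixed bigram table."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"a": 0.1, "b": 0.9},
        "b": {"a": 0.5, "b": 0.5},
    }
    return {tok: math.log(p) for tok, p in table[tokens[-1]].items()}

beams = beam_search(toy_model, "<s>", beam_width=2, steps=2)
best_tokens, best_score = beams[0]   # best_tokens is ["<s>", "a", "b"]
```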
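Torch is excellent for convolutional networks, and its native interface for temporal convolution is more intuitive to use. Unlike Caffe, defining a new layer in Torch is easier because it does not require programming in C++, so the gap between defining a layer and defining a network is small. In Caffe, each layer is defined in C++ while the network as a whole is configured in a Protobuf file.
TF and Theano are declarative (a computational graph has to be declared), while Torch is imperative and therefore more flexible, which makes methods such as beam search easier to implement.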
Left: graph model of CNTK/Theano/TensorFlow; Right: graph model of Caffe/Torch
2. Interfaces
Caffe
Caffe has a pycaffe interface but that's a mere secondary alternative to the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.
In addition, Caffe provides a Python interface, so it can be driven step by step from an imperative language.
CNTK
The way to use CNTK, similar to Caffe, is to specify a config file and run command line. CNTK is slightly worse than Caffe because there's no Python or any other high-level language interface.
Like Caffe, CNTK is run from the command line, but unfortunately it offers no Python or other high-level language interface.
TensorFlow
TF supports two interfaces: Python and C++. This means that you can do experiments in a rich, high-level environment and deploy your model in an environment that requires native code or low latency.
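It would be perfect if TF supported F# or TypeScript. The lack of static types in Python is just ... painful :).
TensorFlow has C++ and Python interfaces, which means you can experiment in a high-level scripting language while still getting efficient execution.
Having TensorFlow support F# would be crazy, though!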
Theano
Python
Torch
Torch runs on LuaJIT, which is amazingly fast (comparable with industrial languages such as C++/C#/Java). Hence developers don't have to think about symbolic programming, which can be limited. They can just write all kinds of computations without worrying about performance penalty.
However, let's face it, Lua is not yet a mainstream language.
Torch's interface is the Lua scripting language.
3. Model Deployment
How easy is it to deploy a new model?
Caffe
Caffe is C++ based, which can be compiled on a variety of devices. It is cross-platform (a Windows port is available and maintained here), which makes Caffe the best choice with respect to deployment.
Being C++ based, though, it can still be fairly difficult to deploy.
CNTK
Like Caffe, CNTK is also C++ based and is cross-platform. Hence, deployment should be easy in most cases. However, to my understanding, it doesn't work on the ARM architecture, which limits its capability on mobile devices.
CNTK is worse off in this respect: it cannot even run on ARM platforms.
TensorFlow
TF supports a C++ interface and the library can be compiled/optimized on ARM architectures because it uses Eigen (instead of a BLAS library). This means that you can deploy your trained models on a variety of devices (servers or mobile devices) without having to implement a separate model decoder or load a Python/LuaJIT interpreter [3].
TF doesn't work on Windows yet, though, so TF models can't be deployed on Windows devices.
TensorFlow does not run on Windows, so it cannot be used on Windows devices.
Theano
The lack of a low-level interface and the inefficiency of the Python interpreter make Theano less attractive for industrial users. For a large model, the overhead of Python isn’t too bad but the dogma is still there.
The cross-platform nature (mentioned below) enables a Theano model to be deployed in a Windows environment, which helps it gain some points.
Torch
Torch requires LuaJIT to run models. This makes it less attractive than the bare-bones C++ support of Caffe/CNTK/TF. It’s not just the performance overhead, which is minimal. The bigger problem is integration, at API level, with a larger production pipeline.
Torch requires the LuaJIT compiler, which puts it behind the C++-backed Caffe/CNTK/TF when it comes to deployment.
4. Performance
Single-GPU
All of these toolkits call cuDNN, so as long as there are no major computations or memory allocations at the outer level, they should perform similarly.
Soumith@FB has done some benchmarking for ConvNets. Deep Learning is not just about feedforward convnets, not just about ImageNet, and certainly not just about a few passes over the network. However, Soumith’s benchmark is the only notable one as of today. So we will base the single-GPU performance rating on his benchmark.
TensorFlow and Torch
TensorFlow's performance on a single Titan X GPU is shown in the table below:
TensorFlow used to be slow when it first came out but as of 05/2016, it has reached the ballpark of other frameworks in terms of ConvNet speed. This is not surprising because every framework nowadays calls cuDNN for the actual computations.
Here's my latest micro benchmark of TensorFlow 0.8 vs before. The measurement is latency, in milliseconds, for one full minibatch forward-backward pass on a single Titan X GPU.
| Network | TensorFlow (before 0.8) | TensorFlow 0.8 | Torch |
|---|---|---|---|
| AlexNet | 292 | 97 | 81 |
| Inception v1 | 1237 | 518 | 470 |
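For readers who want to take a similar latency measurement themselves, a minimal timing harness might look like the sketch below. It is not necessarily how the numbers above were obtained; `measure_latency_ms` and `fake_minibatch_step` are hypothetical names, and the placeholder step does CPU arithmetic instead of a real forward-backward pass.

```python
import time

def measure_latency_ms(step, warmup=2, repeats=5):
    """Return the mean wall-clock time of step() in milliseconds."""
    for _ in range(warmup):      # warm-up runs are excluded from the timing
        step()
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats

def fake_minibatch_step():
    """Hypothetical placeholder for one forward-backward pass."""
    sum(i * i for i in range(10000))

latency_ms = measure_latency_ms(fake_minibatch_step)
```

With a real GPU workload, the step would also need to wait for the device to finish before the timer stops, since GPU kernels are typically launched asynchronously.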
Theano
Theano's performance on large networks:
In addition, Theano compiles to native CUDA code and runs in parallel on a single machine...
On big networks, Theano’s performance is on par with Torch7, according to this benchmark. The main issue of Theano is startup time, which is terrible, because Theano has to compile C/CUDA code to binary. We don’t always train big models. In fact, DL researchers often spend more time debugging than training big models. TensorFlow doesn’t have this problem. It simply maps the symbolic tensor operations to the already-compiled corresponding function calls.
Even import theano takes time because this import apparently does a lot of stuff. Also, after import theano, you are stuck with a pre-configured device (e.g. GPU0).
Multi-GPU
TBD
5. Architecture
Developer Zone
Caffe
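Layer-wise design: the building block of a network is the layer. Caffe leans toward standard models; layers are written in C++ in a layer-wise design, and protobuf is used as the configuration interface.
For a new layer, you must define the forward pass, the backward pass, and the gradient update. See...
To support both GPU and CPU, you may have to add code for each (although it seems this can sometimes be avoided)...
Worse, every layer type must be assigned an explicit id; if your pull request is not merged early, the id may conflict with someone else's...
(Even so, Caffe remains the best balance of efficiency and ease of use for CNNs...)
Protobuf: the configuration interface...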
Caffe's architecture was considered excellent when it was born but by modern standards, it is considered average. The main pain points of Caffe are its layer-wise design in C++ and the protobuf interface for model definition.
Layer-wise design. The building block of a network in Caffe is the layer.
- For new layer types, you have to define the full forward, backward, and gradient update. You can see an already long list of layers implemented in (official) caffe.
- What's worse is that if you want to support both CPU and GPU, you need to implement extra functions, e.g. Forward_gpu and Backward_gpu.
- Worse, you need to assign an int id to your layer type and add that to the proto file. If your pull request is not merged early, you may need to change the id because someone else already claims that.
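The layer-wise contract described in this list (forward, backward, and gradient update) can be sketched in a few lines of Python. This is only an illustration of the pattern: `ScaleLayer` is a hypothetical toy layer, not Caffe code, and a real Caffe layer would be written in C++ with separate CPU and GPU variants.

```python
class ScaleLayer:
    """A toy layer y = w * x with a single learnable scalar weight."""

    def __init__(self, w):
        self.w = w
        self.grad_w = 0.0

    def forward(self, x):
        self.x = x                      # cache the input for the backward pass
        return [self.w * v for v in x]

    def backward(self, grad_out):
        # Gradient w.r.t. the weight: sum over elements of dL/dy * x.
        self.grad_w = sum(g * v for g, v in zip(grad_out, self.x))
        # Gradient w.r.t. the input, passed on to the previous layer.
        return [self.w * g for g in grad_out]

    def update(self, lr):
        self.w -= lr * self.grad_w      # plain gradient-descent step

layer = ScaleLayer(w=2.0)
y = layer.forward([1.0, 3.0])           # [2.0, 6.0]
grad_in = layer.backward([1.0, 1.0])    # [2.0, 2.0]
layer.update(lr=0.1)
```

Every new layer type has to supply all three pieces by hand, which is the cost of the coarse layer granularity compared with composing differentiable operations.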
Protobuf. Caffe has a pycaffe interface but that's a mere replacement of the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.
[Copied from my own answer on Quora]
CNTK
To be updated ...
TensorFlow
TF has a clean, modular architecture with multiple frontends and execution platforms. Details are in the white paper.
Theano
The architecture is fairly hacky: the whole code base is Python where C/CUDA code is packaged as Python strings. This makes it hard to navigate, debug, refactor, and hence contribute as a developer.
Torch
Torch7 and nn libraries are also well-designed with clean, modular interfaces.
Ecosystem
- Caffe and CNTK: C++
- TensorFlow: Python and C++
- Theano: Python
- Torch: Lua is not a mainstream language and hence libraries built for it are not as rich as ones built for Python.
Cross-platform
Caffe, CNTK, and Theano work on all OSes. TensorFlow and Torch do not work on Windows and there's no known plan to port from either camp.
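Summary (not by the original author):
For individual experimenters, I strongly recommend Caffe for CNNs; for researchers working with RNNs, I recommend Theano and TensorFlow.
For parallel, distributed systems, I recommend Caffe for CNNs and TensorFlow for RNNs.
Karpathy's assessment: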
Footnotes
[1] Note that I don’t aggregate ratings because different users/developers have different priorities.
[2] Disclaimer: I haven’t analyzed this extension carefully.
[3] See my blog post for why this is desirable.