Deep Learning: Comparison of frameworks for symbolic deep learning
October 25, 2016 at 10:10
- Translation
We present a translation of a series of articles on deep learning. The first part covers the choice of an open-source framework for symbolic deep learning among MXNET, TensorFlow, and Theano. The author compares the advantages and disadvantages of each in detail. The following parts cover fine-tuning deep convolutional networks and combining a deep convolutional neural network with a recurrent neural network.

Series of articles "Deep Learning"
1. Comparison of frameworks for symbolic deep learning.
2. Transfer learning and fine-tuning of deep convolutional neural networks.
3. The combination of a deep convolutional neural network with a recurrent neural network.
Note: from this point on, the text is written in the author's voice.
Symbolic computation frameworks
Symbolic computation frameworks (MXNET, TensorFlow, Theano) are characterized by symbolic graphs of vector operations, such as matrix addition/multiplication or convolution. A layer is simply a composition of such operations. Because the building blocks (operations) are this fine-grained, users can create new complex layer types without resorting to low-level languages (as in Caffe).
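To make the idea of a symbolic graph concrete, here is a minimal sketch in plain Python. It is not the API of Theano, TensorFlow, or MXNET: the `Node`, `var`, `evaluate`, and `dense_layer` names are hypothetical, and scalars stand in for tensors. It only shows that building an expression records operations, and that a "layer" is an ordinary composition of those operations.

```python
class Node:
    """A node in a symbolic expression graph (hypothetical, for illustration)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op                  # "var", "add", or "mul"
        self.inputs = tuple(inputs)   # upstream nodes this node depends on
        self.name = name              # only used by "var" nodes

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def var(name):
    """Create a placeholder whose value is supplied at evaluation time."""
    return Node("var", name=name)


def evaluate(node, feed):
    """Recursively evaluate a graph given values for its variables."""
    if node.op == "var":
        return feed[node.name]
    left, right = (evaluate(inp, feed) for inp in node.inputs)
    if node.op == "add":
        return left + right
    if node.op == "mul":
        return left * right
    raise ValueError("unknown operation: " + node.op)


def dense_layer(x, w, b):
    """A 'layer' is just a composition of primitive operations."""
    return x * w + b


x, w, b = var("x"), var("w"), var("b")
y = dense_layer(x, w, b)   # builds the graph; nothing is computed yet
result = evaluate(y, {"x": 2.0, "w": 3.0, "b": 1.0})
```

Because `y` is a data structure rather than a number, a real framework can inspect, differentiate, and optimize it before running anything.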
I have experience with several symbolic computation frameworks. As it turned out, each has both strengths and weaknesses in its design and current implementation, and none of them fully meets all requirements. For now, though, I prefer Theano.
Next, we compare the listed symbolic computation frameworks.
| Characteristic | Theano | TensorFlow | MXNET | 
|---|---|---|---|
| Author | University of Montreal | The Google Brain Team | Distributed (Deep) Machine Learning Community | 
| Software license | BSD License | Apache 2.0 | Apache 2.0 | 
| Open source | Yes | Yes | Yes | 
| Platform | Cross-platform | Linux, Mac OS X, planned support for Windows | Ubuntu, OS X, Windows, AWS, Android, iOS, JavaScript | 
| Programming language | Python | C++, Python | C++, Python, Julia, Matlab, R, Scala | 
| Interface | Python | C/C++, Python | C++, Python, Julia, Matlab, JavaScript, R, Scala | 
| CUDA support | Yes | Yes | Yes | 
| Automatic differentiation | Yes | Yes | Yes | 
| Pre-trained models available | Via the model zoo in Lasagne | No | Yes | 
| Recurrent networks | Yes | Yes | Yes | 
| Convolutional networks | Yes | Yes | Yes | 
| Restricted Boltzmann machines / deep belief networks | Yes | Yes | Yes | 
Comparison of symbolic and non-symbolic frameworks
Non-symbolic frameworks
Benefits:
- Non-symbolic (imperative) neural network frameworks, such as torch and caffe, tend to have very similar computational designs.
- In terms of expressiveness, imperative frameworks are quite well designed, and they can offer a graph-based interface (for example, torch/nngraph).
Disadvantages:
- The main drawback of imperative frameworks is manual optimization. For example, in-place operations have to be implemented by hand.
- Most imperative frameworks are less expressive than symbolic ones.
Symbolic frameworks
Benefits:
- In symbolic frameworks, automatic optimization based on dependency graphs is possible.
- Symbolic frameworks offer many more opportunities for memory reuse. MXNET, for example, implements this very well.
- Symbolic frameworks can automatically compute an optimal schedule. Learn more here.
Disadvantages:
- The available open-source symbolic frameworks are still immature and lag behind imperative ones in performance.
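The memory reuse point can be illustrated with a small sketch. Assuming the framework knows the full dependency graph in execution order, it can see when an intermediate value is read for the last time and hand its buffer to a later operation. The `plan_buffers` function below is hypothetical and deliberately simplified (all buffers are assumed to be the same size, and no in-place updates are attempted); it is not how MXNET or any other framework actually implements its memory planner.

```python
def plan_buffers(ops):
    """Assign a buffer id to every operation output, reusing dead buffers.

    ops: list of (output_name, [input_names]) in execution order.
    Returns (assignment, buffer_count).
    """
    # Record the last step at which each value is read.
    last_use = {}
    for step, (_, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = step

    free = []          # buffer ids whose contents are no longer needed
    assignment = {}    # output name -> buffer id
    buffer_count = 0

    for step, (out, inputs) in enumerate(ops):
        # Take a recycled buffer if one is available, else allocate a new one.
        if free:
            buf = free.pop()
        else:
            buf = buffer_count
            buffer_count += 1
        assignment[out] = buf

        # Inputs read for the last time at this step release their buffers.
        # External inputs (not produced by any op) are left alone.
        for name in sorted(set(inputs)):
            if last_use[name] == step and name in assignment:
                free.append(assignment[name])

    return assignment, buffer_count


# A four-layer chain: x -> h1 -> h2 -> h3 -> y
chain = [("h1", ["x"]), ("h2", ["h1"]), ("h3", ["h2"]), ("y", ["h3"])]
assignment, buffer_count = plan_buffers(chain)
```

For this chain the planner needs two buffers instead of four, because `h1` is dead by the time `h3` is computed. An imperative framework that executes each call immediately does not have the whole graph available to make this decision automatically.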
Adding New Operations
In all of these frameworks, adding operations while maintaining acceptable performance is not easy.
| Theano / MXNET | TensorFlow | 
|---|---|
| You can add operations in Python, with support for inline C operators. | Forward pass in C++, symbolic gradient in Python. | 
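As the table suggests, a new operation generally needs two pieces: a forward computation and a gradient rule. The sketch below shows that pairing in plain Python on scalars. The `register_op` registry and `forward_backward` helper are hypothetical names made up for this illustration, the gradients are computed numerically rather than symbolically, and none of this reflects the real extension APIs of Theano, MXNET, or TensorFlow.

```python
import math

OPS = {}  # operation name -> (forward function, gradient function)


def register_op(name, forward, grad):
    """Register a unary operation together with its gradient rule."""
    OPS[name] = (forward, grad)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# grad(x, g) receives the op's input x and the upstream gradient g.
register_op("square", lambda x: x * x, lambda x, g: 2.0 * x * g)

# A "new" operation: softplus(x) = log(1 + exp(x)), whose derivative is sigmoid(x).
register_op("softplus",
            lambda x: math.log1p(math.exp(x)),
            lambda x, g: sigmoid(x) * g)


def forward_backward(op_names, x):
    """Apply a chain of unary ops, then backpropagate a gradient of 1.0."""
    saved_inputs = []
    for name in op_names:
        saved_inputs.append(x)
        x = OPS[name][0](x)
    g = 1.0
    for name, inp in zip(reversed(op_names), reversed(saved_inputs)):
        g = OPS[name][1](inp, g)
    return x, g


value, grad = forward_backward(["square", "softplus"], 1.0)
```

Writing the two functions is the easy part. The difficulty the text refers to is making both of them fast, which in practice means a carefully tuned C, C++, or GPU implementation.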
Code reuse
Training deep networks takes a lot of time. For this reason, Caffe released several pre-trained models (the model zoo) that can be used as starting points for transfer learning or for fine-tuning deep networks on domain-specific or custom images.
| Theano | TensorFlow | MXNET | 
|---|---|---|
| Lasagne is a high-level framework built on Theano. Lasagne makes it easy to use pre-trained Caffe models | No support for pre-trained models | MXNET provides the caffe_converter tool for converting pre-trained Caffe models to the MXNET format | 
Low-level tensor operators
A reasonably efficient implementation of low-level operators matters: they can serve as building blocks for new models, so no effort has to be spent on writing new operators.
| Theano | TensorFlow | MXNET | 
|---|---|---|
| Many simple operations | Quite good | Very few | 
Control flow operators
Control flow operators increase the expressiveness and generality of a symbolic system.
| Theano | TensorFlow | MXNET | 
|---|---|---|
| Supported | Experimental | Not supported | 
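To show what a control flow operator adds, here is a plain-Python sketch of the semantics of a scan-style loop, the kind of construct Theano exposes through its `scan` operator. The `scan` function below is a stand-in written for this illustration. It runs eagerly and does not follow Theano's actual signature; a symbolic framework would instead record the loop as a single node in the graph.

```python
def scan(step, sequence, initial_state):
    """Apply step(state, item) across a sequence and collect every state."""
    state = initial_state
    outputs = []
    for item in sequence:
        state = step(state, item)
        outputs.append(state)
    return outputs


# A running sum is the simplest possible recurrence.
running_sum = scan(lambda state, x: state + x, [1, 2, 3, 4], 0)


# A toy recurrent update: the new state mixes the old state with the input.
def recurrent_step(state, x, decay=0.5):
    return decay * state + (1.0 - decay) * x


hidden_states = scan(recurrent_step, [1.0, 1.0, 1.0], 0.0)
```

Without such an operator, a recurrence over a sequence whose length is only known at run time cannot be expressed inside the graph and has to be unrolled or driven from outside it.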
High level support
| Theano | TensorFlow | MXNET | 
|---|---|---|
| A "pure" symbolic computation framework. High-level frameworks can be built on top of it as needed. Successful examples include Keras, Lasagne, and blocks | Well designed from the point of view of training neural networks, yet the framework is not focused exclusively on neural networks, which is very good. You can use graph collections, queues, and image augmentation as building blocks for high-level wrappers | In addition to the symbolic part, MXNET provides all the components needed for image classification, from data loading to model building, along with methods for launching training. | 
Performance
Single-GPU Performance Measurement
In my tests, I measured the performance of the LeNet model on the MNIST dataset in a single-GPU configuration (NVIDIA Quadro K1200).
| Theano | TensorFlow | MXNET | 
|---|---|---|
| Excellent | Medium | Good | 
Memory
GPU memory is limited, so running large models can be problematic.
| Theano | TensorFlow | MXNET | 
|---|---|---|
| Excellent | Medium | Good | 
Single-GPU Speed
Theano takes a very long time to compile graphs, especially for complex models. TensorFlow is a little slower.
| Theano / MXNET | TensorFlow | 
|---|---|
| Comparable to cuDNN v4 | About twice as slow | 
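The comparison above mixes two different costs: the one-time cost of compiling a graph and the steady-state cost of running it. A minimal sketch of how the two could be measured separately is shown below. The `measure` helper and the toy `build_toy_model` function are hypothetical stand-ins written in plain Python. This is not the benchmark used for the figures in this article.

```python
import time


def measure(build, repeats=3):
    """Time the one-off build step and the best of several runs, in seconds."""
    start = time.perf_counter()
    compiled = build()                     # stand-in for graph compilation
    compile_seconds = time.perf_counter() - start

    best_run = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        compiled()                         # stand-in for one training step
        best_run = min(best_run, time.perf_counter() - start)
    return compile_seconds, best_run


def build_toy_model():
    """Pretend 'compilation': precompute a table, return a callable that uses it."""
    table = [i * i for i in range(10000)]
    return lambda: sum(table)


compile_seconds, run_seconds = measure(build_toy_model)
```

Reporting the two numbers separately keeps a framework with slow compilation but fast execution from looking worse than it is on long training runs.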
Support for parallel and distributed computing
| Theano | TensorFlow | MXNET | 
|---|---|---|
| Experimental multi-GPU support | Multi-GPU | Distributed | 
Conclusion
Theano (with the high-level Lasagne and Keras libraries) is an excellent choice for deep learning models. With Lasagne/Keras it is very easy to create new networks and modify existing ones. I prefer Python, so I choose Lasagne/Keras for their mature Python interface. However, these libraries do not support R. Their transfer learning and fine-tuning capabilities show that it is very easy to modify existing networks and to adapt them to domain-specific custom data.
After comparing the frameworks, we can conclude that MXNET (higher performance, efficient memory use) is the best solution. In addition, it has excellent support for R. In fact, it is the only framework in which all functions are supported in R. Transfer learning and fine-tuning are possible in MXNET, but they are rather difficult to carry out (compared to Lasagne/Keras). This makes it harder both to modify existing trained networks and to adapt them to domain-specific custom data.
If you see an inaccuracy in the translation, please report this in private messages.