Deep Learning Hardware

Original author: Tim Dettmers
  • Translation
Deep learning demands a lot of computing power. That said, there is no point in blindly spending money on whatever hardware happens to grace a magazine cover, only to throw it away later. The choice needs to be approached wisely.

Let's look at examples of hardware choices for working on deep learning, touching on a little theory along the way.

/ Photo by Berkeley Center for New Media CC

When building a deep learning system, a GPU is definitely worth considering: the performance gain it provides is too large to ignore. Experts advise paying attention to models such as the GTX 680 (limited budget), the GTX Titan X (an expensive option for convolutions) or the GTX 980 (the best value for the money).

To choose the right CPU, you need to understand its role in deep learning. It performs very few of the calculations, but it still writes and reads variables, executes instructions, initiates calls to the GPU and passes parameters to it.

Most deep learning libraries (like most applications) use only one thread, which means additional CPU cores bring little benefit. They do come in handy, however, if you work with frameworks that use parallel computing, such as MPI.
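Even with a single-threaded library, a second core can still be put to work preparing data while the main thread drives the GPU. A minimal sketch of such background prefetching (the batch source here is a hypothetical stand-in, not any particular library's API):

```python
import queue
import threading

# Hypothetical mini-batch source: a real pipeline would read and
# preprocess data from disk; here it just yields dummy batches.
def make_batches(n):
    for i in range(n):
        yield [i] * 4  # stand-in for a preprocessed mini-batch

def prefetch(batch_iter, maxsize=2):
    """Run batch preparation in a background thread so the main
    thread (which drives the GPU) never waits on preprocessing."""
    q = queue.Queue(maxsize=maxsize)
    sentinel = object()

    def worker():
        for b in batch_iter:
            q.put(b)       # blocks when the queue is full
        q.put(sentinel)    # signal that the source is exhausted

    threading.Thread(target=worker, daemon=True).start()
    while True:
        b = q.get()
        if b is sentinel:
            break
        yield b

batches = list(prefetch(make_batches(3)))
print(batches)  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

The bounded queue keeps memory use flat: the worker stalls once a couple of batches are ready, rather than preparing the whole data set in advance.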

The size of the CPU cache plays little role in the exchange of data between the CPU and GPU. In deep learning, data sets are usually far too large to fit in the cache, and each new mini-batch has to be read from main memory, so constant access to RAM is required in any case.
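Some rough arithmetic shows why caching cannot help here: a single mini-batch already dwarfs a typical last-level cache, so every new batch must come from RAM anyway. The numbers below are illustrative assumptions, not measurements:

```python
# One mini-batch of 128 RGB 224x224 images stored as 32-bit floats.
batch_size = 128
sample_shape = (3, 224, 224)   # channels, height, width (assumed)
bytes_per_value = 4            # float32

sample_bytes = bytes_per_value
for d in sample_shape:
    sample_bytes *= d
batch_mb = batch_size * sample_bytes / 2**20

cache_mb = 20  # a generous last-level cache size, in MiB (assumed)
print(f"mini-batch: {batch_mb:.1f} MiB vs cache: {cache_mb} MiB")
```

One batch alone is several times the size of the cache, and the next batch evicts it entirely, so the cache hit rate on training data is effectively zero.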

Clock frequency is not always the best measure of performance in this situation. Even when the CPU shows 100% load, much of that time may be spent not on real work but on cache misses (memory runs at a lower frequency than the CPU). The processor determines the maximum supported RAM frequency, and together these two frequencies determine the resulting memory bandwidth of the processor.

Memory bandwidth matters only when you copy large amounts of data. It determines how quickly a mini-batch can be rewritten and staged so that a transfer to the GPU can be initiated.

With direct memory access (DMA), the task comes down to implementing the transfer correctly, which allows cheaper and slower memory to be used: the copy can be done asynchronously and without loss of performance. The frequency of the RAM then does not matter.
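A back-of-the-envelope estimate supports the claim that the transfer can be hidden. With the illustrative bandwidth figures below (assumed, not measured), staging and shipping one mini-batch takes only a few milliseconds, which is easily overlapped with GPU compute:

```python
# Time to move one mini-batch, for illustrative bandwidth figures.
batch_bytes = 128 * 3 * 224 * 224 * 4   # 128 RGB 224x224 float32 samples

def transfer_ms(nbytes, gb_per_s):
    """Milliseconds to move nbytes at the given bandwidth in GB/s."""
    return nbytes / (gb_per_s * 1e9) * 1e3

ram_ms = transfer_ms(batch_bytes, 12.8)   # one DDR3 channel, ~12.8 GB/s
pcie_ms = transfer_ms(batch_bytes, 15.8)  # PCIe 3.0 x16, ~15.8 GB/s
print(f"RAM copy: {ram_ms:.1f} ms, PCIe transfer: {pcie_ms:.1f} ms")
```

Since a GPU typically spends tens to hundreds of milliseconds computing on each batch, a few milliseconds of asynchronous copying disappears entirely behind the compute.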

The amount of RAM is very important here. It should be no less than the amount of memory on the GPU. With plenty of RAM you can avoid many problems, save time and increase productivity by tackling more complex tasks.
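The rule of thumb from the text can be written down as a trivial sizing helper. The 2x headroom factor below is my own illustrative assumption, not a figure from the article:

```python
def min_ram_gb(gpu_mem_gb, n_gpus=1, headroom=2.0):
    """System RAM should at least match total GPU memory (per the
    text); headroom above 1.0 leaves room for staging and the OS."""
    return gpu_mem_gb * n_gpus * headroom

print(min_ram_gb(12, n_gpus=2))  # two 12 GB Titan X cards -> 48.0 GB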

/ Photo by UnknownNet Photography CC

As for storing large volumes of data, the picture is generally clear: most likely you will keep one part on a hard drive or SSD, another part in RAM and a third part (two small mini-batches) in GPU memory. For example, if you are working with a large set of 32-bit data, you definitely need an SSD, since hard drives at 100-150 MB/s are too slow to keep up with the GPU. Many people buy an SSD simply for convenience, but for deep learning it is necessary only if the input data is high-dimensional and cannot be compressed effectively.
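Whether a spinning disk can keep up is easy to estimate: compare the sustained read rate training demands against the 100-150 MB/s the text quotes for hard drives. The batch size and GPU time per batch below are assumed for illustration:

```python
# Sustained read rate the GPU demands, with illustrative numbers.
batch_bytes = 128 * 3 * 224 * 224 * 4   # 128 RGB 224x224 float32 samples
gpu_ms_per_batch = 200                   # assumed GPU time per mini-batch

needed_mb_s = (batch_bytes / 2**20) / (gpu_ms_per_batch / 1e3)
hdd_mb_s = 125                           # mid-range of 100-150 MB/s
print(f"required: {needed_mb_s:.0f} MB/s, HDD delivers ~{hdd_mb_s} MB/s")
```

Under these assumptions the GPU consumes data roughly three times faster than the disk can deliver it, so the disk becomes the bottleneck unless the data is compressed or served from an SSD.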

Moving on: cooling. It can affect performance more than poor hardware choices do. During operation, a modern GPU raises its clock and power consumption to the maximum, but as soon as the chip temperature reaches 80°C it cuts the clock to avoid crossing the thermal threshold. This approach guarantees the best performance the thermal limit allows.
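The boost-then-throttle behaviour can be sketched as a toy feedback loop: the card raises its clock while there is thermal headroom and backs off once the 80°C threshold from the text is hit. This is purely illustrative; real boost algorithms are considerably more involved:

```python
def next_clock(clock_mhz, temp_c, threshold_c=80, step=13):
    """One step of a toy thermal-throttling model (illustrative only)."""
    if temp_c >= threshold_c:
        return clock_mhz - step   # back off to stay under the limit
    return clock_mhz + step       # boost while headroom remains

clock = 1000
for temp in (60, 70, 75, 82, 85):
    clock = next_clock(clock, temp)
print(clock)  # boosted three times, throttled twice -> 1013
```

The practical point is that under a sustained deep learning load the temperature term dominates, so without better cooling the clock spends most of its time stepping down.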

In deep learning, this temperature threshold is reached within seconds of starting the algorithm. The simplest and cheapest way around the restriction is therefore to flash a BIOS with new, more sensible fan settings that keep both temperature and noise at an acceptable level.

Water cooling is also an option. It will cut the temperature of a GPU roughly in half even at maximum load, so the thermal threshold is never reached. Maintenance takes little effort and should not cause problems. The only question is money.

The motherboard must have enough PCIe slots to fit all of your GPUs. PCIe 2.0 is fine for a single GPU, but PCIe 3.0 offers a better price/performance ratio. Choosing a motherboard is generally simple: pick one that supports the hardware you need.

Large monitors that let you work productively and keep an eye on everything are among the best investments. The difference between one and three monitors is huge; we advise paying attention to this.
