Google says the “exponential” growth of AI is changing the very nature of computing
Google software engineer Cliff Young explains how the explosive rise of deep learning algorithms coincides with the breakdown of Moore's law, the empirical rule that governed computer chip progress for decades, and forces us to develop fundamentally new computing architectures
The explosive development of AI and machine learning algorithms is changing the very nature of computing, according to one of the largest companies practicing AI, Google. Google software engineer Cliff Young gave the opening talk at the Linley Group's fall processor conference, a popular computer chip symposium run by the respected semiconductor analysis firm.
Young said that the use of AI has entered an “exponential phase” at the very moment when Moore’s law, the decades-old empirical rule of progress for computer chips, has ground to a halt.
“The times are slightly neurotic,” he mused. “Digital CMOS is slowing down, we see it in Intel's troubles with the 10nm process, we see it with GlobalFoundries and the 7nm process, and at the same time as this deep learning thing is happening, there is economic demand.” CMOS, or complementary metal-oxide-semiconductor, is the most common technology used to make computer chips.
While conventional chips struggle to deliver gains in performance and efficiency, demand from AI researchers is surging, Young noted. He cited some statistics: the number of machine learning papers posted to the arXiv preprint server maintained by Cornell University doubles every 18 months. The number of internal projects focused on AI at Google, he said, also doubles every 18 months. The number of floating-point operations needed to run the neural networks used in machine learning is growing even faster: it doubles every three and a half months.
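To put those doubling periods on a common footing, they can be converted into yearly growth factors. The short Python sketch below is only an illustration of the arithmetic; the function name `growth_factor` is hypothetical and the calculation assumes smooth exponential growth, which the talk did not spell out.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth over `months`, assuming steady exponential
    growth with the given doubling period (an idealization)."""
    return 2.0 ** (months / doubling_period_months)


# Doubling periods quoted in the talk.
PAPERS_DOUBLING_MONTHS = 18.0    # arXiv machine learning papers, Google AI projects
COMPUTE_DOUBLING_MONTHS = 3.5    # floating-point operations for neural networks

papers_per_year = growth_factor(12, PAPERS_DOUBLING_MONTHS)    # about 1.6x per year
compute_per_year = growth_factor(12, COMPUTE_DOUBLING_MONTHS)  # roughly 10.8x per year

if __name__ == "__main__":
    print(f"Papers and projects: x{papers_per_year:.2f} per year")
    print(f"Compute demand:      x{compute_per_year:.2f} per year")
```

Under that assumption, an 18-month doubling works out to roughly a 1.6-fold increase per year, while a 3.5-month doubling means compute demand grows by more than a factor of ten per year.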
All of this growth in computing demand adds up to a “super Moore's law,” said Young, a phenomenon he called “a bit terrifying,” “a little dangerous,” and “something to worry about.”
“Where did all this exponential growth come from?” in the field of AI, he asked. “In part, it is because deep learning just works,” he said. “For a long time in my career, I ignored machine learning. It was not obvious that these things would take off.”
But then breakthroughs began to arrive quickly, in areas such as image recognition, and it became clear that deep learning was “incredibly effective,” he said. “For most of the last five years we have been an AI-first company, and we have rebuilt most of our businesses on AI,” from search to advertising and more.
The Google Brain team, which leads the company's AI research, needs “gigantic machines,” said Young. For example, neural networks are sometimes measured by the number of “weights” they use, that is, the variables applied to a neural network that shape how it processes data.
While conventional neural networks may contain hundreds of thousands or even millions of weights that must be computed, Google's researchers are asking for “tera-weight machines,” that is, computers capable of computing trillions of weights. That is because “every time we double the size of a neural network, we improve its accuracy.” Bigger and bigger is the rule in AI.
In response to that demand, Google is developing its own line of machine learning chips, the Tensor Processing Unit (TPU). The TPU and chips like it are needed because traditional CPUs and graphics chips (GPUs) cannot keep up with the load.
“For a very long time we held back and said that Intel and Nvidia are great at building high-performance systems,” said Young. “But we crossed that threshold five years ago.”
The TPU caused a stir when it was first unveiled in 2017, with claims that it outperformed conventional chips. Google is now on the third generation of the TPU, which it uses in its own projects and offers as on-demand computing capacity through the Google Cloud service.
The company keeps building larger and larger TPU configurations. Its “pod” configuration links 1,024 TPUs together into a new type of supercomputer, and Google plans to keep scaling this system, according to Young.
“We are building gigantic multi-computers with tens of petabytes,” he said. “We are pushing relentlessly in several directions of progress at once, and the tera-ops keep climbing.” Such projects raise all the problems associated with supercomputer design.
For example, Google engineers adopted tricks used in the legendary Cray supercomputers. They combined a gigantic “matrix multiplication unit,” the part of the chip that carries the main computational load for neural networks, with a “general-purpose vector unit” and a “general-purpose scalar unit,” as was done at Cray. “The combination of scalar and vector units allowed Cray to outperform everything else,” he said.
Google has developed its own novel arithmetic constructions for programming the chips. A way of representing real numbers called bfloat16 makes number crunching in neural networks more efficient. Colloquially, it is known as the “brain float.”
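The talk did not go into the format's layout, but bfloat16 is commonly described as the top 16 bits of a 32-bit IEEE float: one sign bit, eight exponent bits, and seven mantissa bits. The Python sketch below illustrates that idea under a simplifying assumption: it converts by plain truncation, whereas real hardware typically rounds. The helper names are hypothetical and not part of any Google API.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Keep only the top 16 bits of the float32 pattern.

    Assumption for illustration: simple truncation (round toward zero).
    """
    return float32_bits(x) >> 16


def from_bfloat16_bits(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def roundtrip(x: float) -> float:
    """Value of x after being squeezed through bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e38):
        print(value, "->", roundtrip(value))
```

Because the eight exponent bits are kept, a value as large as 1e38 survives the conversion, while the short mantissa leaves only two to three significant decimal digits. That trade of precision for range is generally considered acceptable for neural network arithmetic.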
The TPU uses the fastest memory chips, so-called high-bandwidth memory, or HBM. Young said that demand for memory capacity in the training of neural networks is growing rapidly.
“Memory is used far more intensively during training,” he said. “People talk about hundreds of millions of weights, but there is also the problem of handling the activation” variables of a neural network.
Google is also adjusting the way it programs neural networks to squeeze the most out of the hardware. “We are working a lot on data and model parallelism,” in projects such as Mesh TensorFlow, an adaptation of the TensorFlow software framework “that combines data and model parallelism at pod scale.”
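The difference between the two kinds of parallelism can be shown on a single matrix multiplication, the core operation of a neural network layer. The pure-Python sketch below is a toy illustration only and does not use the Mesh TensorFlow API; the function names and the "devices" (which are simply loop iterations) are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def data_parallel(batch, weights, n_devices):
    """Data parallelism: each device gets a slice of the batch rows
    and a full copy of the weights; results are stacked row-wise."""
    shard = (len(batch) + n_devices - 1) // n_devices
    out = []
    for d in range(n_devices):
        rows = batch[d * shard:(d + 1) * shard]
        if rows:
            out.extend(matmul(rows, weights))
    return out


def model_parallel(batch, weights, n_devices):
    """Model parallelism: each device gets the full batch but only a
    slice of the weight columns; results are joined column-wise."""
    n_cols = len(weights[0])
    shard = (n_cols + n_devices - 1) // n_devices
    out = [[] for _ in batch]
    for d in range(n_devices):
        cols = [row[d * shard:(d + 1) * shard] for row in weights]
        if cols[0]:
            for out_row, part in zip(out, matmul(batch, cols)):
                out_row.extend(part)
    return out


if __name__ == "__main__":
    batch = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 examples, 2 features
    weights = [[1, 0, 2], [0, 1, 3]]           # 2 inputs, 3 outputs
    print(data_parallel(batch, weights, 2))
    print(model_parallel(batch, weights, 2))
```

Both functions return the same result as the unsplit product. What changes is what each device has to hold in memory: the whole model under data parallelism, or only part of it under model parallelism.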
Young held back some technical details. He noted that the company has said little about its interconnects, the way data moves around the chip, beyond remarking that “our connectors are gigantic.” He declined to elaborate, which drew laughter from the audience.
Young pointed to even more intriguing areas of computing that may open up soon. For example, he suggested that computing with analog chips, circuits that process inputs as continuous values instead of zeros and ones, could play an important role. “Perhaps we will turn to the analog domain; there is a lot of interesting physics related to analog computing and non-volatile memory.”
He also expressed hope for the success of the chip startups presenting at the conference: “There are some very cool startups here, and we need that work, because digital CMOS will only take us so far; I want those investments to pay off.”