DeepMind's AI fails a school math test



    Popular science outlets and even entertainment media are now flooded with news about the successes of AI projects. One day artificial intelligence beats a human at Go; the next it learns to play StarCraft and defeats recognized champions. And this is only a small fraction of the achievements; in fact, there are many more. An ordinary person (that is, someone not connected with IT) might think that real, “big” artificial intelligence, the kind that science fiction books and films are written about, is about to appear.

    But things are far from that rosy. For example, it was recently reported that a DeepMind AI attempted a mathematics test (a standard school exam in the UK) and failed it.

    In principle, the reasons for the failure are not hard to explain. When solving a mathematical problem, a person draws on the following abilities:

    • Converting characters into entities such as numbers, arithmetic operators, variables (which together form functions) and words (which define the question, the meaning of the task, etc.).
    • Planning (for example, arranging the functions in the order needed to solve the problem).
    • Using auxiliary algorithms (addition, multiplication) to compose functions.
    • Using short-term memory to store intermediate values (e.g. h(f(x))).
    • Applying previously acquired knowledge of rules, transformations, processes and axioms.
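
    To make the planning and short-term memory items concrete, here is a minimal Python sketch of evaluating a composition h(f(x)) step by step. The functions f and h and the helper compose_with_trace are hypothetical, chosen only for illustration; they do not come from the DeepMind study.

```python
# Illustrative only: f and h are hypothetical functions picked for this example.

def f(x):
    """Inner function: f(x) = 2x + 3."""
    return 2 * x + 3


def h(y):
    """Outer function: h(y) = y squared."""
    return y * y


def compose_with_trace(x):
    """Evaluate h(f(x)) in order, keeping every intermediate value."""
    trace = []                        # stands in for short-term memory
    inner = f(x)                      # step 1: the inner function comes first
    trace.append(("f(x)", inner))
    outer = h(inner)                  # step 2: reuse the stored value
    trace.append(("h(f(x))", outer))
    return outer, trace


result, steps = compose_with_trace(4)
for label, value in steps:
    print(label, "=", value)
```

    A person does this ordering and bookkeeping almost without noticing, while a sequence model has to learn both from examples.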

    DeepMind's models were trained and tested on a collection of mathematical problems of various types. The developers did not use crowdsourcing; instead, they synthesized the data set, which let them generate a large number of test tasks, control their difficulty, and so on. The development team used a “free-form” text format for the data.
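
    The article does not show the generator itself, so the following is only a rough Python sketch of what synthesizing free-form text problems with controlled difficulty might look like. The function names, the question template and the difficulty parameter (here, the number of digits per operand) are all assumptions made for illustration.

```python
import random


def make_addition_problem(rng, difficulty):
    """Return a (question, answer) pair as plain-text strings.

    `difficulty` is a hypothetical knob: the number of digits per operand.
    """
    low = 10 ** (difficulty - 1)
    high = 10 ** difficulty - 1
    a = rng.randint(low, high)
    b = rng.randint(low, high)
    question = f"What is {a} + {b}?"   # free-form text, no fixed schema
    answer = str(a + b)                # the answer is also just text
    return question, answer


def make_dataset(n, difficulty, seed=0):
    """Generate n problems; a fixed seed keeps the set reproducible."""
    rng = random.Random(seed)
    return [make_addition_problem(rng, difficulty) for _ in range(n)]


dataset = make_dataset(n=3, difficulty=2)
for question, answer in dataset:
    print(question, "->", answer)
```

    Because every pair is generated by a program, the number of tasks and their difficulty can be adjusted freely, which is hard to achieve with crowdsourced data.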

    The source material was based on a selection of assignments for UK school students (up to 16 years old). The tasks were drawn from areas such as arithmetic, algebra, probability theory, and others.

    When choosing a neural network architecture for solving mathematical problems, the DeepMind team settled on LSTM (long short-term memory) and the Transformer (a neural network architecture for working with sequences).

    DeepMind tested two LSTM models on the math problems: a simple LSTM and an attentional LSTM, whose operation is shown in the figure below.



    Below is a diagram of the Transformer model.



    The result was not very good: only 35% of the AI's answers were correct, which would be an unsatisfactory grade by the standards of any school.
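
    For reference, a percentage like this can be computed as the share of answers that match the expected ones. The Python sketch below assumes a strict rule where an answer counts only if its text equals the reference exactly; the article does not describe the grading rule, and the data here is made up solely to show the arithmetic.

```python
def accuracy_percent(predictions, references):
    """Percentage of predictions whose text equals the reference answer."""
    if not references or len(predictions) != len(references):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(
        p.strip() == r.strip() for p, r in zip(predictions, references)
    )
    return 100 * correct / len(references)


# Made-up data for illustration: 7 of 20 answers match.
references = [str(n) for n in range(20)]
predictions = references[:7] + ["wrong"] * 13

score = accuracy_percent(predictions, references)
print(f"{score:.0f}% correct")
```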



    Of course, the DeepMind researchers have only just begun working on mathematics with AI. Greater success can be expected in the future, as was the case with AlphaGo.

    The full study data can be found at this link.


