A neural network generates images of dishes from their cooking recipes.
Comparison of real photos (top row), images generated with semantic regularization (middle row), and without it (bottom row)
A group of researchers from Tel Aviv University has developed a neural network capable of generating images of dishes from their textual recipes. A cook can thus see in advance what the result will look like if a recipe is changed: a new ingredient added or an existing one removed. In principle, this work could serve as the basis for a commercial application, especially since the program's source code has been published in the public domain.
The neural network is a modified version of a generative adversarial network (GAN) called StackGAN V2. It was trained on a large base of 52,000 image/recipe pairs from the recipe1M dataset.
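The core idea of such a conditional GAN is that the generator receives not only random noise but also an embedding of the recipe text, so the same noise produces different dishes for different recipes. Below is a minimal toy sketch of that conditioning mechanism in NumPy; all dimensions, the single linear layer, and the function names are assumptions for illustration only — the actual StackGAN-V2 model is a far larger, multi-stage convolutional network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions, not the paper's values).
NOISE_DIM = 16      # random noise vector z
TEXT_DIM = 32       # recipe embedding produced by some text encoder
IMG_SIDE = 8        # tiny "image" side length

# Hypothetical one-layer "generator" weight matrix.
W = rng.normal(scale=0.1, size=(NOISE_DIM + TEXT_DIM, IMG_SIDE * IMG_SIDE * 3))

def generate(noise, recipe_embedding):
    """Map concatenated [noise ; recipe embedding] to an RGB image.

    Concatenating the conditioning vector with the noise is the key
    trick of a conditional GAN: the recipe steers the output.
    """
    x = np.concatenate([noise, recipe_embedding])
    img = np.tanh(x @ W)              # tanh keeps pixel values in (-1, 1)
    return img.reshape(IMG_SIDE, IMG_SIDE, 3)

z = rng.normal(size=NOISE_DIM)
recipe_a = rng.normal(size=TEXT_DIM)  # stand-in for one encoded recipe
recipe_b = rng.normal(size=TEXT_DIM)  # stand-in for a different recipe

img_a = generate(z, recipe_a)
img_b = generate(z, recipe_b)
# Same noise, different conditioning -> different images.
```

In a real model the weights would of course be learned adversarially against a discriminator; the sketch only shows how the recipe enters the generator's input.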
In principle, the neural network can take almost any list of ingredients and instructions, even fantastic combinations, and figure out what the finished dish would look like.
"It all started when I asked my grandmother for the recipe of her legendary fish cutlets in tomato sauce," says Ori Bar El, lead author of the paper. "Because of her old age, she could not remember the exact recipe. But I wondered whether it was possible to build a system that would produce a recipe from an image of food. After thinking about the problem, I concluded that it is too difficult for a system to recover an exact recipe with real and 'hidden' ingredients such as salt, pepper, butter, flour, and so on. Then I wondered whether it was possible to do the opposite, namely, to generate images of dishes from recipes. We believe this task is very difficult for people, let alone computers. Since most modern artificial-intelligence systems try to replace human experts on tasks that are easy for humans, we thought it would be interesting to tackle a problem that even goes beyond human capabilities. As you can see…"
Generating images from text is a complex task with many applications in computer vision. Recent work has shown that generative adversarial networks (GANs) are very effective at synthesizing high-quality, realistic images from datasets of low variability and low resolution.
It is also known that cGAN-type networks can generate convincing images directly from a text description. Recently, the recipe1M dataset, containing 800,000 pairs of recipes and their corresponding images, was published as part of a scientific study (A. Salvador, N. Hynes, Y. Aytar, J. Marin, F. Ofli, I. Weber, and A. Torralba. Learning cross-modal embeddings for cooking recipes and food images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017). The dataset has high variability owing to its variety of food categories. In addition, each image comes with complex text in two sections (ingredients and instructions); in total, the text part can run to dozens of lines.
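To make the two-section structure concrete, here is a sketch of what such a record might look like and how its text sections could be flattened into a single string for a text encoder. The field names and the recipe itself are purely illustrative assumptions, not the dataset's actual schema.

```python
# A hypothetical recipe1M-style record (field names are illustrative).
record = {
    "title": "Fish cutlets in tomato sauce",
    "ingredients": [
        "2 fillets of white fish",
        "1 onion",
        "1 cup tomato sauce",
    ],
    "instructions": [
        "Mince the fish and onion.",
        "Form cutlets and fry them.",
        "Simmer the cutlets in the tomato sauce.",
    ],
}

def recipe_to_text(rec):
    """Flatten title, ingredients, and instructions into one text block."""
    parts = [rec["title"]] + rec["ingredients"] + rec["instructions"]
    return "\n".join(parts)

text = recipe_to_text(record)
```

Even this toy example is seven lines of text; real recipes in the dataset can be far longer, which is part of what makes conditioning an image generator on them difficult.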
With such an excellent dataset in hand, all that remained for the scientists from Tel Aviv University was to train a neural network, combining the accumulated knowledge in the field of generative adversarial networks with the published data.
The researchers admit that the system is not yet perfect. The problem is that the initial dataset consists of images of relatively low resolution (256 × 256 pixels) and often of poor quality: many photos were taken in poor lighting, look mushy, or are not square (which complicates model training). This explains why both of the developed cGAN models succeeded at generating "mushy" foods (for example, pasta, rice, soups, salads) but find it very difficult to generate images of dishes with a distinct shape (for example, a hamburger or chicken).
In the future, the authors intend to continue the work by training the system on the rest of the recipes (about 350,000 suitable images remain in the dataset). However, this does not change the fact that the available photos are of poor quality, so they are also considering building their own dataset based on the text of children's books and the accompanying illustrations.
The paper was published on January 8, 2019 on the preprint server arXiv.org (arXiv:1901.02404).