
Reducing the need for labeled data in generative adversarial networks
Generative adversarial networks (GANs) are a class of deep generative models with interesting properties. Their core idea is to train two neural networks: a generator, which learns to synthesize data (for example, images), and a discriminator, which learns to distinguish real data from data synthesized by the generator. This approach has been used successfully for high-fidelity image synthesis, improved image compression, and more.
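As a rough illustration of this two-player game, here is a minimal training-step sketch; the toy `generator` and `discriminator` Keras models, the latent size `z_dim`, and the non-saturating loss are assumptions for the sketch, not the exact setup from the work:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def gan_step(generator, discriminator, g_opt, d_opt, real_images, z_dim=128):
    """One adversarial update: the discriminator learns to separate real
    from fake, the generator learns to fool the discriminator."""
    z = tf.random.normal([tf.shape(real_images)[0], z_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(z)
        real_logits = discriminator(real_images)
        fake_logits = discriminator(fake_images)
        # Discriminator: push real logits toward 1, fake logits toward 0.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # Generator (non-saturating loss): make fakes look real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    g_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, generator.trainable_variables),
        generator.trainable_variables))
```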

Evolution of generated samples during training on ImageNet. The generator is conditioned on the image class (for example, "great grey owl" or "golden retriever").
In natural image synthesis, the best results are achieved by conditional GANs, which, unlike unconditional GANs, use labels ("car", "dog", etc.) during training. Although this simplifies the task and brings a significant improvement in quality, such an approach requires a large amount of labeled data, which is rarely available in practice.
In our work "High-Fidelity Image Generation With Fewer Labels", we propose a new approach to reduce the amount of labeled data needed to train state-of-the-art conditional GANs. Combined with recent breakthroughs in training large-scale GANs, we match the quality of generated natural images using 10 times fewer labels. Based on this study, we are also releasing a major update to the Compare GAN library, which contains all the components needed to train and evaluate modern GANs.
Improvements through semi-supervision and self-supervision
In conditional GANs, both the generator and the discriminator are typically conditioned on class labels. In our work, we propose to replace hand-annotated labels with inferred ones. To infer high-quality labels for a large dataset that is mostly unlabeled, we take a two-step approach. First, we learn a feature representation of the images using only the unlabeled part of the dataset. To learn this representation we use self-supervision, in the form of a recently proposed approach in which unlabeled images are randomly rotated and a deep convolutional network is trained to predict the rotation angle. The idea is that, to solve this task successfully, the model has to recognize the main objects and their shapes:
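A minimal sketch of this rotation-prediction pretext task, assuming a `feature_net` whose head outputs four logits (one per rotation); this illustrates the idea rather than reproducing the implementation used in the work:

```python
import tensorflow as tf

def rotation_ssl_loss(feature_net, images):
    """Rotate every image by 0/90/180/270 degrees and train the network
    to predict which rotation was applied (a 4-way classification)."""
    batch = tf.shape(images)[0]
    # Four copies of the batch, one per rotation, concatenated in order.
    rotated = tf.concat([tf.image.rot90(images, k) for k in range(4)], axis=0)
    labels = tf.repeat(tf.range(4), batch)   # 0...0 1...1 2...2 3...3
    logits = feature_net(rotated)            # shape [4 * batch, 4]
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
```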

We then treat the activations of one of the intermediate layers of the trained network as a new feature representation of the input, and train a classifier to predict the label of the input from these features, using the labeled part of the original dataset. Because the network was pre-trained to extract semantically meaningful attributes of the data (on the rotation-prediction task), this classifier is much more sample-efficient than training the whole network from scratch. Finally, we use this classifier to label the unlabeled data.
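A sketch of this second step, assuming a Keras `feature_net` pretrained as above and a hypothetical intermediate `layer_name` (both names are assumptions): the frozen features feed a small trainable classification head.

```python
import tensorflow as tf

def build_pseudo_labeler(feature_net, layer_name, num_classes):
    """Freeze the self-supervised network, expose an intermediate layer
    as features, and train only a small head on the labeled subset."""
    features = tf.keras.Model(
        inputs=feature_net.input,
        outputs=feature_net.get_layer(layer_name).output)
    features.trainable = False  # keep the pretrained representation fixed
    head = tf.keras.Sequential([
        features,
        tf.keras.layers.GlobalAveragePooling2D(),  # assumes 4-D feature maps
        tf.keras.layers.Dense(num_classes)])
    head.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    return head

# Usage sketch (names hypothetical):
#   head = build_pseudo_labeler(feature_net, "block4_conv2", 1000)
#   head.fit(labeled_images, labels, epochs=10)
#   pseudo_labels = tf.argmax(head.predict(unlabeled_images), axis=-1)
```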
To further improve model quality and training stability, we encourage the discriminator network to learn meaningful feature representations that are not forgotten during training, using the auxiliary losses we introduced in earlier work. These two advances, combined with large-scale training, yield state-of-the-art conditional GANs for synthesizing ImageNet images, as measured by the Fréchet Inception Distance (FID).
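For reference, FID compares the statistics of Inception activations of real and generated images; here is a minimal NumPy/SciPy computation from precomputed activation matrices (how the activations are obtained is left out of this sketch):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(act_real, act_fake):
    """FID between two activation matrices (rows = samples, columns =
    Inception features). Lower means the distributions are closer."""
    mu_r, mu_f = act_real.mean(axis=0), act_fake.mean(axis=0)
    cov_r = np.cov(act_real, rowvar=False)
    cov_f = np.cov(act_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f).real  # drop tiny imaginary parts
    return float(np.sum((mu_r - mu_f) ** 2)
                 + np.trace(cov_r + cov_f - 2.0 * covmean))
```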

The generator network produces an image from a latent vector. In each row, linear interpolation between the latent codes of the leftmost and rightmost images results in semantic interpolation in image space.
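The interpolation itself is just a convex combination of the two latent codes; a tiny sketch (the `steps` count is arbitrary):

```python
import numpy as np

def interpolate_latents(z_left, z_right, steps=8):
    """Linearly interpolate between two latent codes; feeding each
    intermediate code to the generator produces a row as in the figure."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z_left + alphas * z_right
```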
Compare GAN library for training and evaluating GANs
Cutting-edge research on GANs depends heavily on well-engineered and well-tested code, since even reproducing earlier results and techniques takes considerable effort. To support open science and let the research community build on recent breakthroughs, we are releasing a major update to the Compare GAN library. It includes the loss functions, regularization and normalization schemes, neural network architectures, and quantitative metrics commonly used in modern GANs. It also supports:
- Training on GPUs and TPUs.
- Easy configuration with Gin (examples); a minimal Gin sketch follows this list.
- A large number of datasets via TensorFlow Datasets.
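To give a feel for Gin, here is a minimal, hypothetical example; the function and parameter names are illustrative, not actual Compare GAN configuration keys:

```python
import gin

@gin.configurable
def train(dataset="cifar10", batch_size=64, num_steps=100000):
    # A real training loop would go here; the point is that Gin injects
    # parameter values from a config without changing call sites.
    print(f"training on {dataset}: batch={batch_size}, steps={num_steps}")

# Bindings normally live in a .gin file passed on the command line.
gin.parse_config("""
train.dataset = 'imagenet_128'
train.batch_size = 1024
""")
train()  # runs with the bound values
```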
Conclusion and plans for the future
Given the gap between labeled and unlabeled data sources, the ability to learn from only partially labeled data is becoming increasingly important. We have shown that a simple yet powerful combination of self-supervision and semi-supervision can help close this gap for GANs. We believe self-supervision is a promising idea that deserves to be explored in other areas of generative modeling as well.