Blog: This eye does not exist — Part 2/2 — Generating the dataset from unlabeled image data
All the images shown on thiseyedoesnotexist.com are generated by a computer without user input.
Since I had zero experience with generative adversarial networks, I thought I should document some problems I had to overcome.
Quoting Wikipedia: “A generative adversarial network (GAN) is a class of machine learning systems. Two neural networks contest with each other in a zero-sum game framework. This technique can generate photographs that look at least superficially authentic to human observers, having many realistic characteristics. It is a form of unsupervised learning.”
I’m not going to introduce how a GAN works, since there are plenty of materials online with far better insights than the ones I could give. I actually think the original paper by Ian J. Goodfellow et al. explains it very well.
To make thiseyedoesnotexist I had to generate a dataset suitable for generative adversarial training. This meant finding a way to sort thousands of unlabeled images with an automatic, unsupervised method. I didn’t know that at the beginning, but I discovered it along the way. This story is a recipe for that procedure.
I like to create and use unique datasets so I can develop my intuition to the fullest.
This was my first plan:
- gather images about makeup related stuff
- train a DCGAN on the dataset
This adventure started with the gathering of 200k publicly available makeup-related images. There are multiple ways to do this, and I will leave that to your imagination.
Here is a sample of the gathered images:
I happily followed the DCGAN tutorial on the PyTorch website and started training on a 1080 Ti.
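For reference, here is a condensed sketch of the kind of setup the PyTorch DCGAN tutorial uses: a generator that upsamples a 100-dimensional latent vector to a 64×64 RGB image, and a mirrored discriminator. The layer widths (`nz`, `ngf`, `ndf`) follow the tutorial's defaults; everything else about my training run (data loading, optimizers, number of epochs) is omitted.

```python
import torch
import torch.nn as nn

# Tutorial-style hyperparameters: latent size, generator/discriminator
# feature widths, and number of image channels.
nz, ngf, ndf, nc = 100, 64, 64, 3

# Generator: (nz, 1, 1) latent -> (3, 64, 64) image in [-1, 1].
generator = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf), nn.ReLU(True),
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
    nn.Tanh(),  # matches images normalized to [-1, 1]
)

# Discriminator: (3, 64, 64) image -> scalar real/fake probability.
discriminator = nn.Sequential(
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
    nn.Sigmoid(),
)

noise = torch.randn(4, nz, 1, 1)
fake = generator(noise)        # shape (4, 3, 64, 64)
score = discriminator(fake)    # shape (4, 1, 1, 1), values in [0, 1]
```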
When the training finished, I was shocked by how bad the results were. I actually ran the thing for quite some time, trying to understand what went wrong.
The fake images actually got worse with more epochs of training:
The generator and discriminator losses confirmed this:
At this moment I realized I was lacking some intuition about GANs and their problems so I started reading a lot about them. Eventually I understood that the distribution I was trying to model was too rich and that GANs suffer from something called mode collapse.
Since I had already trained a DCGAN I had the following thought:
What if I use the discriminator as a feature extractor to try to separate the images into classes? The discriminator must have built some sort of similarity measure in its last layer. If I use a subset of images, all very similar to each other, I might achieve better results.
Since I had already trained the DCGAN and wanted to make the distribution easier to learn, I devised a second plan:
- Use the features generated by the last layer of the DCGAN discriminator
- Reduce every image feature representation to 50 dimensions using PCA
- Apply t-SNE on top of that (reducing it further to two dimensions)
- Use one of the newer GAN architectures such as ProGAN
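The first three steps of the plan can be sketched like this. The tiny convolutional `feature_extractor` and the random images are stand-ins so the snippet runs on its own; in practice you would take the trained DCGAN discriminator, drop its final classification layer, and keep the flattened activations of the last hidden layer as the feature vector.

```python
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for "trained discriminator minus its final layer":
# any conv stack whose flattened output serves as an image feature.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 8, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(8, 16, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Flatten(),
)

images = torch.randn(200, 3, 64, 64)  # stand-in for the real dataset
with torch.no_grad():
    features = feature_extractor(images).numpy()  # (200, 4096)

# Step 2: reduce each feature vector to 50 dimensions with PCA.
reduced = PCA(n_components=50).fit_transform(features)

# Step 3: embed into two dimensions with t-SNE for plotting.
embedded = TSNE(n_components=2, perplexity=30).fit_transform(reduced)
# `embedded` has shape (200, 2) and can be scatter-plotted directly.
```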
After converting all the images to their “last layer discriminator representation”, reducing them to 50 components with PCA, and applying t-SNE to get a two-dimensional representation, this was the plot I got:
At this point I felt really excited. I quickly used k-means to understand what was in each of the clusters. Each image grid corresponds to images sampled from the annotated red cluster(s):
These two clusters were almost always eye makeup pictures!
This one was related to lips!
This cluster was mostly related to random makeup products.
While watching a talk by Ian Goodfellow, I discovered that this procedure is somewhat documented in academia: https://youtu.be/Z6rxFNMGdn0?t=2564
Now that I had a set of similar images, I decided to train a GAN on them:
- Choose a cluster of images (eye makeup)
- Train a ProGAN on it!
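The first of these two steps can be sketched as a small helper that copies just the chosen cluster's files into a fresh folder to train on. The names `paths` and `labels` are assumptions: the image paths in the same order used for feature extraction, and the k-means labels from the clustering step.

```python
import shutil
from pathlib import Path

import numpy as np

def export_cluster(paths, labels, cluster_id, out_dir):
    """Copy the images assigned to `cluster_id` into `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in np.asarray(paths)[np.asarray(labels) == cluster_id]:
        shutil.copy(path, out / Path(path).name)
    return out
```

The resulting folder can then be fed to whatever ProGAN implementation you use as its dataset directory.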
In case you want to contact me,