Blog: How to apply learning tricks to Deep Convolutional Generative Adversarial Network in Keras
In this article, I would like to introduce some learning tricks for DCGANs through a Keras code implementation. Training these networks is a difficult process: two different networks are trained simultaneously, and many other factors have to be balanced at the same time.
GAN topology is discussed in several sources. In these networks, a generator creates realistic-looking samples, which are then evaluated by a discriminator. Based on the result of that evaluation, the generator adjusts its parameters. If you are interested in learning more about GANs, I have linked some sources at the end of this article.
Several articles and papers suggest different training methods to increase training efficiency and final model quality. During the development and training of my model, however, I could not find a single guide that discusses the often-mentioned hacks and helps with implementation. So I decided to write one myself and help my fellow ML enthusiasts.
So here are the hacks I’ve found and used for my model, with nice results.
Add noise to the labels
Instead of hard 0 and 1 labels, it is better to use values drawn from 0.0…0.1 and 0.9…1.0. This keeps the discriminator a little skeptical, because the ground truth is never exactly the same, so it constantly has to adjust to keep up with the training. Here is my implementation in NumPy:
valid = np.random.uniform(0.9, 1.0, (halfBatch, 1))
fake = np.random.uniform(0.0, 0.1, (halfBatch, 1))
To my mind, a uniform distribution is better than a normal one: with a uniform distribution the discriminator does not fixate around 0.05 and 0.95 but sees the whole range. The halfBatch variable is needed to train the discriminator and generator equally (a half batch each of real and generated images for the discriminator, and a full batch for the generator).
Since the labels are no longer exactly 0 and 1, the regular binary_crossentropy accuracy will not give meaningful results, so a new accuracy function is needed. Simply modifying the original binary_accuracy function is enough: applying K.round to y_true as well rounds the smoothed labels back to 0 and 1, so the accuracy becomes interpretable again.
def threshold_binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(K.round(y_true), K.round(y_pred)), axis=-1)
# Later when compiling the model, just add it to the metrics list
self.D.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[threshold_binary_accuracy])
Flip the labels
In practice, label flipping tends to work great. The purpose of this trick is to help the generator fool the discriminator. So label the data as 0 = Real and 1 = Fake instead of the other way around. Furthermore, when training the generator, the discriminator should be told that the fed images are real, so label 0 is used in that case (the previous hack still applies: add noise to these labels too). Finally, swapping 5% of the labels to their opposites leads to more stable training. Here is my implementation:
# Create a mask with 95% False
mask = np.random.choice(2, (halfBatch, 1), p=[0.95, 0.05]).astype(bool)
# Swap the label arrays based on the mask (temp keeps the original valid values)
temp = np.copy(valid)
valid[mask] = fake[mask]
fake[mask] = temp[mask]
# Train the discriminator; with the flipped scheme, labels near 0 mean "real"
d_loss_real = self.D.train_on_batch(real_imgs_batch, fake)
d_loss_fake = self.D.train_on_batch(gen_imgs_batch, valid)
When training the discriminator, feed the real and generated images in separate batches. The same label flipping and 5% swapping should be applied when training the generator:
# Create the mask
mask = np.random.choice(2, (batch_size, 1), p=[0.95, 0.05]).astype(bool)
# Create the labels and swap them using the mask
valid_for_gen = np.random.uniform(0.0, 0.1, (batch_size, 1))
valid_for_gen[mask] = np.random.uniform(0.9, 1.0, len(valid_for_gen[mask]))
# Train the generator, the generated images are labeled real
g_loss = self.GAN.train_on_batch(noiseForCombined, valid_for_gen)
noiseForCombined is the noise generated for the generator training, with the proper batch size.
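For completeness, here is a minimal sketch (not the exact code from this project) of how a stacked model like self.GAN can be wired up so that train_on_batch only updates the generator. The helper name build_combined, the latent_dim value, and the adam optimizer are my own assumptions:

```python
from tensorflow import keras

def build_combined(generator, discriminator, latent_dim=100):
    # Freeze the discriminator inside the combined model so that
    # training the stacked model only updates the generator's weights.
    discriminator.trainable = False
    z = keras.Input(shape=(latent_dim,))
    validity = discriminator(generator(z))
    combined = keras.Model(z, validity)
    combined.compile(loss='binary_crossentropy', optimizer='adam')
    return combined
```

The discriminator itself is compiled separately (before being frozen here), so its own train_on_batch calls still update its weights.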
You do not need a very complex model
I fell into the pit of building a model with too many parameters. The fewer, the better; otherwise the generator will overfit and only generate one type of face. Exact numbers are hard to define, but when your model keeps generating the same face, take a look at the number of trainable parameters.
Here are 10 generated images from a model with 40,000,000+ parameters after 40,000 epochs (I used 100,000 images from the CelebA dataset):
As you can see, they share quite similar features.
After reducing the model to just 4,000,000 parameters, I got these results after 40,000 epochs:
The faces are kind of goofy, but they no longer share the same features.
Adding Dropout layers to the generator also helped with the overfitting, but make sure you moderate your model size first.
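As an illustration, here is a minimal sketch of how Dropout can sit at the start of a generator; the layer sizes and the 0.3 dropout rate are assumptions, not the exact values from my model:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_generator_head(latent_dim=100):
    # Project the latent vector and reshape it into a small feature map;
    # Dropout after the projection helps against the overfitting described above.
    return keras.Sequential([
        keras.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 128),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),  # assumed rate, tune per project
        layers.Reshape((8, 8, 128)),
    ])
```

Dropout is only active during training, so it regularizes the generator without affecting the images sampled at inference time.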
Transposed Convolution is the way to go when upsampling
You can picture transposed convolution as mapping 1 pixel to 9 (in the case of a 3×3 kernel). This way the size of the image can be increased when the stride of the convolution is set to 2. The following example creates a layer with 64 filters, a 5×5 kernel, and (2,2) strides:
model.add(Conv2DTranspose(64, kernel_size=5, strides=2, padding='same'))
How many upsampling layers are needed depends on the project. For me, a single such layer did the trick, and even the “checkerboard” effect was only moderately present.
A lot of information about the checkerboard effect can be found in this article. For me, a simple UpSampling layer did not help, because the model was not evolving; but after adding a plain convolutional layer right after the transposed convolutional layer, the results were pretty good.
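Putting the two layers together, an upsampling block of this kind could be sketched roughly as follows; the input shape and filter count are assumptions for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

def upsample_block(in_shape=(8, 8, 128), filters=64):
    # Conv2DTranspose with stride 2 doubles the spatial size;
    # the plain Conv2D afterwards smooths out the checkerboard pattern.
    return keras.Sequential([
        keras.Input(shape=in_shape),
        layers.Conv2DTranspose(filters, kernel_size=5, strides=2, padding='same'),
        layers.LeakyReLU(0.2),
        layers.Conv2D(filters, kernel_size=3, strides=1, padding='same'),
        layers.LeakyReLU(0.2),
    ])
```

With padding='same' and strides=2, an 8×8 feature map comes out as 16×16, which is what makes this block suitable for growing the image toward the target resolution.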
My takeaways from this project
It was super fun to create a generative model that, in the end, could generate face-like images. If I were to do it again, I would define the core configurations at the very beginning of the project, e.g. that the generated image shall be 64×64 with 3 channels (colored). This would have saved me several redesigns and dimension problems.
Sometimes the training stalls for several epochs and the generated images stay the same. Do not stop training right away, because sometimes the model just needs some time to get back on track. But if the metrics show something out of the ordinary, the training may indeed have failed. Clues are very high, close-to-zero, or stagnating losses.
Do not change too many things between iterations of the model architecture, or you will not know which changes actually produced the performance or quality improvements.
Finally, the best metric is the generated pictures themselves. It is a good idea to plot and save some of the images that were fed to the discriminator in order to check the direction of the learning.
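A minimal sketch of such a plotting helper (the grid layout and the function name are my own assumptions) could be:

```python
import matplotlib
matplotlib.use('Agg')  # render to file without a display
import matplotlib.pyplot as plt

def save_sample_grid(images, path, rows=2, cols=5):
    # images: array of shape (rows*cols, H, W, 3), values scaled to [0, 1]
    fig, axes = plt.subplots(rows, cols, figsize=(cols * 2, rows * 2))
    for ax, img in zip(axes.flat, images):
        ax.imshow(img)
        ax.axis('off')
    fig.savefig(path)
    plt.close(fig)
```

Calling this every few hundred training steps with a fixed noise batch makes it easy to compare snapshots and spot mode collapse early.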
Learning about a new concept and trying to implement it is always a great challenge. Thank you for reading my journey of developing a face generating model. I will provide some links below to the articles that I found helpful and used during this project.
Also, if you are interested in the whole code, you can find it on my GitHub. It will always include the model I got the best results with.