Generative Temporal Memory Models

Disclaimer: This article assumes the reader has intermediate-to-advanced experience in machine learning; however, all key terms are hyperlinked to resources for further reading.

I recently came across a DeepMind paper that presents generative temporal models with memory, designed to learn long-range dependencies from temporally-distant, past observations. Given the current popularity of deep generative models, it made sense to take a few minutes to understand how memory-augmented generative models might perform.

To make this easier to understand and digest, I’ll break it into five sections:

  1. Summary
  2. Architecture
  3. Evaluation
  4. Strong & Weak Points (about the paper)
  5. Future Ideas & Suggestions

Summary

“A GTM is a generative model designed in a variational inference framework with external memory augmentation”

This paper introduces a new class of models called Generative Temporal Models (GTMs): latent-variable generative models with added external memory for tackling the problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable from temporally-distant, past observations. A GTM, then, is basically a generative model designed in a variational inference framework with external memory augmentation.

Because they are designed to store and protect information over long intervals, GTMs can solve variable-order Markovian problems and support capabilities such as counterfactual reasoning, physical prediction, robot localisation, and simulation-based planning.

Furthermore, their recurrent dynamics, with some variants represented in Fig. 1, serve two competing roles: 1) they preserve information in a stable state for later retrieval, and 2) they perform relevant computations to distill information for immediate use.

“The task is to infer the posterior distribution of the latent variables and learn the model parameters.”

These models explain a set of observations x_{≤T} = {x_1, x_2, …, x_T} with a set of corresponding latent variables z_{≤T} = {z_1, z_2, …, z_T}, such that the joint distribution becomes

    p_θ(x_{≤T}, z_{≤T}) = ∏_{t=1}^{T} p_θ(x_t | z_{≤t}, x_{<t}) p_θ(z_t | z_{<t}, x_{<t}),

where θ are the model parameters. The paper assumes that the prior distribution is Gaussian and that the likelihood function is any distribution appropriate for the observed data. The models also introduce a deterministic hidden-state variable h_t that is updated at every time step.
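To make this concrete, here is a minimal sketch of one pass through such a generative loop in NumPy. The affine maps W_prior, W_dec, and W_h are illustrative stand-ins for the learned networks, and the Bernoulli likelihood is just one choice suited to binary image data; none of these names come from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    T, latent_dim, hidden_dim, obs_dim = 5, 32, 64, 784

    # Illustrative stand-ins for the learned maps (not from the paper).
    W_prior = rng.normal(scale=0.1, size=(hidden_dim, 2 * latent_dim))
    W_dec   = rng.normal(scale=0.1, size=(latent_dim + hidden_dim, obs_dim))
    W_h     = rng.normal(scale=0.1, size=(hidden_dim + latent_dim + obs_dim, hidden_dim))

    h = np.zeros(hidden_dim)          # deterministic hidden state h_t
    xs, zs = [], []
    for t in range(T):
        # Prior over z_t: diagonal Gaussian conditioned on the history (here via h).
        mu, log_var = np.split(h @ W_prior, 2)
        z = mu + np.exp(0.5 * log_var) * rng.normal(size=latent_dim)
        # Likelihood: any distribution appropriate for the data; Bernoulli here.
        probs = 1.0 / (1.0 + np.exp(-np.concatenate([z, h]) @ W_dec))
        x = rng.binomial(1, probs)
        # The deterministic hidden state is updated at every time step.
        h = np.tanh(np.concatenate([h, z, x]) @ W_h)
        zs.append(z); xs.append(x)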

Based on the model presented above, the task is to infer the posterior distribution of the latent variables and learn the model parameters. The GTM in the paper employs temporal variational autoencoders (VAEs) that rely on the output of an external memory system, which is queried at every point in time to produce a memory context Ψ_t. The prior becomes p_θ(z_t | Ψ_{t−1}) and the approximate posterior becomes q_φ(z_t | x_t, Ψ_{t−1}), where

  • the prior is a diagonal Gaussian that depends on the memory context through the prior map f_z,
  • the approximate posterior is a diagonal Gaussian that depends on the observation x_t and the memory context Ψ_{t−1} through the posterior map f_q (a minimal sketch of both maps follows after this list).
  • This makes the architecture generic and flexible: any type of memory system can be used while the rest of the model stays unchanged, since all dependencies are routed through the memory context Ψ_t.
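Below is a minimal sketch of what the prior map f_z and the posterior map f_q could look like, assuming simple affine parameterisations of the Gaussian mean and log-variance; the weight names and dimensions are my own illustrative choices, not the paper’s.

    import numpy as np

    rng = np.random.default_rng(0)
    latent_dim, context_dim, obs_dim = 32, 128, 784

    # Hypothetical affine parameterisations of f_z (prior) and f_q (posterior).
    W_z = rng.normal(scale=0.1, size=(context_dim, 2 * latent_dim))
    W_q = rng.normal(scale=0.1, size=(obs_dim + context_dim, 2 * latent_dim))

    def prior(psi_prev):
        """p(z_t | psi_{t-1}): diagonal Gaussian from the memory context alone."""
        mu, log_var = np.split(psi_prev @ W_z, 2)
        return mu, log_var

    def posterior(x_t, psi_prev):
        """q(z_t | x_t, psi_{t-1}): diagonal Gaussian from observation plus context."""
        mu, log_var = np.split(np.concatenate([x_t, psi_prev]) @ W_q, 2)
        return mu, log_var

    # Reparameterised sample from the posterior.
    psi = rng.normal(size=context_dim)
    x   = rng.binomial(1, 0.5, size=obs_dim)
    mu, log_var = posterior(x, psi)
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=latent_dim)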

Architecture

The architecture diagram is shown in Fig. 2. The external memory system comprises two components:

  • an external memory M_t, which stores latent variables z_t,
  • a controller, which implements the addressing scheme that informs memory storage and retrieval.

Two addressing schemes are used by these models: 1) content-based addressing accesses memories based on their similarity to a given cue, while 2) position-based addressing accesses memories based on their position within the memory store.
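As a concrete illustration, the two read mechanisms could be sketched as follows; the softmax-over-cosine-similarity read and the fixed-offset FIFO read are generic versions of the two schemes, with shapes chosen only for this example.

    import numpy as np

    def content_read(memory, key):
        """Content-based: attend to rows by cosine similarity to a query key."""
        sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
        weights = np.exp(sims) / np.exp(sims).sum()   # softmax attention weights
        return weights @ memory                        # weighted sum of rows

    def position_read(memory, offset):
        """Position-based: read a row at a fixed offset from the newest entry
        of a first-in-first-out buffer (newest entry assumed to be the last row)."""
        return memory[-1 - offset]

    memory = np.random.default_rng(0).normal(size=(16, 32))   # 16 slots of size 32
    print(content_read(memory, memory[3]).shape)              # (32,)
    print(position_read(memory, offset=5).shape)              # (32,)

The trade-off is visible even in this toy version: the content-based read touches every slot per query, while the position-based read is constant-time but tied to the buffer’s write order.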

Content-Based Addressing Models

Memory: The memory for Neural Turing Machines and Differentiable Neural Computers is a generic storage component that allows information to be written to and read from any location.

Controller: The controller uses an LSTM network f_rnn that updates the state history h_t and the external memory M_t using the latent variables from the previous time step and any additional context information c_t on which the generative model is conditioned, i.e., (h_t, M_t) = f_rnn(z_{t−1}, c_t, h_{t−1}, M_{t−1}).
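A minimal sketch of one controller step, assuming a standard LSTM update and a simple slot-wise write of the previous latent variable into memory (the write scheme here is a simplification for illustration; the paper’s models use learned write mechanisms):

    import numpy as np

    rng = np.random.default_rng(0)
    latent_dim, ctx_dim, hidden_dim, n_slots = 32, 8, 64, 16

    # One weight matrix for the four LSTM gates (input, forget, output, candidate).
    W = rng.normal(scale=0.1, size=(latent_dim + ctx_dim + hidden_dim, 4 * hidden_dim))
    b = np.zeros(4 * hidden_dim)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def controller_step(z_prev, c_t, h_prev, cell_prev, memory, t):
        """One step of f_rnn: update (h, cell) and write z_{t-1} into one memory slot."""
        gates = np.concatenate([z_prev, c_t, h_prev]) @ W + b
        i, f, o, g = np.split(gates, 4)
        cell = sigmoid(f) * cell_prev + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(cell)
        memory = memory.copy()
        memory[t % n_slots] = z_prev          # simplified slot-wise write of z_{t-1}
        return h, cell, memory

    h, cell = np.zeros(hidden_dim), np.zeros(hidden_dim)
    memory = np.zeros((n_slots, latent_dim))
    h, cell, memory = controller_step(rng.normal(size=latent_dim),
                                      rng.normal(size=ctx_dim), h, cell, memory, t=0)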

The controller generates a set of R read keys k_t^r, which are compared against the rows of memory M_{t−1} using cosine similarity; the resulting attention weights are used to retrieve each read vector φ_t^r as a weighted sum of the rows of M_{t−1}.

Finally, the memory context Ψ_t that is passed to the generative model is the concatenation of the retrieved memories for each read head and the controller state, Ψ_t = [φ_t^1, …, φ_t^R, h_t].
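Putting the read path together, here is a minimal sketch of how the R retrieved vectors and the controller state could be combined into Ψ_t; the linear key map and the dimensions are assumptions made for this example.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_dim, slot_dim, n_slots, R = 64, 32, 16, 3

    W_keys = rng.normal(scale=0.1, size=(hidden_dim, R * slot_dim))  # key map (assumed linear)
    h_t    = rng.normal(size=hidden_dim)
    M_prev = rng.normal(size=(n_slots, slot_dim))

    def cosine(a, B):
        return B @ a / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-8)

    keys = np.split(h_t @ W_keys, R)                   # R read keys k_t^r
    reads = []
    for k in keys:
        w = np.exp(cosine(k, M_prev))
        w /= w.sum()                                   # attention weights over slots
        reads.append(w @ M_prev)                       # phi_t^r: weighted sum of rows

    psi_t = np.concatenate(reads + [h_t])              # Psi_t = [phi^1, ..., phi^R, h_t]
    print(psi_t.shape)                                 # (R * slot_dim + hidden_dim,) = (160,)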

Evaluation

All models were trained by stochastic gradient descent on the variational lower bound using the Adam optimizer, with a learning rate of 10⁻³, or 10⁻⁴ for sequences longer than 100 steps.

Mini-batches of 10 training sequences were used to compute gradients in all tasks. Latent variables of size 32 were used for the tasks involving digits and characters, and of size 256 for the 3D environment.
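For quick reference, the reported training setup can be summarised roughly as the configuration below; the key names are my own.

    # Hyperparameters as reported above (dictionary key names are my own).
    train_config = {
        "optimizer": "Adam",
        "learning_rate": 1e-3,            # 1e-4 for sequences longer than 100 steps
        "batch_size": 10,                 # training sequences per gradient step
        "latent_size": {"digits_and_characters": 32, "3d_environment": 256},
        "objective": "variational lower bound (ELBO)",
    }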

The models were evaluated on the Kullback-Leibler divergence per frame between the prior and the posterior, which serves as a measure of prediction error. Each model learnt that there is repetition at frame 10, but the Introspective GTMM exhibited the lowest error. The models were also tested on image-sequence modelling tasks that probe deduction, spatial reasoning, and one-shot generalisation.
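Since both the prior and the posterior are diagonal Gaussians, the per-frame KL has a closed form. Here is a minimal sketch of how such a curve could be computed, assuming the per-frame means and log-variances are available:

    import numpy as np

    def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
        """KL( N(mu_q, diag(exp(log_var_q))) || N(mu_p, diag(exp(log_var_p))) )."""
        return 0.5 * np.sum(
            log_var_p - log_var_q
            + (np.exp(log_var_q) + (mu_q - mu_p) ** 2) / np.exp(log_var_p)
            - 1.0
        )

    # Per-frame KL over a sequence: low values mean the prior (the prediction)
    # already matches the posterior (what the observation reveals).
    rng = np.random.default_rng(0)
    T, d = 20, 32
    kl_per_frame = [
        kl_diag_gaussians(rng.normal(size=d), rng.normal(size=d) * 0.1,
                          np.zeros(d), np.zeros(d))
        for _ in range(T)
    ]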

Strong Points

  • The variational inference framework scales well to large datasets: it remains efficient as the dimensionality of the observation and parameter spaces grows.
  • Variational inference combines easily with other models, since it relies on the same gradient-based learning machinery as standard deep learning models.
  • Introspective GTMs are applicable to a wide range of problems: they train quickly and handle temporally-extended dependencies effectively.

Weak Points

  • GTMs inherently scale poorly when higher-capacity storage is required.
  • GTMs are typically densely connected, so the parametric complexity of the model can grow quadratically with the memory capacity.
  • In the attention-lookup component of GTMs, the memory is updated slot by slot rather than all at once. This can slow down performance, since slots that have already been updated may still have to wait for the remaining slots to be updated.

Future Ideas & Suggestions

  • Compared to the position-based addressing used by Introspective GTMs, content-based addressing is the better choice for scaling, because its memory structure is not constrained by the first-in-first-out buffer that position-based Introspective GTMs rely on.
  • The parametric complexity of a GTM typically grows quadratically with its memory size. One way to tackle this could be to make memory slots reusable: instead of dynamically adding a contiguous block of new slots, existing slots could be flushed and rewritten (a toy sketch of this idea follows after this list).
  • GTMs update the memory of the attention-lookup component dynamically, which can slow the whole model down. A hash-table-style lookup could help here, so that slots updated later in time do not hold back slots that have already been updated; hash tables are, fundamentally, a very efficient lookup structure.
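To illustrate the slot-reuse suggestion, here is a toy sketch of a fixed-size memory that recycles its least-recently-used slot instead of growing; this is my own illustration of the idea above, not anything from the paper.

    import numpy as np

    class ReusableSlotMemory:
        """Fixed-size memory that overwrites its least-recently-used slot."""

        def __init__(self, n_slots, slot_dim):
            self.slots = np.zeros((n_slots, slot_dim))
            self.last_used = np.zeros(n_slots)     # timestamp of last read/write
            self.clock = 0

        def write(self, vector):
            self.clock += 1
            slot = int(np.argmin(self.last_used))  # recycle the stalest slot
            self.slots[slot] = vector
            self.last_used[slot] = self.clock
            return slot

        def read(self, key):
            self.clock += 1
            slot = int(np.argmax(self.slots @ key))  # nearest slot by dot product
            self.last_used[slot] = self.clock
            return self.slots[slot]

    mem = ReusableSlotMemory(n_slots=8, slot_dim=32)
    for t in range(20):                              # more writes than slots:
        mem.write(np.random.default_rng(t).normal(size=32))   # old slots get reused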

So, there you have it! I know it’s a long and dense read, and it might take some time to really get the hang of this paper, but given the effectiveness of models with externally augmented memory, I think GTMs could be a smart trick for future applications, despite their current (aforementioned) drawbacks.

Please let me know what you think and feel free to leave any comments or suggestions!
