From Cups to Consciousness (Part 2): From simulation to the real world

A task interface for AI2Thor, physics simulations and a real robot named Vector

The moments of suspense as two teams of Vector robots (black vs multi-coloured) face each other before the “battle of the cup” commences. Created using the PyBullet physics simulator.
Round 1: Fight! 10 vs 10 Vectors. Cups fall mid-battle to cause carnage; a violent sport indeed. Scroll down for round 2 with 20 on each team!

At MTank, we work towards two goals:

(1) Model and distil knowledge within AI.

(2) Make progress towards creating truly intelligent machines.

As part of these efforts the MTank team release pieces about our work for people to enjoy and learn from for free. If you like our work, then please show your support by sharing it with other people who’ll like it too. Thanks in advance!

“If the brain creates a kind of perceptual radio program and uses that to orchestrate the behavior of the organism, what is listening? Rather than the universe itself, as some panpsychists believe, or some entity outside of the physical universe, as dualists claim, I’d like to suggest that conscious experience is a model of the contents of our attention: it is virtual, a component of the organism’s simulated self model, and produced by an attentional conductor”

Joscha Bach’s Cognitive Architecture in Phenomenal Experience and the Perceptual Binding State


In our last blog, we briefly covered some of the common thought experiments and debates over consciousness. Again, we’d like to remind you that we intend to provide no answers to these deep questions about the nature of consciousness; we’re simply focused on picking up cups very well. World-class cup picking, with smatterings of what it all potentially means for humanity.

That being said, from the perspective of a materialist the world is only made up of objects and their relations. So let’s put the ‘object’ back in objectiveness, and get our agents into a variety of environments to begin attempting interesting tasks. Our budding consciousnesses — the agents — will have to start somewhere before any inkling of the subjective world begins to creep into their “self-model”.

In this installment we delve deeper into a few of the worlds our agent could live in: the places where our little fella will take his first steps, and the great journey that his reward function shall guide him through as he moves from his simulation into ours. We’ll touch on our work extending the capabilities of the amazing open-source, photo-realistic 3D environment AI2Thor, the steps taken to bring our work to more physically realistic simulations (PyBullet and Gazebo), and a simple, real robot called Vector, which can solve real-world navigation problems.

Rejoice and let the cup picking begin!

A quick recap

In Part 1, we talked about project goals, language grounding and we surveyed some 3D environments. And to close, we talked in a bit of detail about the photorealistic Unity environment called AI2Thor. If you’re unfamiliar with the concepts, or you want to catch up on our musings, you can find part one here.

AI2Thor environment wrapper

Some argue that you can get to AGI purely through completing increasingly difficult and varied human tasks. Following this line of thinking, it’s important to be able to specify tasks efficiently, both in simulation and in the real world, and to measure performance objectively on each. Since we want to do many tasks, we need a general interface for defining a diverse set of tasks that is customisable for all our needs.

Moreover, we want to help other people from the research community to train Reinforcement Learning (RL) algorithms within 3D environments, as well as enabling them to modify these environments for their specific goals.

Enter AI2Thor and our contributions:

  • A general structure for defining environment settings and reproducible experiments within the AI2Thor environment.
  • An easy way to run general RL algorithms on our environment while following the OpenAI Gym environment interface (step() and reset()).
  • A task interface designed to allow for the faster creation of different reward functions and task definitions, the heart and soul of tasks e.g. “pick up the cup”.
  • Examples of policy and value-based algorithm baselines, with heavily commented explanations to help the user understand the details and maths behind the algorithms as well as the full process of how to run them on our environment. Keep an eye out for the next blog where we will explore this deeper.

Eventually, we hope to be able to benchmark different state-of-the-art (SOTA) algorithms from across the open-source world. We tried to keep as much of the original source code from these as possible, but also adapted and fine-tuned them to be suitable for the tasks we designed. But let’s move from abstract ideas to reality using the power of GIFs.

Note: Example task of “put the cup in the microwave” which can be made within our task interface.

Above we can see the execution of a simple task, ‘put the cup in the microwave’, within a simulated kitchen. We see that within AI2Thor, the home environments are complex and realistic, plus interaction with the environment is quite rich. In the images below, we see a bathroom scene, and an agent on the ground training to pick up cups that we’ve populated the floor with using the Unity game engine.

Note: An example of an RL agent training on our “NaturalLanguagePickUpMultipleObjectTask” in one of our customised scenes (edited with Unity).

A wrapper that simplifies task definition

Our wrapper attempts to make AI2Thor usable as an OpenAI Gym environment by implementing the corresponding Gym environment subclass. We also include a few examples of tasks, agent training, and scripts to simplify the learning process. We are constantly improving it to provide a better user experience, but this is iterative, so please bear with us.

Gym environments provide just the right amount of abstraction so that running your experiments can be done in a few lines of code like this:
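As a rough sketch of what such a loop looks like, the example below runs a random agent against a stand-in environment. `ToyCupEnv` and `run_random_agent` are hypothetical names invented for this illustration and are not part of our wrapper. The only thing they share with it is the Gym-style `reset()` and `step()` interface, assumed here to return the classic `(observation, reward, done, info)` tuple.

```python
import random


class ToyCupEnv:
    """Hypothetical stand-in that mimics the Gym reset()/step() interface."""

    def __init__(self, cup_position=3, max_episode_length=20):
        self.cup_position = cup_position
        self.max_episode_length = max_episode_length
        self.n_actions = 2  # 0 = move back, 1 = move ahead
        self.position = 0
        self.step_num = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.step_num = 0
        return self.position

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        self.position += 1 if action == 1 else -1
        self.step_num += 1
        reached_cup = self.position == self.cup_position
        reward = 1.0 if reached_cup else -0.01
        done = reached_cup or self.step_num >= self.max_episode_length
        return self.position, reward, done, {}


def run_random_agent(env, n_episodes=5, seed=0):
    """Run a random policy and return the total reward of each episode."""
    rng = random.Random(seed)
    episode_returns = []
    for _ in range(n_episodes):
        env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = rng.randrange(env.n_actions)
            _, reward, done, _ = env.step(action)
            total_reward += reward
        episode_returns.append(total_reward)
    return episode_returns


episode_returns = run_random_agent(ToyCupEnv())
print(episode_returns)
```

With the real wrapper, the stand-in would be replaced by the wrapper's own environment object, and the random action by whatever the agent's policy chooses.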

This is very powerful if you want to use the environment “as is”, but we found it problematic when trying to make it customisable to many tasks while keeping this simple interface. Following gym recommendations, we should define an environment class as generic as possible and then inherit from one or several of these base classes to customise the environment for a specific task.

We thought this procedure might be over-complicated for our use cases, so we decided to take a different approach that was more intuitive for us. We hoped that what made sense to us might be easier for the end user to understand too. We believe this is more scalable when aiming for hundreds of tasks and innumerable “task variations” while still keeping the important, common parts of the environment shared between all tasks.

Same interface, more control

The natural divide between base environment and the specific task within that environment led us to two main classes: ‘AI2ThorEnv’ and ‘BaseTask’. The goal here is to decouple our wrapper to AI2Thor from the details of the reward functions, and the specifics of resetting the environment for the specific task.

The former includes details of the scene, the objects that will be taken into account, and image details like resolution or grayscale format. The latter includes the initialisation/reset conditions necessary for that particular task as well as the computation of the rewards observed at each step.

Why did we do this?

Well, this way the user can create and customise a task independently of the environment specifics. Users subclass the base task to fit their goals and rely on the underlying environment code, which remains unchanged across tasks.

At the same time, we ensure that the user doesn’t have to think about the environment and the task separately when defining an experiment. We achieve this by using a single config file to specify both the environment and task parameters.

As an example, suppose we wanted to change the example task given in the repository to one for picking up cups. Instead of creating a new subclass of the environment and modifying the step and reset functions (potentially adding large amounts of boilerplate and spaghetti code, and maybe a few bugs), as is usually done with gym environments, we would create a new task class as follows:
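To give a feel for the idea, here is a minimal sketch of a task subclass. Everything in it is an assumption made for illustration: this `BaseTask` is a simplified stand-in for the real class, `transition_reward` is a hypothetical method name, and the state is modelled as a plain dictionary with an `"inventory"` key rather than the event object AI2Thor actually returns.

```python
class BaseTask:
    """Hypothetical base class: one task = reset conditions + reward logic."""

    def __init__(self, max_episode_length=100):
        self.max_episode_length = max_episode_length
        self.step_num = 0

    def reset(self):
        """Restore the task's bookkeeping at the start of an episode."""
        self.step_num = 0

    def transition_reward(self, state):
        """Return (reward, done) for the state observed after a step."""
        raise NotImplementedError


class PickUpCupTask(BaseTask):
    """Reward the agent as soon as it holds the target object."""

    def __init__(self, target_object="Cup", **kwargs):
        super().__init__(**kwargs)
        self.target_object = target_object

    def transition_reward(self, state):
        self.step_num += 1
        # Assumed state layout: a dict listing what the agent is holding.
        picked_up = self.target_object in state.get("inventory", [])
        reward = 1.0 if picked_up else 0.0
        done = picked_up or self.step_num >= self.max_episode_length
        return reward, done


task = PickUpCupTask(max_episode_length=3)
task.reset()
print(task.transition_reward({"inventory": []}))       # agent holds nothing
print(task.transition_reward({"inventory": ["Cup"]}))  # agent holds a cup
```

The point of the split is that the environment never needs to know which reward is being computed: it only calls the task after each step and at reset.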

And then easily modify the config file to change the environment and task conditions within seconds:
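The sketch below illustrates how a single config can describe both the environment and the task, and how a small override produces a task variation. The key names (`"env"`, `"task"`, `"target_objects"` and so on) and the `merge_config` helper are hypothetical and chosen only for this example; they are not the wrapper's actual config schema.

```python
import copy

# Hypothetical defaults; the key names are illustrative, not the wrapper's own.
DEFAULT_CONFIG = {
    "env": {"scene": "kitchen", "resolution": [128, 128], "grayscale": True},
    "task": {
        "name": "PickUpTask",
        "target_objects": ["Cup"],
        "max_episode_length": 1000,
    },
}


def merge_config(base, overrides):
    """Return a copy of base with overrides applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Task variation: same task class, different target and a tighter time limit.
apple_config = merge_config(
    DEFAULT_CONFIG,
    {"task": {"target_objects": ["Apple"], "max_episode_length": 200}},
)
print(apple_config["task"])
```

Only the overridden task fields change; the environment section is carried over untouched, which is what keeps the common parts shared between tasks.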

Change the task to pick up (and put down) apples as fast as possible instead

Task and config combinations allow “task variations”, e.g. PickUpTask allows you to specify “Apple” or “Cup” picking or both; each of these is the same task but a specific variation on it.

If all of this talk of wrappers, tasks and generalisable code interfaces got you excited, then don’t hesitate to try it out. Feel free to add an issue, or your own pull request, including new tasks so that you can compare your results with other people running the same kind of experiments. You can always drop us an email with any questions or for clarifications.

We hope that the GitHub README provides enough information, but do feel free to contact us as we reply pretty quickly. Getting feedback from you will help us make it more robust, and may enable even more people to use it for their own experiments and environment customisations at their convenience. So don’t be shy!

Simulated physics as the road to real world robots

“Our phenomenal experience is very real to ourselves, but our selves are not real. In other words, when Tononi and Koch [from their paper on Integrated information theory (IIT) in 2015] argue that only a physical organism can have conscious experience, but a simulation cannot, they got it exactly backwards: a physical system cannot be conscious, only a simulation can.”

Joscha Bach’s Phenomenal Experience and the Perceptual Binding State

AI2Thor provides a diverse set of photo-realistic scenarios and a physics engine with simple manipulation through which we can test the generalisation capabilities of our agent; however we’re always hungry for more granular control and physical realism over every part of the environment.

For instance, the ability to completely specify our agent’s body: sensors, actuators, joints, links, etc. Plus we’d like to have these parts interact with the full nitty-gritty ugly physics that might cause our robot to fall or god forbid, in a very discouraging scenario, to even break a cup.

Hopefully this way, we can at least avoid breaking real-world cups.

This level of realism and control over the environment brings us much closer to eventually deploying these algorithms in real robots, in real homes. It gives us the ability to control appendages with surgical precision and to improve object manipulation over time, e.g. robot fingers going through cup handles or using a spoon to delicately pour sugar.

But before this, let’s introduce the two newest team members perfectly suited to testing our early hypotheses: Vectors 00301ed1 and 00a10115.

Introducing Vector, the first of our robot guinea pigs

He might be small, but he’s perfect for many of the general tasks and algorithms that we want to test. Anki, the robotics company, began with Cozmo and recently moved on to Vector with their famous Kickstarter campaign.

We just had to jump on the bandwagon because of its great Python SDK, which lets you run any Python code from your desktop or laptop to receive state information from Vector (image, 3D pose, battery, etc.) and send the appropriate commands to the robot (move left wheel forward 50mm, go recharge).

This gives us the freedom to test many state-of-the-art algorithms within the real world, within Vector’s world, e.g. Simultaneous Localisation and Mapping (SLAM), active cup recognition, navigation to cups with Path Planning or pure end-to-end approaches with RL, etc.

Note: Full specification of Vector. It’s incredible the amount of things that can be fit on such a small and inexpensive robot; truly jam-packed.

Moreover, we are creating a ROS package which will allow us to control the sensors and actuators independently and to use robust libraries for navigation and perception problems, e.g. path planning and SLAM.

It is important to be able to simulate Vector as well. We found that Anki had generously shared their design files (in OBJ and MTL formats). As a bonus, we will share the corresponding URDF/SDF files we are creating in a future release, so that you can play with Vector even without buying him! These files allow us to simulate the robot’s physical interactions within a simulation.

In Part 1 of this blog we covered many different 3D environments, with a focus on photo-realism among other characteristics, but few of these apart from AI2Thor had extensive physics. Even AI2Thor’s physics were somewhat limited for our needs. Understanding this, we knew we had to find a way to increase the physical realism of our simulations, e.g. by specifying forces, torques, joints, friction and inertia of detailed objects.

Curious as to what simulations are the absolute best with regards to physical accuracy, control and customisation? Well, after researching for a while, we ended up going with PyBullet (also easy to install like ai2thor, `pip install pybullet`) and Gazebo (which was designed for robots controlled with ROS in mind). Examples of how much power and control we get using these tools can be seen below with a simulated Vector in multiple different environments (using URDF/SDF files):

Physics simulations can literally bring us to the moon. Try to casually handle that edge case in the real world! Created with Gazebo and ROS.
GOOOOOOOOOOAAAAAAAAL! Created with Gazebo and ROS.
And now we’ll leave you with round 2! 20 vs 20 Vectors in PyBullet. At the end, one Vector got to run away with his cup into the far distance, clearly making this a win for the multi-coloured team.

Deconstructing the path to embodied AGI

We honestly don’t know where we will end up. We replan, refine and reprioritise our goals constantly and are willing to pivot fast into the most promising techniques that will increase the number of cups in the world that are picked up by our robots; our one true key performance indicator (KPI).

To maximise this KPI, we continuously debate the many ways one can segment the problem of AI (“divide and conquer”), and which abstractions to choose in our agent’s design to help us take minuscule steps towards general machines (e.g. model-free vs model-based, PyTorch vs TensorFlow). For example, an interesting division is to split the hardware of a robot (or even a human) into groupings of sensors and actuators, which enable the software to perform perception and control respectively. Here are some reflections on this perspective and how we plan to use these abstractions:


In the case of human perception, the main senses are vision, touch, hearing, smell and taste. The last two would be an unusual starting point for our robot, while the first three seem to be the most important.

More specifically, sight is likely the most useful starting point. To realise “vision” in our agents, we’ll use our Computer Vision expertise to deploy techniques like object detection, depth estimation, SLAM and object pose recognition to enable our agents to learn about the world they live in. Touch and hearing will come in time. Perhaps one day our agent will feel the warmth of a fresh cup of tea, or hear the kettle boil.


On the control side, for the cup-picking mission, we divide this into navigation and grasping. Our next few blogs will focus on the art of navigating to point locations and specific objects like cups. And after this, we’ll tackle grasping the cup, lifting it, and gently placing it in new locations. In each case we have more traditional techniques as well as more modern “learned” approaches at our disposal.

In the next instalment of C2C (#cuplife)

In our next blog, we’ll cover two SOTA RL algorithms that we ran on some tasks within the cups-rl repo. This should show readers how we intend to use it for benchmarking different algorithms and, of course, for the finding, picking and hoarding of many, many cups. An epic battle between policy and value-based methods within the model-free RL paradigm.

The two algorithms of choice are A3C and RainbowDQN. Which one of these tough guys would you bet your money on? Follow along with us on Medium to find out!

Source: Artificial Intelligence on Medium
