Blog: Cameras Don’t See — Mimicking the Human Eye in Software
Say “so long” to people with a clipboard. The next-generation camera is built to mimic the human eye and provide spatial intelligence about the footage it captures.
Curiosity is the best part of being “human.” That curiosity and creativity are a software vendor’s worst nightmare: just because a person can “dream it” doesn’t mean that software can implement it.
This is especially true when working with video. Everett Berry, computer vision specialist and CEO of Perceive, along with Aaron Michaux, CTO of Perceive, helped me understand that most overhead cameras don’t capture what the human eye sees. I had assumed that video cameras provide an exact recording of what I see; this is not the case. A security camera, in-store camera, or shelf camera captures only a portion of what our eyes take in, and the real magic is how the brain turns that input into a rich 3D visual scene. This lack of 3D spatial reasoning, the “mind’s view,” is why current in-store analytics and people counters are limited in the accuracy and the kinds of analytics they can provide.
Cameras, like the retina, see what’s called a “projective space.”
This space is heavily distorted, and it even lets us see things infinitely far away: the point on the horizon where receding train tracks meet. The magic of the human visual system is that it takes this 2D information and reconstructs a 3D world in the mind’s eye. The illusion is so immediate and convincing that its complexity and importance were completely missed until robotics researchers attempted to process projective images in the 1960s.
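To make “projective space” concrete, here is a tiny sketch of the standard pinhole projection model (the focal length and the points are made up for illustration). Dividing by depth is exactly what discards 3D information: different 3D points along one viewing ray land on the same pixel, and points receding toward infinity converge to a vanishing point, just like train tracks.

```python
# Pinhole projection: a 3D point (X, Y, Z) maps to the image point
# (f*X/Z, f*Y/Z). Depth Z is divided out, which is why a single
# image cannot recover 3D structure. Illustrative sketch only.
f = 1.0  # focal length (assumed)

def project(p):
    X, Y, Z = p
    return (f * X / Z, f * Y / Z)

# Two different 3D points on the same ray through the camera center
# land on the exact same pixel: the depth information is lost.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # twice as far along the same ray
print(project(near) == project(far))  # True

# Receding parallel points converge: as Z grows without bound, the
# projections approach the vanishing point at (0, 0).
track = [(1.0, -1.0, Z) for Z in (10, 100, 1000)]
print([project(p) for p in track])
```

This depth ambiguity is the precise sense in which a one-lens camera “can’t see” 3D.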
The 3D world is where we think. It’s how we reason about space. When we picture people moving through a store, we picture them in 3D space, not projective space. At Perceive, we’re building an integrated computer vision system that understands that space and turns it into data.
Let’s look at store camera basics: cameras have either one lens or two.
One-lens camera:
- Most security and overhead cameras have a single lens.
- The farther the camera is placed from the scene, the fewer details it captures.
- A one-lens camera captures a limited point of view, and reconstructing 3D information from it remains difficult.
- An overhead camera takes a top-down approach and can only capture what is in its direct line of sight.
Two-lens camera:
- Capable of capturing coarse 3D data within its field of view.
- Does not require top-down placement for accurate people counting.
- Still only captures what is in direct line of sight, and several two-lens cameras generally aren’t integrated into a single camera network.
- Lacks an integrated software stack to save, process, and extract information.
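How does a second lens buy coarse 3D? Under the standard rectified-stereo model, a point’s depth follows from how far its image shifts between the two lenses: Z = f · B / d, where f is the focal length, B the baseline between lenses, and d the pixel disparity. A minimal sketch, with an assumed focal length and baseline (not Perceive’s actual specifications):

```python
# Coarse depth from a two-lens (stereo) camera, using the standard
# rectified-stereo relation Z = f * B / d. The numbers below are
# illustrative assumptions.

FOCAL_PX = 800.0    # focal length in pixels (assumed)
BASELINE_M = 0.12   # distance between the two lenses in meters (assumed)

def depth_from_disparity(disparity_px):
    """Distance to a point whose image shifts `disparity_px` pixels
    between the left-lens and right-lens views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return FOCAL_PX * BASELINE_M / disparity_px

# A nearby shopper shifts a lot between the two views; a far wall barely moves.
print(depth_from_disparity(48.0))  # ~2.0 meters away
print(depth_from_disparity(8.0))   # ~12.0 meters away
```

Note the limits this formula implies: depth resolution degrades quadratically with distance, which is why two-lens cameras yield only coarse 3D within their field of view.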
Next Generation Camera: Perceive Camera Network
- Cloud-based, wireless, utilizes light fixture for power
- Uses a 5G chip for connectivity
- Multiple camera systems are integrated into a network to produce a single queryable “view” of a store, workplace, or area.
- Integrated computer vision software that can mimic the 3D reasoning of the human mind
- Next-generation computer vision reports demographics, which way a person is facing, and other behaviors, without invading privacy
- Maintains customer privacy because Perceive doesn’t use facial recognition.
- Gives stores, museums, universities, malls, and workplaces the best behavioral and demographic data about their space
- Better for the environment as it doesn’t need a battery.
Computer vision software
The next area for boundary-breaking technology is the ability to mimic the human visual system. This field, called “computer vision,” is considered one of the toughest areas of artificial intelligence. Computer vision researchers have an in-joke: vision is “AI-complete,” a play on words meaning that human-like artificial vision will be achieved only after every other problem in AI has been solved. But progress has been made, and we’re nearing a suite of vision technologies that will drive future economic growth.
Sight is difficult to replicate because it combines two things: capturing information and interpreting it (perception). A camera captures images of the three-dimensional scene, and computer vision algorithms stitch those images together into a 3D reconstruction. The reconstruction is then fed into AI/machine learning/deep learning algorithms, and the result is analytics about the people in the space. This is called spatial intelligence.
Can 2D images be converted to 3D with today’s technology?
Although far from the abilities of the human mind, artificial vision works surprisingly well in restricted applications. Vision technology is moving quickly, spurred by improvements in both algorithms and hardware, and the state of the art is a moving target. Most computer vision research focuses on the 2D projective space of images (i.e., retinal information), eschewing 3D as “too hard.” However, top computer vision researchers, like Geoffrey Hinton (known for popularizing deep learning), have recognized the importance of 3D processing. A renaissance in 3D vision is underway, with fantastically useful results, for example in self-driving cars.
Cutting-edge research shows that 3D vision is possible with new hardware and new techniques, and it dramatically simplifies other computer vision tasks: precisely where a pedestrian is, what direction they are traveling, and how fast.
These questions are impossible to answer accurately from 2D projective images alone. 3D reasoning is necessary. Every year we learn more about our vision system and spatial reasoning. Applying those lessons to computer vision software will be the next great leap in camera technology.
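As a closing illustration of why 3D reasoning matters: once a person’s position is known in real-world floor coordinates rather than distorted pixel coordinates, direction and speed fall out of simple vector arithmetic. The positions below are made-up examples, and a real system would also smooth noisy detections:

```python
import math

# Two floor positions (in meters) of the same tracked person, one second
# apart. Illustrative numbers, not real tracking output.
t0, p0 = 0.0, (2.0, 5.0)
t1, p1 = 1.0, (3.0, 5.0)

dx, dy = p1[0] - p0[0], p1[1] - p0[1]
dt = t1 - t0

speed = math.hypot(dx, dy) / dt             # meters per second
heading = math.degrees(math.atan2(dy, dx))  # 0 degrees = +x axis

print(f"speed: {speed:.1f} m/s, heading: {heading:.0f} degrees")
```

The same arithmetic applied to raw pixel coordinates would be wrong, because projective distortion makes equal pixel distances correspond to very different real-world distances.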
-Karen Salay, Serial Entrepreneur and Perceive Advisor