Blog: Strong AI, how far are we?
Deep learning is undoubtedly a breakthrough in the field of artificial intelligence. It is the first ray of sunshine to break through two decades of AI dark age.
The foundations of deep neural networks are the universal approximation theorem and back-propagation. While people are still excited about (or scared of) the breakthrough, Geoffrey Hinton has recently expressed deep suspicion of the method and said, “My view is throw it all away and start again.” Clearly the pioneer and spiritual leader of deep learning feels uneasy about the situation.
The universal approximation theorem basically says that any continuous function F(x) on a bounded (compact) domain can be approximated by linear combinations of a non-linear transformation (the activation function φ) applied to linear transformations of ‘x’: yes, the famous φ(∑wx+b).
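To make the theorem concrete, here is a minimal sketch of a one-hidden-layer network built from φ(wx+b) units that approximates sin(x) on [0, 2π]. The weights are constructed by hand rather than trained: each hidden unit is a steep sigmoid that switches on at a grid point, and its output weight is the jump of the target function there. The function names, the steepness value and the choice of sin are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_staircase_net(f, lo, hi, n_cells, steepness=1000.0):
    """Hand-build a one-hidden-layer net: out_bias + sum_k a_k * sigmoid(w_k * x + b_k).

    The interval [lo, hi] is cut into n_cells cells. The net outputs roughly
    f(cell midpoint) inside each cell, so it is a smoothed staircase.
    """
    h = (hi - lo) / n_cells
    mids = [lo + (k + 0.5) * h for k in range(n_cells)]
    out_bias = f(mids[0])
    hidden = []
    for k in range(1, n_cells):
        w = steepness                     # steep slope makes the sigmoid step-like
        b = -steepness * (lo + k * h)     # the step is centred on the cell boundary
        a = f(mids[k]) - f(mids[k - 1])   # output weight = jump between cells
        hidden.append((w, b, a))
    return hidden, out_bias


def predict(net, x):
    hidden, out_bias = net
    return out_bias + sum(a * sigmoid(w * x + b) for w, b, a in hidden)


def max_error(f, net, lo, hi, n_points=1000):
    """Largest absolute error on an evenly spaced test grid."""
    xs = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    return max(abs(f(x) - predict(net, x)) for x in xs)


if __name__ == "__main__":
    for n in (20, 200):
        net = build_staircase_net(math.sin, 0.0, 2 * math.pi, n)
        print(n, "cells, max error:", max_error(math.sin, net, 0.0, 2 * math.pi))
```

More hidden units give a finer staircase and a smaller error, which is the theorem in miniature. Note that the sketch says nothing about how to find such weights from data; that is the job of back-propagation.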
Here is another way to think about deep learning. Assuming dogs and cats are distinguishable (I’m using a classification task as an example, but a regression task is essentially the same), there must be a hyperplane that divides them in a high-dimensional feature space. However, the space is crumpled and folded in its raw data representation.
The non-linear activation function φ helps to unfold the feature space, while the hidden layers in a feed-forward network increase (by combining features) or decrease (by dropping irrelevant features) the dimensionality of the space. Gradient-descent-based back-propagation works as a crystal ball that guides us to eventually find the separation.
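The classic XOR problem is a small illustration of this unfolding. In the raw 2-D space no straight line separates the two classes, but after one hidden layer with a non-linear activation the points become linearly separable. In the sketch below the hidden weights are picked by hand (an assumption for illustration; in practice gradient descent would have to find them), and the separability check is a brute-force search over a small, arbitrary grid of candidate lines.

```python
from itertools import product

XOR_POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
XOR_LABELS = [0, 1, 1, 0]


def relu(z):
    return max(0.0, z)


def hidden_layer(x1, x2):
    """Two ReLU units with hand-picked weights (not learned)."""
    return (relu(x1 + x2), relu(x1 + x2 - 1.0))


def separable_on_grid(points, labels):
    """True if some line w1*u + w2*v + b > 0 on the grid classifies every point.

    The grid covers -4.0 to 4.0 in steps of 0.5 for each of w1, w2 and b.
    """
    grid = [i * 0.5 for i in range(-8, 9)]
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * p[0] + w2 * p[1] + b > 0) == bool(y)
               for p, y in zip(points, labels)):
            return True
    return False


if __name__ == "__main__":
    unfolded = [hidden_layer(*p) for p in XOR_POINTS]
    print("raw space separable:     ", separable_on_grid(XOR_POINTS, XOR_LABELS))
    print("hidden space separable:  ", separable_on_grid(unfolded, XOR_LABELS))
```

The hidden layer maps the two "1" points onto the same location and pushes (1, 1) away from them, so a single line, for example h1 - 2*h2 - 0.5 > 0, now does the job.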
The conventional way of programming is rule-based. It reflects human logic and reasoning. However, our reasoning model can only take a limited number of parameters. In a task such as image recognition, many pixels contribute to a result that cannot be described by specific rules. In this regard, deep learning greatly expands what computers can do. To be fair, deep learning is not the first data-driven, self-fitting method, but it distinguishes itself from the others by its great capacity and expressiveness.
But can we build on deep learning to develop strong AI that equals or exceeds human intellectual capability? Today deep learning focuses overwhelmingly on applied AI that solves specific tasks. Its performance is uneven across fields. In general, it works better on self-contained, homogeneous data (e.g. images, board games) than on open-ended, heterogeneous data (e.g. natural languages, autonomous driving).
Contrary to common belief, deep learning has limited capacity. If the universal approximation theorem is so great, shouldn’t a dense layer be sufficient for everything? Why do we need different architectures? Because as the input dimension increases, the number of potential linear combinations grows exponentially. Going deeper enforces a hierarchical structure, which partially mitigates the issue. But a neural network cannot go too deep either, because of the vanishing gradient problem. In addition, training samples are limited in practice. Network architectures such as CNN, RNN or Transformer selectively join data to confine the training space according to the characteristics of the features. They are no more than a general form of feature engineering.
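Some rough arithmetic shows how much an architecture confines the training space. The sketch below compares the number of trainable parameters in a dense layer and a convolutional layer applied to the same image, and computes an upper bound on the factor that sigmoid activations alone contribute to the gradient across many layers. The image size (224x224x3), the 64 units or filters and the 3x3 kernel are illustrative assumptions, and the two layers are not equivalent in what they output; the point is only the order of magnitude.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a 2-D convolution; independent of image size."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def sigmoid_gradient_bound(depth):
    """Upper bound on the product of sigmoid derivatives across `depth` layers.

    The sigmoid derivative is at most 0.25, so the activation part of the
    back-propagated gradient shrinks at least this fast (weights are ignored).
    """
    return 0.25 ** depth


if __name__ == "__main__":
    height, width, channels = 224, 224, 3
    print("dense layer, 64 units:  ", dense_params(height * width * channels, 64))
    print("3x3 conv, 64 filters:   ", conv2d_params(3, 3, channels, 64))
    print("gradient bound, 10 layers:", sigmoid_gradient_bound(10))
```

With these assumed sizes the dense layer needs about 9.6 million parameters and the convolution fewer than two thousand, because the convolution only joins neighboring pixels and reuses the same weights everywhere.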
Deep learning is not true intelligence, and this shouldn’t be a surprise. Neural networks are modeled on the biological nervous system. However, humans are not distinguished from other mammals at the biological level. About 96 percent of our DNA sequence is identical to that of a chimpanzee. About 300 thousand years ago, early Homo sapiens had already evolved a brain biologically identical to that of modern humans. But not until about 50 thousand years ago did humans separate themselves from other beings by showing true intelligence. One hypothesis is that a minor genetic change in our vocal system made humans capable of producing a rich range of sounds, which gave rise to speech and language. This encouraged humans to express ideas symbolically and eventually led to the development of our civilization.
The human cognitive process is a combination of subconscious perception and conscious thinking. At the low level, our vision works mostly without reasoning. The Thatcher illusion is an interesting effect first reported by Professor Peter Thompson of the University of York in 1980. When a face is inverted, it becomes more difficult to detect changes in some local features, even though they are obvious when the face is upright.
This is often interpreted as meaning that face perception is special and relies on specific psychological cognitive modules tuned to upright faces. But in my opinion, it only proves that our cognitive system is really trained by samples, and therefore has higher precision on more familiar scenes. Here I demonstrate a similar effect without using faces.
At the high level, our mind quickly associates all the aspects of detected objects and predicts possible consequences. When we look at a picture such as the one below, we use a great deal of past knowledge from outside the picture to deduce everything that is likely: people’s identities, their backgrounds, the location, what they are thinking, and so on. Our world is deeply connected, and this is where deep learning struggles. Real intelligence will not be born from isolated training.
I believe abstract thinking and rationality are the key characteristics of human intelligence. They are very likely based on an association mechanism, but human symbolic behavior greatly boosted the ability. When we communicate in symbolic messages, many details are omitted, but this allows us to generalize and simplify the matter greatly, which eventually lets us go deeper. Symbolic behavior also creates a symbolic reality that allows us to accumulate knowledge generation by generation. It is humanity’s answer to the scale of real-world complexity.
I envision that strong AI would emerge in a similar way. The ability to abstract concepts and to reason by deduction and induction would be a clear sign of strong AI. It requires a representation that can associate concepts through different relationships, such as similarity, contrast, and spatial or causal relations. The representation need not be symbolic; it could be a numerical embedding. However, for the sake of humans, it would need to be expressible in human language, at least briefly, so that machines and humans can understand each other.
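As a toy illustration of associating concepts by similarity in a numerical embedding, the sketch below compares made-up three-dimensional vectors with cosine similarity. The vectors and concept names are invented for the example and do not come from any trained model; a representation of the kind described above would also have to capture contrast, spatial and causal relations, which this sketch does not attempt.

```python
import math

# Hypothetical embeddings, invented purely for illustration.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(concept, embeddings):
    """Return the other concept whose vector is closest in angle."""
    others = [name for name in embeddings if name != concept]
    return max(others,
               key=lambda name: cosine_similarity(embeddings[concept],
                                                  embeddings[name]))


if __name__ == "__main__":
    print("closest to cat:", most_similar("cat", EMBEDDINGS))
    print("closest to car:", most_similar("car", EMBEDDINGS))
```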