Blog: Deep Learning: Background Research
Today, mainstream Machine Learning research is dominated by Deep Learning. “Deep learning researchers are like the pop idols in the AI community”.
What is Deep Learning?
Deep learning is a specific machine learning method that, like most machine learning techniques, adjusts itself iteratively until it reaches a specified stopping point. “The idea of a computer which programs itself is very appealing”.
Deep learning models were born out of artificial neural networks: complex interconnections of very simple processing nodes organised into successive layers; an input layer (where data is ingested), an output layer (where the predictions are output) and hidden layers (those in between the input and output layers) [4, 12, 13].
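As a rough illustration (my own minimal NumPy sketch, with made-up layer sizes, not anything from a specific framework), data flows through these layers as a chain of matrix multiplications followed by simple non-linearities:

```python
import numpy as np

def relu(x):
    # a very simple processing step: keep positives, zero out negatives
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Pass input x through each layer in turn: hidden layers apply
    a non-linearity, the final (output) layer is left linear."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                   # hidden layer
    return a @ weights[-1] + biases[-1]       # output layer

# A toy network: 3 inputs -> 4 hidden nodes -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 4)), rng.standard_normal((4, 2))]
biases = [np.zeros(4), np.zeros(2)]

y = forward(np.array([1.0, 0.5, -0.2]), weights, biases)
```

Adding more entries to `weights`/`biases` is exactly what "adding hidden layers" means: each extra layer re-combines the previous layer's outputs into higher-level patterns.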
It was discovered that, by adding more hidden layers to a neural network, hierarchical patterns could be learnt from the data that considerably improved the neural network’s predictive power.
Neural networks with more than two hidden layers became known as deep neural networks (to distinguish them from earlier, less successful research involving shallower neural networks).
Many neural networks exist which are not counted as deep learning models (e.g. Perceptrons, Feed-Forward Neural Networks, Adaptive Neuro-Fuzzy Inference Systems, Extreme Learning Machines and Hierarchical Temporal Memory). In fact, deep learning models need not be constrained to neural networks at all (e.g. Deep Kernel Machines)!
What is Deep Reinforcement Learning?
Traditionally, deep learning models have been firmly rooted as supervised learning models, but there is growing research into deep learning models which can perform competitively using alternative types of machine learning too (e.g. deep reinforcement learning).
Supervised learning refers to machine learning models that learn only from labelled examples (the labels represent the desired behaviour that the model must learn). Unsupervised learning refers to machine learning models which can learn from unlabelled data [12, 13], without any human intervention. For instance, a model may segment data into groups (clusters) based on patterns it finds in the data; the resulting cluster assignments can then be used as labels, so that the same data can train a supervised learning model. Semi-supervised learning refers to machine learning models that learn from a mixture of labelled and unlabelled data. Reinforcement learning refers to machine learning models that do not learn from offline data but learn as they go instead, by trial and error (a sequence of successful decisions is reinforced since it best solves the problem at hand).
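To make the clustering idea concrete, here is a minimal k-means sketch (k-means is my choice of example; the text does not name a specific clustering algorithm). It groups unlabelled points without any human-provided labels:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Minimal k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Two obvious groups of points; k-means recovers them with no labels.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = kmeans(X, k=2)
```

The returned `labels` array is exactly the kind of machine-generated labelling that could then feed a supervised model, as described above.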
Within each branch of machine learning, there exist many different approaches and machine learning models (of which some are deep learning models).
History of Deep Learning
Artificial Neural Networks
In this section, we take a look at some of the key events which have shaped the development of deep learning. In the 1940s, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how biological neurons might work. McCulloch and Pitts proceeded to test their theory and, in 1943, the first artificial neural network was built using electrical circuits [2, 15].
In the 1950s, around the same time that Alan Turing published his AI paper on the Turing Test, computers became sophisticated enough to simulate a neural network. Frank Rosenblatt proposed an improved artificial neuron design in 1957 called the Perceptron, which gained widespread popularity for its simple and efficient learning algorithm.
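That learning algorithm really is simple enough to show in a few lines. A sketch (the toy AND problem and NumPy phrasing are mine; the rule itself is Rosenblatt's: nudge the weights only when an example is misclassified):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt's perceptron rule: for each misclassified example,
    move the weights toward the correct side of the decision boundary."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):            # yi is +1 or -1
            if yi * (xi @ w + b) <= 0:      # misclassified (or on boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# A linearly separable toy problem: logical AND on {0,1}^2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
preds = np.sign(X @ w + b)
```

For linearly separable data like this, the rule is guaranteed to converge to a separating line in finitely many updates.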
The 1960s was the age of ADALINE (Adaptive Linear Elements), another artificial neuron designed by Widrow and Hoff which used an alternative learning algorithm. When they combined multiple ADALINE neurons together (MADALINE), they created the first neural network used to solve a real-world problem (an adaptive filter for removing echoes on phone lines), and it is still being used commercially to this day!
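Widrow and Hoff's alternative learning algorithm (the LMS or "delta" rule) adjusts weights in proportion to the remaining error rather than only on misclassification. A sketch of the echo-cancellation idea, with an invented three-tap echo path standing in for the real phone line:

```python
import numpy as np

# LMS (Widrow-Hoff) adaptive filter: learn the echo path so the
# filter's output cancels the echo of the far-end signal x.
rng = np.random.default_rng(0)
h_true = np.array([0.5, -0.3, 0.1])      # unknown echo path (3 taps)
x = rng.standard_normal(2000)            # far-end signal
echo = np.convolve(x, h_true)[:len(x)]   # echo heard on the line

w = np.zeros(3)                          # adaptive filter taps
mu = 0.01                                # step size (learning rate)
for n in range(3, len(x)):
    u = x[n:n - 3:-1]                    # last 3 samples, newest first
    e = echo[n] - w @ u                  # residual echo after cancellation
    w += mu * e * u                      # Widrow-Hoff update
```

After a few hundred samples the learnt taps `w` closely match the true echo path, so subtracting the filter's output removes the echo.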
The AI Winter
The 1970s would become known as the AI Winter for the sudden disappointment and drop in interest in AI, which meant a drop in funding and in AI research, creating a temporary setback in the development of AI. Earlier successes had led to an exaggeration of AI’s potential and inevitably promises went unfulfilled. The hype surrounding AI had also fuelled ethical and philosophical questions in people’s minds, which led to increasing fear and distrust of AI. Nevertheless, neural network research was not the mainstream AI paradigm at the time. A plethora of advancements in heuristic programming had overshadowed those of neural networks, making them less well known. Those who had heard of them tended to prefer more conventional, linear, von Neumann computational architectures (which eventually took over the computing scene), since they required less memory and were less computationally intensive than complex, distributed, parallel computational architectures like neural networks. The book “Perceptrons” by Marvin Minsky and Seymour Papert demonstrated that neural networks without any hidden layers were unable to handle non-linearly separable data [2, 3] and were limited in the types of problems they could solve. This signalled to AI researchers that neural networks appeared to be a theoretical dead end (and may have contributed to the factors triggering the AI Winter).
Although the world was not yet ready for neural networks, a small group of AI researchers, known as the “Parallel Distributed Processing” (PDP) research group, continued working on them without funding. They soon overcame the limitations outlined in “Perceptrons” by extending single-layered neural networks to multi-layered neural networks, and the first multi-layered neural network was actualised in 1975. “Neural networks began competing with Support Vector Machines”, offering better results with the same amount of data. The book “Parallel Distributed Processing” by Rumelhart, McClelland and the PDP Research Group showed that multi-layered neural networks were far more robust and could be used to learn a vast array of functions.
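To see why the extra layer matters, consider XOR, the classic non-linearly separable function that a single-layer network cannot compute. The construction below is my own toy illustration (hand-picked weights, threshold units): the hidden layer computes OR and AND, and the output combines them.

```python
import numpy as np

def step(z):
    # threshold unit: fires (1.0) when its input is positive
    return (z > 0).astype(float)

# Hidden layer: h1 = OR(a, b), h2 = AND(a, b)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])    # h1 fires if a+b > 0.5, h2 if a+b > 1.5

# Output layer: XOR = h1 AND NOT h2
W2 = np.array([1.0, -1.0])
b2 = -0.5                      # fires if h1 - h2 > 0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = step(X @ W1 + b1)
xor_out = step(hidden @ W2 + b2)
```

No single line through the plane separates XOR's outputs, but two layers of the very units Minsky and Papert analysed handle it easily.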
The AI Come Back-Propagation
There remained one major hurdle preventing neural network research from taking off: the learning algorithms which worked so well for shallow neural networks (i.e. the Widrow-Hoff rule) did not scale well to deeper networks. In the 1980s, Rumelhart, Williams and Hinton applied backpropagation to multi-layered neural networks as an efficient and scalable learning algorithm [1, 15]. It became the breakthrough that made deep learning efficient and quick enough to take off, and it has since become the standard way of computing the gradients needed for gradient-descent training. Interest in the field of AI was renewed and AI research made a comeback, with deep learning leading the way. When Japan announced its Fifth Generation computing effort, the US worried it would be left behind in the field and, as a result, began pouring funding into it.
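A minimal sketch of the idea (sigmoid units, squared error, the XOR task and all sizes are my choices for brevity): the forward pass computes the prediction, and the backward pass pushes the error back through the layers via the chain rule, giving a gradient for every weight.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])   # XOR targets

rng = np.random.default_rng(42)
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # 2 -> 8 hidden
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # 8 -> 1 output

lr = 1.0
for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: chain rule, layer by layer
    d_out = (out - y) * out * (1 - out)     # error at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # error pushed back to hidden layer
    # gradient descent step on every weight
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

preds = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
```

Unlike the hand-set weights a shallow-era engineer would need, here the network discovers a solution to XOR on its own, and exactly the same two-pass recipe scales to many more layers.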
Since then, deep learning has been “increasingly taking over AI tasks, ranging from language understanding, and speech and image recognition, machine translation, planning and even game playing and autonomous driving”. In the 1990s, Yann LeCun demonstrated at Bell Labs the power of the backpropagation algorithm by training a deep neural network to read handwritten digits, and IBM’s Deep Blue beat chess champion Garry Kasparov. In 1999, GPUs were developed for faster computing but were prohibitively expensive and so, in the 2000s, backpropagation briefly fell out of favour.
Eventually, by 2010, GPUs had become faster and more affordable, and in 2011 IBM Watson won the first-place prize of $1 million on Jeopardy! In 2015, DeepMind used deep reinforcement learning to teach computers to play classic Atari games. In 2016, the error rate of automatic labelling on ImageNet declined from 28% to less than 3%, making it more accurate than humans (human error is approximately 5%). DeepMind’s AlphaGo beat world Go champion Lee Sedol 4–1, and soon after AlphaGo Zero beat the original AlphaGo 100–0. In 2017, an AI system could classify skin cancer at a level comparable to dermatologists, and both IBM and Microsoft achieved human-level speech recognition for a limited domain. In 2018, Microsoft achieved human-level machine translation quality for Chinese to English, OpenAI’s team of five neural networks defeated amateur human teams at the game Dota 2, and a deep learning system learnt to grade prostate cancer with superhuman accuracy (70% compared to the average human accuracy of 61%).