Blog: What Is Artificial Intelligence?
‘Artificial intelligence’ has been one of the most hyped and misunderstood terms since it was coined in 1956. As technologies have evolved over the decades, the common understanding of what ‘artificial intelligence’ means has evolved with them, embracing promising new approaches and discarding older ones that have run their course.
In the current era, ‘artificial intelligence’ generally refers to non-deterministic models produced by algorithms capable of learning about a specific reality, primarily through an approach known as supervised training, described below. The self-learning algorithmic architectures currently used to accomplish this are known collectively as ‘deep neural networks’ within the field of ‘deep learning’, which has become synonymous with ‘artificial intelligence’ in recent years.
The learning process is accomplished by providing large quantities of data to the algorithm to be trained, and the data itself must have a contextual relationship with the use-case being modeled — both as features (inputs), and as known results (outputs).
The features present in each data instance are what define its contextual relationship with the use-case. The features must be explicitly specified to the learning algorithm by annotating them with metadata — known as a ‘label’. The label is what gives each data instance meaning to the algorithm that will evaluate it.
Known results are the other indispensable requirement for a labeled data set to be used for training. For each ‘row of labeled data instances’ that is evaluated by the algorithm, there must be a known result.
To be usable by the learning algorithm in the training process, every ‘row of labeled data instances’ must have each of its constituent data instances labeled appropriately and must be associated with a known result.
The entirety of a collection of labeled data instances — all with known results — represents a proxy for the reality to be modeled.
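As a concrete illustration of the idea above, a labeled data set can be thought of as rows of feature values paired with known results. The feature names and values here are entirely invented for illustration:

```python
# A tiny labeled data set: each row pairs feature values with a known result.
# Feature names (square_feet, bedrooms) and all values are invented examples.
rows = [
    # (features: square_feet, bedrooms) -> known result: sale price
    ((1400, 3), 250_000),
    ((1900, 4), 340_000),
    ((1100, 2), 195_000),
]

for features, known_result in rows:
    print(features, "->", known_result)
```

Taken together, rows like these stand in for the reality being modeled: the features supply the context, and the known results supply the answers the model will learn to reproduce.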
To learn a specific reality from a collection of relevant labeled data instances, an appropriate deep neural network architecture is selected and initialized with random values. Training then commences on the selected architecture, in the hope that an accurate model of that reality can be achieved.
Training consists of passing one ‘row of labeled data instances’ at a time through the deep neural network — each row representing one synchronized instance of the specified reality — to generate a resulting prediction about that reality. That prediction is compared with the known result for that ‘row of labeled data instances’, yielding an error between the known result and the predicted result. That error is then evaluated by an algorithm known as the optimizer, which adjusts the internal values of the developing model in an attempt to reduce the error in subsequent training iterations.
In the next iteration, the learning algorithm repeats the same process with the next ‘row of labeled data instances’, once more concluding with an attempt to reduce subsequent errors. And so on, potentially for many millions of iterations. (A complete pass through the entire training data set, touching every row once, is known as an ‘epoch’.)
In early training, the resulting predictions will be inaccurate, which is expected since the untrained model was originally initialized with random values. And yet, each iteration should be thought of as a learning opportunity, since it concludes with an attempt to reduce subsequent error.
The training of the deep neural network model concludes once it achieves an acceptable prediction error, or when further training no longer yields a reduced prediction error, whether or not the current error is acceptable.
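The training loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a real deep neural network: it uses a single-weight linear model, plain gradient descent as the optimizer, and invented data values.

```python
# A minimal sketch of the supervised training loop: forward pass,
# error against the known result, optimizer adjustment, stopping
# condition. The one-weight linear model stands in for a network.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (feature, known result)

weight = 0.1          # the "random" initial value
learning_rate = 0.05  # how strongly the optimizer adjusts the weight

for epoch in range(200):              # one epoch = one full pass over the data
    total_error = 0.0
    for feature, known in data:       # one iteration per row
        predicted = weight * feature  # forward pass: generate a prediction
        error = predicted - known     # compare with the known result
        weight -= learning_rate * error * feature  # optimizer adjustment
        total_error += error ** 2
    if total_error < 1e-9:            # stop once the error is acceptable
        break

print(round(weight, 3))  # converges toward 2.0, the rule hidden in the data
```

The loop discovers that the known results are always twice the feature value, without that rule ever being written down; a deep neural network does the same thing with millions of internal values instead of one.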
This supervised training approach, which enables deep neural networks to learn from data, inverts the traditional programming model.
The traditional model assumes the human programmer can encode all rules by which data is processed to yield a resulting answer. That model is fundamentally flawed at scale, since the number of rules required for highly complex use cases grows without bound.
The artificial intelligence approach, using supervised training, instead takes a large set of data instances with known results, and attempts to discover the complex set of rules that govern the use case. It’s an approach that scales well for a highly-complex world, which is why neural computing — artificial intelligence — is growing exponentially for complex use-cases in nearly every industry on the planet.
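To make the inversion concrete, here is a hedged sketch: in the traditional model the programmer hand-codes the rule, while the learning approach recovers a rule from labeled examples with known results. The rule, the crude threshold heuristic, and the data are all invented for illustration.

```python
# Traditional programming: the rule is written by hand.
def classify_by_rule(value):
    return "high" if value > 10 else "low"  # hand-coded threshold

# Supervised learning: a rule is recovered from labeled examples.
examples = [(2, "low"), (5, "low"), (12, "high"), (20, "high")]

# Crude stand-in for training: place the threshold at the mean feature
# value, which separates the known results in this invented data set.
threshold = sum(value for value, _ in examples) / len(examples)

def classify_learned(value):
    return "high" if value > threshold else "low"

print(classify_learned(15))  # classified using the discovered threshold
```

The hand-coded version requires the programmer to already know the rule; the learned version only requires examples, which is precisely what makes the approach scale to use cases where the rules are too numerous or too subtle to write down.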
The ability to scale nearly infinitely within a narrow scope is why deep neural networks can master complex tasks better than even the most skilled human beings. Over the course of their lives, humans who achieve mastery in a specific skill may learn from tens or hundreds of thousands of examples. But a deep neural network, trained on a highly relevant labeled data set that includes many millions of examples with known results, has a learning advantage that no human can match.
That is why artificial intelligence has become the most important strategic technology advantage since the rise of the Internet.
You can find Chris Benson on Twitter, LinkedIn, or his website.