Blog: 10 basic data science and AI terms you need to understand
Is your data big enough to be called “big data”? Are you confused about how artificial intelligence and machine learning relate to each other?
We’ve collected some definitions to help you navigate through the maze of today’s tech buzzwords.
Data is at the very heart of the 4th industrial revolution happening today. Data is defined in the Cambridge Dictionary as “information, especially facts or numbers, collected to be examined and considered and used to help decision-making, or information in an electronic form that can be stored and used by a computer.” Data is at the core of most of today’s businesses and it is the most essential component of data science and artificial intelligence.
Data security, also known as information security, refers to the protective measures that are applied to protect digital data from accidental or malicious corruption, destruction and unauthorised access. Encryption (where digital data is turned into a secret code), authentication (where data is accessible only through passwords or biometric data) and data masking (which makes a part of the data invisible to unauthorised viewers, e.g. covering certain digits of a credit card number) are a few examples of data security tools. Data security measures can be applied to individual devices, databases, websites and entire organisations. Data breaches, where data is accessed by unauthorised individuals or groups, have been plaguing databases around the world. They cause a loss of consumer trust and come at great expense to the groups and companies affected.
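To make data masking a little more concrete, here is a minimal Python sketch of the credit card example mentioned above. The function name `mask_card_number`, the choice to keep the last four digits visible and the sample number are all illustrative assumptions, not a standard or a production-ready routine.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide every digit except the last `visible` ones, keeping spaces and dashes."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_hide = max(total_digits - visible, 0)

    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_hide > 0:
            masked.append(mask_char)  # cover this digit
            digits_to_hide -= 1
        else:
            masked.append(ch)  # keep separators and the trailing digits
    return "".join(masked)


# Sample number made up for illustration only.
print(mask_card_number("4111 1111 1111 1111"))  # **** **** **** 1111
```

Real systems usually combine masking like this with encryption and access controls, since masking only changes what a viewer sees.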
Whether your data counts as “big data” is determined by more than just the amount you have. The volume, variety and velocity all count. Volume can be measured in terabytes (one terabyte is one million million bytes), exabytes (one exabyte is one quintillion bytes) or yottabytes (one yottabyte is one septillion bytes). Variety refers to the number of types of data, and velocity to the speed at which data is generated and processed. Occasionally additional V’s are also listed in the criteria, such as veracity, variability, validity, vulnerability and volatility.
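The unit sizes above are easier to compare as powers of ten. The short Python sketch below assumes the decimal (SI) definitions used in this post; the names `BYTES_PER_UNIT` and `to_bytes` are made up for the example.

```python
# Decimal (SI) unit sizes, matching the figures quoted in the text.
BYTES_PER_UNIT = {
    "terabyte": 10**12,   # one million million bytes
    "exabyte": 10**18,    # one quintillion bytes
    "yottabyte": 10**24,  # one septillion bytes
}


def to_bytes(amount: int, unit: str) -> int:
    """Convert a whole number of units into bytes."""
    return amount * BYTES_PER_UNIT[unit]


print(f"{to_bytes(1, 'terabyte'):,}")                           # 1,000,000,000,000
print(BYTES_PER_UNIT["exabyte"] // BYTES_PER_UNIT["terabyte"])  # 1000000
```

In other words, one exabyte holds a million terabytes, and one yottabyte holds a million exabytes.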
There is no one specific threshold required for data to be classified as “big”. One thing we know for sure is data just keeps on growing: the trend seems to be that the data available in the world is doubling every 3 years.
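To get a feel for what “doubling every 3 years” means, the following Python sketch works out the growth factor over a given number of years. It simply takes the 3-year doubling period quoted above at face value; the function name `growth_factor` is an assumption for the example.

```python
def growth_factor(years: float, doubling_period_years: float = 3.0) -> float:
    """How many times larger the data is after `years`, assuming steady doubling."""
    return 2 ** (years / doubling_period_years)


print(growth_factor(3))             # 2.0
print(growth_factor(9))             # 8.0
print(round(growth_factor(10), 2))  # 10.08
```

If the trend held, the amount of data would grow roughly tenfold in a decade.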
Data science, yet to be formally professionalised, is rapidly emerging as an interdisciplinary work area in which all activities associated with creating, handling, interpreting, connecting and communicating insights from data are critical. Doing data science ‘right’ is also an essential component of artificial intelligence. Data is, after all, the fuel source feeding the machines and algorithms which increasingly make decisions all around us.
Artificial intelligence (AI)
According to Andrew Moore, the Dean of Computer Science at Carnegie Mellon University, “Artificial intelligence is the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.”
Only a few decades ago, tasks such as spell checking a document, doing basic calculations or finding one’s position on a map would have required human brain power or actions, and a machine performing these tasks would have been sci-fi level artificial intelligence. No one would call them that now. Going forward, things we perceive as AI today might be viewed as basic tasks for future computers and machines.
Artificial intelligence solutions are all around us these days, from the “smart devices” in our homes to the algorithm-based solutions of Netflix, Amazon, Uber and Lyft.
Augmented intelligence, sometimes referred to as intelligence augmentation or intelligence amplification, is an alternative conceptualization of artificial intelligence. Augmented intelligence is not technically different from artificial intelligence, but the term’s advocates argue that machines and technology are not there to replace humans but to increase humans’ potential and capabilities. While the term “artificial” suggests something unnatural, the term “augmented” is a synonym of “increased”. The fears that surround artificial intelligence are often rooted in sci-fi works of the past and in hype from the press and politicians, and have been for the most part unfounded.
Using the term augmented intelligence allows for a much more positive and optimistic view of how modern technology is affecting humanity.
One of the areas where the term augmented intelligence has been widely used (instead of artificial intelligence) is the legal profession.
“AI has largely been perceived as a threat to the legal profession. In fact, AI and analytics are helping attorneys become much more knowledgeable, efficient and productive than ever before. That said, we believe the industry will move away from the term, embracing “augmented intelligence” instead. More than just semantics, the shift reinforces the idea that technology exists to help legal professionals perform complex, data-intensive work more efficiently, not replace them.”
Machine learning, a subset of artificial intelligence, helps create software that can change and improve its performance without the need for humans to explain to it how to accomplish tasks. The goal of machine learning algorithms is to develop programmes that access data and use that data to learn further. Machine learning requires a very large set of data to learn from, and once trained it is able to analyse large data sets going forward. Both the quality and the quantity of the data are crucial during the learning process. Machine learning algorithms are currently widely used in many industries, including in the medical field for diagnostics and in finance for predicting spending patterns and market movements.
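As a toy illustration of “learning from data”, the Python sketch below fits a straight line to a handful of past spending figures and uses it to predict the next one. This is ordinary least-squares regression, one of the simplest learning methods, chosen here only as an example. The data is invented and real applications use far larger data sets and more capable models.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to the example points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data: month number and spending in that month.
months = [1, 2, 3, 4, 5]
spending = [110, 120, 130, 140, 150]

# "Training": the rule is derived from the data, not written by hand.
slope, intercept = fit_line(months, spending)

# "Prediction": apply the learned rule to a month the program has not seen.
print(slope * 6 + intercept)  # 160.0
```

Nobody told the program that spending rises by 10 each month; it worked that out from the examples, which is the core idea behind machine learning.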
Deep learning (or deep structured learning)
Deep learning, a subset of machine learning, is basically a crude imitation of how the human brain works. A machine trains itself to process large amounts of data, e.g. images, sound samples or written text. The inputs are categorised based on previous experience. For example, it can determine that a given picture contains the face of a certain person, or that a short sound sample was the word “Hello.” Generally, the larger the data set fed into the algorithm, the more accurate the results will be. Image classification and facial recognition in photo apps are good examples of deep learning. Modern photo apps now recognise friends, family members and previous locations to allow quick searches and easy filing.
Computer vision (CV)
Computer vision is a multidisciplinary subfield of artificial intelligence and machine learning.
As a field of computer science, it aims to “see”, identify, and classify or process images in a way similar to how the human eye and brain perform this task.
An example of computer vision is Fujitsu’s new Judging Support System, which makes it possible for computers to evaluate and score gymnasts’ routines in real time, without any human input during the process.
Natural language processing (NLP) and natural language generation (NLG)
Natural language processing combines aspects of computer science, linguistics and artificial intelligence. Its main objective is to have a computer, machine, or IoT device understand and interpret human language and turn it into data in order to perform a particular task. The input can be spoken language or text. IBM’s Translator Program in the 1960s was one of the first significant uses of NLP. Today’s everyday examples of voice input include Amazon’s Alexa, Apple’s Siri and Google Home, where people control IoT devices with spoken commands.
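The idea of turning human language into data a device can act on can be sketched in a few lines of Python. The example below is a deliberately simple keyword-based parser, not how Alexa, Siri or Google Home actually work. The vocabulary in `ACTIONS` and `DEVICES` and the function name `parse_command` are hypothetical.

```python
import re

# Hypothetical vocabulary for a toy smart-home assistant.
ACTIONS = {"on": "turn_on", "off": "turn_off"}
DEVICES = {"light", "lamp", "heating", "radio"}


def parse_command(utterance: str):
    """Turn a typed or transcribed sentence into structured data, or None."""
    words = re.findall(r"[a-z]+", utterance.lower())
    action = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    device = next((w for w in words if w in DEVICES), None)
    if action is None or device is None:
        return None  # the sentence was not understood
    return {"action": action, "device": device}


print(parse_command("Please turn the kitchen light off."))
# {'action': 'turn_off', 'device': 'light'}
```

Modern NLP systems replace the hand-written keyword lists with models learned from large amounts of text and speech, but the output is still structured data that a device can act on.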
Natural language generation has often been regarded as a subset of NLP, as it turns the process around while using the same components, but it has now developed into an area in its own right. It enables machines to convert data into written or spoken language. Current uses include weather forecasting systems that convert weather data into written weather predictions, machine translation, which can happen between machines or between people and machines (e.g. a written engine check warning in a car), chatbots, and text authoring or summarisation.
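The weather forecasting use mentioned above is the easiest one to sketch. The Python example below turns a few numbers into a sentence using fixed templates. The thresholds, wording and function name `describe_weather` are assumptions for illustration, and real NLG systems are considerably more flexible.

```python
def describe_weather(city: str, temp_c: float, rain_probability: float) -> str:
    """Convert weather data into a one-sentence written forecast."""
    # Assumed thresholds for choosing the wording.
    if rain_probability >= 0.7:
        rain_text = "rain is likely"
    elif rain_probability >= 0.3:
        rain_text = "there is a chance of showers"
    else:
        rain_text = "it should stay dry"

    if temp_c < 5:
        feel = "cold"
    elif temp_c < 18:
        feel = "mild"
    else:
        feel = "warm"

    return f"{city} will be {feel} tomorrow at around {round(temp_c)}°C, and {rain_text}."


print(describe_weather("London", 12.4, 0.8))
# London will be mild tomorrow at around 12°C, and rain is likely.
```

Here the direction is the reverse of NLP: the program starts with data and ends with language.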
Bogi Szalacsi is a Senior Associate with infoNation, based in London. You can contact her at email@example.com and follow her on Twitter: @infoNation5.