So, what is Artificial Intelligence? Firstly, it’s not as hard as it sounds


In this article I will demystify the term Artificial Intelligence and show where and how it is used. Then, using basic programming techniques, I will provide a simple proof of concept showing that AI can be applied to uncomplicated business processes. Do not fear: you do not have to be a tech guru to understand this article.

Artificial Intelligence (AI). It seems like every article on technology has to mention AI at least once or twice. It’s associated with self-driving cars, Alexa from Amazon, and of course, it will steal your job. For better or worse, Artificial Intelligence has become a buzzword. Yet it seems that a lot of people do not truly understand what it means, so let me break it down for you. The term Artificial Intelligence (AI) is based on the idea that computers or systems can learn from data, identify patterns and make decisions with minimal human intervention. The method of learning from data analysis is called Machine Learning (ML). It automates analytical model building and is a subset of Artificial Intelligence.

This article aims to provide a better understanding of what AI actually is. Today it plays an essential role in many industries, where it is used to help the human workforce achieve more while spending less. Most of us associate the term Artificial Intelligence with complex technology. The word intelligence relates to something or someone being smart and able to perform better. And no one wants to be less smart than others, right?

So, how exactly does AI contribute to the human workforce? For example, a number of news outlets are using AI for news writing and distribution. It’s no surprise that Bloomberg is one of them. Just last year their program, Cyborg, churned out thousands of articles that turned financial reports into news stories, much as a business reporter would. The education system is using AI to prepare specialized learning programs. Several companies, such as Content Technologies and Carnegie Learning, are developing intelligent instruction design and digital platforms that use AI to provide learning, testing and feedback to students from pre-K to college level. These platforms give students the challenges they are ready for, identify gaps in knowledge and redirect to new topics when appropriate. According to the Artificial Intelligence Market in the US Education Sector report, artificial intelligence in U.S. education is even expected to grow by 47.5% from 2017 to 2021.

These few fields are just the beginning of the AI journey into the human workforce. In the coming years, AI will emerge in more and more industries, changing the way we work entirely.

So, is it true that the terms Artificial Intelligence and Machine Learning are too hard for most of us to comprehend? To make things clearer let’s use an example where some concepts of Machine Learning can help a business achieve more by spending less.

Imagine a company that works in sales and consists of 50 sales representatives who work hard to contribute to the company’s revenue growth. They definitely want to be treated fairly and earn a salary which corresponds to their results. The sales manager of this company works hard to treat all employees justly. Years of experience in the industry helped him develop a general key performance indicator for the sales teams. The manager thinks that this indicator not only helps in making better decisions but also provides an insight into which employees are more experienced and therefore can guide colleagues with less experience. He also introduced revenue as a KPI, since it is one of the most critical aspects that define the success of the whole company and can be directly related to employee salary. The number of new customers onboarded, the customer satisfaction score and the deals made by sales representatives are other important factors to include when defining successful sales representative performance.

To sum up, these were the five key performance indicators, which the sales manager defined:

  • Years of experience
  • Revenue
  • New customers (new brands)
  • Customer satisfaction score
  • Deals made

The sales manager created a spreadsheet where he listed all the yearly results of the employees. A few sleepless nights and countless energy drinks later, the manager had collected all employee performance information from the company’s electronic journals and filled in the sheet (thanks, Greg). After filling in the values of the key performance measurements, he went through the lines one by one, setting a salary for each. Lastly, he revisited all the salary cells to make sure that every employee’s salary was well aligned in comparison with the others.

To make this easier, let’s start by working out how the salary depends on the revenue. We will use the Python programming language to train the algorithm.

Step 1 — Download and install tools

Let’s download Anaconda, a Python/R data science platform.

Step 2 — Import the libraries

import numpy as np # fundamental package for scientific computing with Python
import matplotlib.pyplot as plt # Python 2D plotting library
import pandas as pd # High-performance, easy-to-use data structures and data analysis tools for the Python
import requests as r # Requests is a Python HTTP library
import io # Python's main facilities for dealing with various types of input and output

Step 3 — Get the salary data from the web

RAW, Data Sheet

# these lines of code get the data from web url and fill the dataset 
content = r.get('https://raw.githubusercontent.com/liumedz/simple-ai/master/50_Salaries.csv').content
dataset = pd.read_csv(io.StringIO(content.decode('utf-8')))
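
To see what the io.StringIO wrapper in the lines above is doing, here is a minimal sketch that skips the download and parses a CSV held in a plain string. The column names and the two rows are made up for illustration and are not taken from the real 50_Salaries.csv file.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the decoded HTTP response body.
csv_text = (
    "Name,Experience,Revenue,Salary\n"
    "Alice,3,120000,30000\n"
    "Bob,7,250000,52000\n"
)

# StringIO makes the string behave like an open file, which read_csv accepts.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # header names parsed from the first line
```

The same pattern applies to the downloaded content once it has been decoded from bytes to text.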

Step 4 — Select the subset of data to train the algorithm

It’s quite similar to Excel or other spreadsheet programs, where data can be selected by columns and rows. To do this in Python we use the iloc[<rows>,<columns>] indexer. In the brackets, we set the positions of the rows and columns, separated by a comma. Positions start at 0, and a colon on its own selects everything.

X = dataset.iloc[:, 2:3].values
y = dataset.iloc[:, 6].values

First, let’s use the revenue as the algorithm input X. To do that we select the third column with the slice 2:3 in the iloc[:, 2:3] brackets and assign it to the variable X. The slice, rather than a plain 2, keeps X as a two-dimensional array, which the regression in the later steps expects.

Secondly, we use the salary as the result by selecting the seventh column and assigning it to the variable y.

The values of the variables X and y are presented in the table.
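
A detail that is easy to miss: selecting a column by a single position gives a one-dimensional array, while a slice keeps it two-dimensional, and scikit-learn estimators expect the two-dimensional form for their input features. The sketch below illustrates the difference on a small, made-up table; its column names and values are assumptions, not the real data set.

```python
import pandas as pd

# Hypothetical mini data set; the real file has more columns and 50 rows.
demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Experience": [3, 7, 5],
    "Revenue": [120000, 250000, 180000],
    "Salary": [30000, 52000, 41000],
})

col_1d = demo.iloc[:, 2].values    # single position -> 1-D array
col_2d = demo.iloc[:, 2:3].values  # slice -> 2-D array with one column

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```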

Step 5 — Splitting the data set into the Training set and the Test set

To train the computer to find the tendencies in the data, we have a set of 50 sales team salary records. We also want to test whether the computer is as good at predicting as most humans are. To verify how well the algorithm performs, we randomly set aside 20 per cent of the data records (10 of the 50), which is what test_size = 0.2 means in the code below. We will use them later to confirm or reject our assumptions about the algorithm's performance. We will use the other 80 per cent of the records to train the algorithm.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
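
As a quick sanity check on the split sizes, the sketch below runs the same call on 50 placeholder records. The arrays are stand-ins rather than the real salary data; only the number of records matches.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 placeholder records standing in for the 50 sales representatives.
X_demo = np.arange(50).reshape(-1, 1)
y_demo = np.arange(50)

# test_size=0.2 holds back one fifth of the rows; random_state makes the
# shuffle repeatable between runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(len(X_tr), len(X_te))  # 40 10
```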

Step 5 — Fit Linear Regression to the Training set

To fit a linear regression to the training set, we will use the LinearRegression class from the linear_model module of the sklearn library. The fit method trains the model on the training set.

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
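
After fitting, the model is just a straight line: predicted salary = intercept + slope × revenue. The sketch below shows how those two numbers can be read from a fitted model. The revenues and the rule that generates the salaries are invented for illustration, so the fit recovers them exactly; real data would not be this tidy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up revenues; salaries follow salary = 5000 + 0.1 * revenue exactly.
revenue = np.array([[100000.0], [150000.0], [200000.0], [300000.0]])
salary = 5000.0 + 0.1 * revenue.ravel()

model = LinearRegression()
model.fit(revenue, salary)

print(model.intercept_)  # estimated base salary
print(model.coef_[0])    # estimated salary increase per unit of revenue
```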

Step 6 — Predicting the Test set results

To predict the test set results we will use the predict method of the LinearRegression class, passing it the X_test data to estimate the y_test results.

y_pred = regressor.predict(X_test)

Step 7 — Visualizing the Training set results

To visualise the training set results, we will use the matplotlib.pyplot library. The red dots are the salaries set by the sales manager, and the blue line shows the salary that would be predicted for a given revenue. Put simply, the algorithm is called linear regression because the relation between the revenue and the predicted salary is linear.

plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Revenue (Training set)')
plt.xlabel('Revenue')
plt.ylabel('Salary')
plt.show()

Step 8 — Visualizing the Test set results

plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Revenue (Test set)')
plt.xlabel('Revenue')
plt.ylabel('Salary')
plt.show()

The test set helps verify how well the algorithm performs. We use the held-back sample of the sales manager's data records to check the algorithm's prediction accuracy. The blue line represents the salary that would be predicted for a given revenue. The red dots represent the real salaries set by the sales manager. The shorter the distance between a red dot and the blue line, the more accurate the prediction; the longer the distance, the less accurate it is.
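
The distance between a dot and the line can also be expressed as a number. The sketch below computes it for three made-up salary pairs and averages the absolute distances using mean_absolute_error from scikit-learn. The values are hypothetical, and this metric is one common choice rather than something the plots above rely on.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical salaries set by a manager and the model's predictions for them.
actual = np.array([30000.0, 42000.0, 51000.0])
predicted = np.array([31000.0, 40000.0, 51500.0])

# Vertical distance between each dot and the line (sign shows the direction).
residuals = actual - predicted

# Average size of those distances, ignoring the sign.
mae = mean_absolute_error(actual, predicted)

print(residuals)
print(mae)
```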

The code

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import requests as r
import io

# Importing the dataset
content = r.get('https://raw.githubusercontent.com/liumedz/simple-ai/master/50_Salaries.csv').content
dataset = pd.read_csv(io.StringIO(content.decode('utf-8')))
X = dataset.iloc[:, 2:3].values
y = dataset.iloc[:, 6].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Revenue (Training set)')
plt.xlabel('Revenue')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Revenue (Test set)')
plt.xlabel('Revenue')
plt.ylabel('Salary')
plt.show()

Step 9 — Let’s use multiple parameters

In step 4 we used the revenue as the input parameter X and the salary as the output parameter y. Let’s expand the number of performance indicators and use all of the parameters which the sales manager uses.

To start using all of these parameters we need to adjust the code.

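X = dataset.iloc[:, 1:6].values
y = dataset.iloc[:, 6].values

Now we assign the data for the five KPIs, from the second through the sixth column (positions 1 to 5, since the end of the slice 1:6 is excluded), to the variable X, and the salary data from the seventh column (position 6) to the variable y.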
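
With several input columns, the fitted model holds one coefficient per KPI instead of a single slope. The sketch below illustrates this on randomly generated numbers. The data, the weights and the base salary are all invented for the example and say nothing about the real data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# 50 made-up rows with 5 KPI columns each (values between 0 and 10).
kpis = rng.uniform(0.0, 10.0, size=(50, 5))

# Hypothetical weights: how much each KPI adds to the salary.
true_weights = np.array([1500.0, 3000.0, 800.0, 1200.0, 400.0])
salaries = 10000.0 + kpis @ true_weights

model_multi = LinearRegression().fit(kpis, salaries)

print(model_multi.coef_.shape)         # one coefficient per KPI column
print(np.round(model_multi.coef_, 2))  # close to the weights chosen above
```

Because the made-up salaries follow the weights exactly, the fit recovers them. On real salaries the coefficients would only approximate how much each KPI influenced the manager's decisions.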

From the results we see that the y_pred values 28902, 29287, 14927, 8770, 64167, 80742, 50027, 53469, 27705, 53827 correspond to the 10 randomly selected salaries (20 per cent of the records) from the sales manager's data sheet: 28485, 30112, 9275, 8216, 63365, 83383, 47949, 50149, 29659, 51282. The algorithm predicted most of the salaries closely. The deviations, as a percentage of the predicted value, are 1, 3, 38, 6, 1, 3, 4, 6, 7 and 5. Only the salary of 9275 stands out from the trend, with a deviation of 38 per cent. In this case, we can conclude that the salary of 9275 EUR was not set as consistently as the others and does not fit the multiple linear regression trend.
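
The percentages quoted above can be reproduced from the two lists of numbers. The formula in the sketch is an assumption: dividing the absolute difference by the predicted value matches the quoted figures, whereas dividing by the actual salary would give a larger figure for the outlier.

```python
import numpy as np

# Predicted and actual salaries as quoted in the text above.
y_pred_reported = np.array(
    [28902, 29287, 14927, 8770, 64167, 80742, 50027, 53469, 27705, 53827]
)
y_test_reported = np.array(
    [28485, 30112, 9275, 8216, 63365, 83383, 47949, 50149, 29659, 51282]
)

# Absolute difference as a percentage of the predicted value (assumed formula).
deviation_pct = np.abs(y_pred_reported - y_test_reported) / y_pred_reported * 100

print(np.round(deviation_pct).astype(int))
# [ 1  3 38  6  1  3  4  6  7  5]
```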

The code

# Multiple Linear Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import requests as r
import io

# Importing the dataset
content = r.get('https://raw.githubusercontent.com/liumedz/simple-ai/master/50_Salaries.csv').content
dataset = pd.read_csv(io.StringIO(content.decode('utf-8')))
X = dataset.iloc[:, 1:6].values
y = dataset.iloc[:, 6].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)

Conclusion

Machine learning is an application of Artificial Intelligence (AI) that gives systems the ability to learn and improve from experience without being explicitly programmed. In general, the more data we have, the better the results we can expect. Machine learning focuses on the development of computer programs that can access data and use that data to learn. In this article, we have analysed one of the main Machine Learning algorithms. The sales manager's salary history was used as data to train a simple multiple linear regression algorithm. This article shows that algorithms can not only predict future results but also help us simulate the future, plan and make better decisions in such situations.

In this article, we have learned the principles of the linear regression algorithm, which is one of the most used Machine Learning algorithms in the world of Artificial Intelligence. The algorithm itself is just a set of instructions that lets a computer use the data and apply patterns found in it to make our work easier. If specialists from different fields understood the principles of Machine Learning algorithms, they could think of ways in which these algorithms can contribute to the automation of daily business processes.

Source: Artificial Intelligence on Medium
