In this tutorial, we will build an artificial neural network in Python using only the NumPy library.
We will create this neural network step by step.
You can follow the same steps in any other programming language as well.
Describe The Network Structure
The dataset for our artificial neural network has three inputs and eight rows (all combinations of three binary values).
We will use only six rows for training; the remaining two rows will be kept as test data.
The network we will build has one hidden layer and one output layer.
The hidden layer will consist of five neurons. The weights of the network will be initialized randomly, and the biases will be set to 1.
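In terms of array shapes, the network looks like this:
- inputs : (6, 3) matrix - six training rows with three inputs each
- w1 : (3, 5) matrix - weights of the hidden layer (five neurons)
- w2 : (5, 1) matrix - weights of the output layer (one neuron)
- output : (6, 1) matrix - one target value per training row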
Define the Variables
In this part, we will define the variables that we will use.
These variables are matrices, so we can define them easily with the NumPy library.
First, we need to install NumPy, which we can do with the following command.
pip install numpy
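To check that the installation worked, you can optionally run:
python -c "import numpy; print(numpy.__version__)"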
Defining inputs and output :
#Defining inputs and output
import numpy as np
inputs = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1]])
output = np.array([[0],[1],[0],[1],[0],[1]])
print("inputs : ")
print(inputs)
print(".....................")
print("output : ")
print(output)
"""
inputs :
[[0 0 0]
[0 0 1]
[0 1 0]
[0 1 1]
[1 0 0]
[1 0 1]]
.....................
output :
[[0]
[1]
[0]
[1]
[0]
[1]]
"""
Defining weights :
#Defining weights
w1 = np.random.randn(inputs.shape[1],5)
w2 = np.random.randn(5,output.shape[1])
print("w1")
print(w1)
print("..................")
print("w2")
print(w2)
"""
w1
[[-0.77428492 0.10287792 0.69741541 -0.95775929 0.50670826]
[ 0.80597415 -1.80287916 -0.3028974 -0.67473998 0.56049986]
[-0.81735613 2.33120242 1.07024909 -0.66092605 1.46534629]]
..................
w2
[[-2.42478319]
[ 0.68046428]
[ 0.33132679]
[-0.16557226]
[ 0.59223169]]
"""
Defining biases:
b1 = 1
b2 = 1
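Note that b1 and b2 are plain scalars; when we add them to a matrix such as np.dot(inputs, w1), NumPy broadcasts the scalar to every element. For example:
m = np.zeros((2, 3))
print(m + 1)   # the scalar is broadcast: every element of the (2, 3) matrix becomes 1.0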
Activation Function
Activation functions allow us to bring the output values of each layer into the 0-1 range.
If we performed our operations without an activation function, the output values would grow exponentially after each step, and training the artificial neural network would take a very long time.
There are many activation functions for this purpose, such as the sigmoid function, the tanh function, and the ReLU function.
We will use the sigmoid function in this tutorial.
#Defining Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))
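As a quick check, the sigmoid maps any real number into the 0-1 range:
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # roughly [0.0067, 0.5, 0.9933]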
Build the Artificial Neural Network
Artificial neural network training consists of two main parts.
- Calculating the predicted output ŷ, known as feedforward
- Updating the weights and biases, known as backpropagation
Let's look at each of these steps, starting with the feedforward pass.
Feedforward
z = x · w + b
a = sigmoid(z) = 1 / (1 + e^(-z))
ŷ = a2
error (loss function) = (1/2) * (output - ŷ)^2
#Feed Forward
z1 = np.dot(inputs, w1) + b1
a1 = sigmoid(z1)
z2 = np.dot(a1, w2) + b2
a2 = sigmoid(z2)
error = np.sum((1/2)*(output - a2)**2)
print("z1 : ", z1.shape) #z1 : (6, 5)
print("a1 : ", a1.shape) #a1 : (6, 5)
print("z2 : ", z2.shape) #z2 : (6, 1)
print("a2 : ", a2.shape) #a2 : (6, 1)
print("error : ", error) #error : 1.3296527834297855
Backpropagation
So far, we have found the error value of our prediction.
Now we need to update our weights and bias values.
To do that, we need the derivative of the loss function with respect to the weights and biases.
The derivative of a function is its slope.
Once we can calculate this derivative, we can update the weights and biases by increasing or decreasing them accordingly.
This is called gradient descent.
However, we can't calculate the derivative of the loss function with respect to the weights and biases directly, because the loss function does not contain the weights and biases explicitly.
So we need to use the chain rule to calculate it.
Each factor of the chain rule is a partial derivative.
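Written out for our network, the chain rule we will apply in the steps below looks like this (each factor corresponds to one of the variables computed in the code, such as error_d_a2 and a2_d_z2):
∂error/∂w2 = ∂error/∂a2 · ∂a2/∂z2 · ∂z2/∂w2
∂error/∂b2 = ∂error/∂a2 · ∂a2/∂z2 · ∂z2/∂b2
∂error/∂w1 = ∂error/∂a2 · ∂a2/∂z2 · ∂z2/∂a1 · ∂a1/∂z1 · ∂z1/∂w1
∂error/∂b1 = ∂error/∂a2 · ∂a2/∂z2 · ∂z2/∂a1 · ∂a1/∂z1 · ∂z1/∂b1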
Step 1: Find the derivatives for the output layer (layer two).
#Step_1
# derivative of the error with respect to the a2
error_d_a2 = (a2 - output)
# derivative of the a2 with respect to the z2
a2_d_z2 = a2 * (1 - a2)
# derivative of the z2 with respect to the w2
z2_d_w2 = a1
# derivative of the z2 with respect to the b2
z2_d_b2 = b2
print("error_d_a2 : ", error_d_a2.shape) #error_d_a2 : (6, 1)
print("a2_d_z2 : ", a2_d_z2.shape) #a2_d_z2 : (6, 1)
print("z2_d_w2 : ", z2_d_w2.shape) #z2_d_w2 : (6, 5)
print("z2_d_b2 : ", z2_d_b2) #z2_d_b2 : 1
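The expression a2 * (1 - a2) is simply the derivative of the sigmoid function evaluated at z2 (the same trick is used for the hidden layer in Step 5):
sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))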
Step 2: Calculate the derivative of the error with respect to w2. Multiplying by z2_d_w2 transposed (that is, a1.T) sums the gradient contributions of all six training rows and produces a (5, 1) matrix, the same shape as w2.
#Step_2
delta_w2 = error_d_a2 * a2_d_z2
delta_w2 = np.dot(z2_d_w2.T, delta_w2)
print("delta_w2 : ", delta_w2.shape) #delta_w2 : (5, 1)
Step 3: Calculate the derivative of the error with respect to b2. Since b2 is a single scalar, we sum the contributions of all six training rows with np.sum.
#Step_3
delta_b2 = error_d_a2 * a2_d_z2
delta_b2 = delta_b2 * z2_d_b2
delta_b2 = np.sum(delta_b2)
print("delta_b2 : ", delta_b2) #delta_b2 : -0.08251552717143652
Step 4: Update w2 and b2.
#Step_4
w2 = w2 - delta_w2
b2 = b2 - delta_b2
print("new w2 : ")
print(w2)
print(".....................")
print("new b2 : ")
print(b2)
"""
new w2 :
[[-0.3107736 ]
[-0.26243013]
[-1.62597648]
[-0.460874 ]
[ 0.32223855]]
.....................
new b2 :
1.1307639633992455
"""
Step 5: Find the derivatives for the hidden layer (layer one).
#Step_5
# derivative of the z2 with respect to the a1
z2_d_a1 = w2
# derivative of the a1 with respect to the z1
a1_d_z1 = a1*(1 - a1)
# derivative of the z1 with respect to the w1
z1_d_w1 = inputs
# derivative of the z1 with respect to the b1
z1_d_b1 = b1
print("z2_d_a1 : ", z2_d_a1.shape) #z2_d_a1 : (5, 1)
print("a1_d_z1 : ", a1_d_z1.shape) #a1_d_z1 : (6, 5)
print("z1_d_w1 : ", z1_d_w1.shape) #z1_d_w1 : (6, 3)
print("z1_d_b1 : ", z1_d_b1) #z1_d_b1 : 1
Step 6: Calculate the derivative of the error with respect to w1. We propagate the output delta back through w2 (z2_d_a1), multiply element-wise by the derivative of the hidden activations, and finally multiply by inputs transposed to sum over the six training rows, which yields a (3, 5) matrix matching w1.
#Step_6
delta_w1 = error_d_a2 * a2_d_z2
delta_w1 = np.dot(delta_w1,z2_d_a1.T)
delta_w1 = delta_w1 * a1_d_z1
delta_w1 = np.dot(inputs.T,delta_w1)
print("delta_w1 : ", delta_w1.shape) #delta_w1 : (3, 5)
Step 7: Calculate the derivative of the error with respect to b1.
#Step_7
delta_b1 = error_d_a2 * a2_d_z2
delta_b1 = np.dot(delta_b1,z2_d_a1.T)
delta_b1 = delta_b1 * a1_d_z1
delta_b1 = delta_b1 * z1_d_b1
delta_b1 = np.sum(delta_b1)
print("delta_b1: ", delta_b1) #delta_b1: 0.046485859867826225
Step 8: Update w1 and b1.
#Step_8
w1 = w1 - delta_w1
b1 = b1 - delta_b1
print("new w1 : ")
print(w1)
print(".....................")
print("new b1 : ")
print(b1)
"""
new w1 :
[[ 0.23161175 1.36976182 1.61154306 0.84420228 2.84234378]
[ 0.25536816 2.39003472 -0.59019456 -1.43970131 -1.50262859]
[-0.58270371 1.48691414 -1.06496844 0.34346572 -0.69036972]]
.....................
new b1 :
1.0566873975726727
"""
Training the Neural Network
So far, we have calculated all of the parameters for a single pass. To train the neural network, we repeat the feedforward and backpropagation steps in a loop, update the weights and biases in each iteration, and store the error value so that we can plot it later. The snippet below shows the last statements of one training iteration; the complete code is given at the end of this section.
# Training the neural network (end of each iteration)
w1 = w1 - delta_w1
b1 = b1 - delta_b1
error_list.append(error)
print("error : ", error) #error : 0.9380097470416288
Show the Training Error With the Matplotlib Library
We trained our artificial neural network 100 times.
Now let's plot the error value of each iteration using the Python Matplotlib library. (The complete program, including the loop that fills error_list, follows after this plotting snippet.)
import matplotlib.pyplot as plt
x = np.arange(len(error_list))
y = error_list
plt.figure(figsize=(10,8))
plt.plot(x,y)
plt.xlabel("iteration")
plt.ylabel("error")
plt.title("Artificial Neural Networks Training")
plt.show()
# Defining all variables
import numpy as np
error_list = list()
inputs = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1]])
output = np.array([[0],[1],[0],[1],[0],[1]])
# Weights
w1 = np.random.randn(inputs.shape[1],5)
w2 = np.random.randn(5,output.shape[1])
# Biases
b1 = 1
b2 = 1
# Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))
# Update the weights 100 times.
for i in range(100):
    # Feedforward
    z1 = np.dot(inputs, w1) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(a1, w2) + b2
    a2 = sigmoid(z2)
    error = np.sum((1/2)*(output - a2)**2)
    # Backpropagation
    ## LAYER 2
    ### derivative of the error with respect to the a2
    error_d_a2 = (a2 - output)
    ### derivative of the a2 with respect to the z2
    a2_d_z2 = a2*(1 - a2)
    ### derivative of the z2 with respect to the w2
    z2_d_w2 = a1
    ### derivative of the z2 with respect to the b2
    z2_d_b2 = b2
    ### delta weights 2
    delta_w2 = error_d_a2 * a2_d_z2
    delta_w2 = np.dot(z2_d_w2.T, delta_w2)
    ### delta bias 2
    delta_b2 = error_d_a2 * a2_d_z2
    delta_b2 = delta_b2 * z2_d_b2
    delta_b2 = np.sum(delta_b2)
    ### update w2 and b2
    w2 = w2 - delta_w2
    b2 = b2 - delta_b2
    ## LAYER 1
    ### derivative of the z2 with respect to the a1
    z2_d_a1 = w2
    ### derivative of the a1 with respect to the z1
    a1_d_z1 = a1*(1 - a1)
    ### derivative of the z1 with respect to the w1
    z1_d_w1 = inputs
    ### derivative of the z1 with respect to the b1
    z1_d_b1 = b1
    ### delta weights 1
    delta_w1 = error_d_a2 * a2_d_z2
    delta_w1 = np.dot(delta_w1, z2_d_a1.T)
    delta_w1 = delta_w1 * a1_d_z1
    delta_w1 = np.dot(inputs.T, delta_w1)
    ### delta bias 1
    delta_b1 = error_d_a2 * a2_d_z2
    delta_b1 = np.dot(delta_b1, z2_d_a1.T)
    delta_b1 = delta_b1 * a1_d_z1
    delta_b1 = delta_b1 * z1_d_b1
    delta_b1 = np.sum(delta_b1)
    ### update w1 and b1
    w1 = w1 - delta_w1
    b1 = b1 - delta_b1
    # keep the error of this iteration so we can plot it later
    error_list.append(error)
print("error : ", error)
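As a quick check, we can feed the two held-out rows through the trained network:
# Feed the two held-out rows through the trained network
test_inputs = np.array([[1,1,0],[1,1,1]])
test_a1 = sigmoid(np.dot(test_inputs, w1) + b1)
test_a2 = sigmoid(np.dot(test_a1, w2) + b2)
print(test_a2)  # in the training data the target equals the third input, so ideally these come out close to 0 and 1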