{ "cells": [ { "cell_type": "markdown", "id": "48940f9b", "metadata": {}, "source": [ "# The goal is to show that the use of backpropagation can capture properties of the input in the hidden layer that are not explicitly represented by the input." ] }, { "cell_type": "markdown", "id": "5b707007", "metadata": {}, "source": [ "The use of less hidden units than input units imposes a constraint on the problem and forces the neural network to rerepresent the input units. " ] }, { "cell_type": "code", "execution_count": 1, "id": "1460a1c9", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.model_selection import train_test_split\n", "import pandas as pd\n", "from sklearn import preprocessing" ] }, { "cell_type": "markdown", "id": "579ff56c", "metadata": {}, "source": [ "This neural network's purpose is to learn the target function f(x) where f(x) is a vector which contains seven 0's and a 1. \n", "This has been represented below using a pandas dataframe where there are 8 rows which represent the 8 different vector combinations that can make up f(x)." ] }, { "cell_type": "code", "execution_count": 2, "id": "bba80b47", "metadata": {}, "outputs": [], "source": [ "index = [\"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\"]\n", "vectors = [\"v_1\", \"v_2\", \"v_3\", \"v_4\", \"v_5\", \"v_6\", \"v_7\", \"v_8\"]\n", "\n", "data = np.array([[1, 0, 0, 0, 0, 0, 0 , 0],\n", " [0, 1, 0, 0, 0, 0, 0 , 0],\n", " [0, 0, 1, 0, 0, 0, 0 , 0],\n", " [0, 0, 0, 1, 0, 0, 0 , 0],\n", " [0, 0, 0, 0, 1, 0, 0 , 0],\n", " [0, 0, 0, 0, 0, 1, 0 , 0],\n", " [0, 0, 0, 0, 0, 0, 1 , 0],\n", " [0, 0, 0, 0, 0, 0, 0 , 1]])" ] }, { "cell_type": "code", "execution_count": 3, "id": "862be7a8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
12345678
v_110000000
v_201000000
v_300100000
v_400010000
v_500001000
v_600000100
v_700000010
v_800000001
\n", "
" ], "text/plain": [ " 1 2 3 4 5 6 7 8\n", "v_1 1 0 0 0 0 0 0 0\n", "v_2 0 1 0 0 0 0 0 0\n", "v_3 0 0 1 0 0 0 0 0\n", "v_4 0 0 0 1 0 0 0 0\n", "v_5 0 0 0 0 1 0 0 0\n", "v_6 0 0 0 0 0 1 0 0\n", "v_7 0 0 0 0 0 0 1 0\n", "v_8 0 0 0 0 0 0 0 1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(data=data, index=vectors, columns=index)\n", "df" ] }, { "cell_type": "markdown", "id": "b9d4761e", "metadata": {}, "source": [ "The data is split into X and Y where X is the input that we want the neural network to use to predict the output values in Y." ] }, { "cell_type": "markdown", "id": "76093341", "metadata": {}, "source": [ "Normally X is taken to be a subset of the total number of columns avaliable and Y is chosen to be the value that we wish to predict. For example, X could be the number of bedrooms and square footage in a house and Y could be the value of that house. We use the info in X to predict the value of Y and then we can compare this Y to the values in our training sample.\n" ] }, { "cell_type": "markdown", "id": "5849466c", "metadata": {}, "source": [ "However as we want the input data and output data to be identical, the values of X and Y are equal." ] }, { "cell_type": "code", "execution_count": 4, "id": "a219d2a6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(8, 8)\n", "(8, 8)\n" ] } ], "source": [ "# split into X and Y\n", "Y = df\n", "X = df\n", "\n", "print(X.shape)\n", "print(Y.shape)\n", "\n", "# convert to numpy arrays\n", "X = np.array(X)" ] }, { "cell_type": "markdown", "id": "ba4e8971", "metadata": {}, "source": [ "The sizes of the input, hidden and output layers are defined to be 8,3 and 8 as the purpose of this exercise is to have a smaller hidden layer such that the 3 hidden layers will be forced to represent the 8 input layers." ] }, { "cell_type": "code", "execution_count": 5, "id": "a6150bf9", "metadata": {}, "outputs": [], "source": [ "input_unit = X.shape[0]\n", "hidden_unit = 3 \n", "output_unit = Y.shape[0] " ] }, { "cell_type": "markdown", "id": "64980b2f", "metadata": {}, "source": [ "Inital values of W1 and W2 are created using a small random value. The 2 bias weights are initalised as zero vectors." ] }, { "cell_type": "code", "execution_count": 6, "id": "83ac1ef2", "metadata": {}, "outputs": [], "source": [ "def parameters_initialization(input_unit, hidden_unit, output_unit):\n", " np.random.seed(2) \n", " W1 = np.random.randn(hidden_unit, input_unit)*0.1\n", " b1 = np.zeros((hidden_unit, 1))\n", " W2 = np.random.randn(output_unit, hidden_unit)*0.1\n", " b2 = np.zeros((output_unit, 1))\n", " parameters = {\"W1\": W1,\n", " \"b1\": b1,\n", " \"W2\": W2,\n", " \"b2\": b2}\n", " \n", " return parameters" ] }, { "cell_type": "markdown", "id": "519ecc99", "metadata": {}, "source": [ "The activation functions used in neural networks are Sigmoid, tanh, Softmax, ReLU, Leaky ReLU. A different activation fucntion can be chosen for each layer of the neural network." ] }, { "cell_type": "markdown", "id": "cc16dfc6", "metadata": {}, "source": [ "T.Michell does not specify which activation function that was used to obtain the hidden values found in the diagram and thus two arbitrary activation functions are used. The tanh function is used for the hidden layer while the sigmoid function is used for the output layer. 
" ] }, { "cell_type": "markdown", "id": "9727ac7a", "metadata": {}, "source": [ "In forward propagation, the input data is fed through the neural network in a forward direction and computes the predicted error." ] }, { "cell_type": "code", "execution_count": 7, "id": "95216155", "metadata": {}, "outputs": [], "source": [ "def sigmoid(z):\n", " return 1/(1+np.exp(-z))\n", "def forward_propagation(X, parameters):\n", " W1 = parameters['W1']\n", " b1 = parameters['b1']\n", " W2 = parameters['W2']\n", " b2 = parameters['b2']\n", " \n", " Z1 = np.dot(W1, X) + b1\n", " A1 = np.tanh(Z1)\n", " Z2 = np.dot(W2, A1) + b2\n", " A2 = sigmoid(Z2)\n", " cache = {\"Z1\": Z1,\"A1\": A1,\"Z2\": Z2,\"A2\": A2}\n", " \n", " return A2, cache" ] }, { "cell_type": "markdown", "id": "032119a1", "metadata": {}, "source": [ "For back propagation, the partial derivatives of the error function E are computed. This allows the weights to be adjusted to increase the accuracy of the neural network. The gradient descent technique is used which is discussed below." ] }, { "cell_type": "markdown", "id": "4614fa4e", "metadata": {}, "source": [ "Note: In class, we focused on these partial derivatives using the sigmoid function for both the hidden layer and the output layer. However below, this changes for dW1 and db1 as we are now using the tanh function as the activation function for the hidden layer. " ] }, { "cell_type": "code", "execution_count": 8, "id": "93170907", "metadata": {}, "outputs": [], "source": [ "def backward_propagation(parameters, cache, X, Y):\n", " m = X.shape[1]\n", " \n", " W1 = parameters['W1']\n", " W2 = parameters['W2']\n", " A1 = cache['A1']\n", " A2 = cache['A2']\n", " \n", " dZ2 = A2-Y\n", " dW2 = (1/m) * np.dot(dZ2, A1.T)\n", " dZ2 = np.array(dZ2)\n", " db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)\n", " dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))\n", " dZ1 = np.array(dZ1)\n", " dW1 = (1/m) * np.dot(dZ1, X.T) \n", " db1 = (1/m)*np.sum(dZ1, axis=1, keepdims=True)\n", " \n", " grads = {\"dW1\": dW1, \"db1\": db1, \"dW2\": dW2,\"db2\": db2}\n", " \n", " return grads" ] }, { "cell_type": "markdown", "id": "4a2dc8d2", "metadata": {}, "source": [ "The parameters are updated through each iteration using the gradient descent method. The batch method is implemeted below where each value is iterated through, this is chosen as the neural network and number of iterations is small. The learning rate that is chosen is 0.3 to keep with the values used in 'Machine Learning' by T.Mitchell. " ] }, { "cell_type": "code", "execution_count": 9, "id": "48b521a0", "metadata": {}, "outputs": [], "source": [ "def gradient_descent(parameters, grads, learning_rate):\n", " W1 = parameters['W1']\n", " b1 = parameters['b1']\n", " W2 = parameters['W2']\n", " b2 = parameters['b2']\n", " \n", " dW1 = grads['dW1']\n", " db1 = grads['db1']\n", " dW2 = grads['dW2']\n", " db2 = grads['db2']\n", " W1 = W1 - learning_rate * dW1\n", " b1 = b1 - learning_rate * db1\n", " W2 = W2 - learning_rate * dW2\n", " b2 = b2 - learning_rate * db2\n", " \n", " parameters = {\"W1\": W1, \"b1\": b1,\"W2\": W2,\"b2\": b2}\n", " \n", " return parameters" ] }, { "cell_type": "markdown", "id": "f80aa2dc", "metadata": {}, "source": [ "The neural network is ran for 5000 iterations by applying forward progation followed by backward propagation followed by gradient descent." 
] }, { "cell_type": "code", "execution_count": 10, "id": "72bc1d0d", "metadata": {}, "outputs": [], "source": [ "def neural_network_model(X, Y, hidden_unit, num_iterations):\n", " np.random.seed(32)\n", "\n", " parameters = parameters_initialization(input_unit, hidden_unit, output_unit)\n", " \n", " W1 = parameters['W1']\n", " b1 = parameters['b1']\n", " W2 = parameters['W2']\n", " b2 = parameters['b2']\n", " \n", " for i in range(0, num_iterations):\n", " A2, cache = forward_propagation(X, parameters)\n", " grads = backward_propagation(parameters, cache, X, Y)\n", " parameters = gradient_descent(parameters, grads, 0.3)\n", " return parameters\n", "parameters = neural_network_model(X, Y, 3, 5000)" ] }, { "cell_type": "markdown", "id": "1d419ef4", "metadata": {}, "source": [ "Using the final values for the parameters W1,b1,W2,b2 and the input values X, a value for Y is predicted." ] }, { "cell_type": "code", "execution_count": 11, "id": "1d039bae", "metadata": {}, "outputs": [], "source": [ "def prediction(parameters, X):\n", " A2, cache = forward_propagation(X, parameters)\n", " predictions = np.round(A2)\n", " \n", " return predictions" ] }, { "cell_type": "code", "execution_count": 12, "id": "85bc84f7", "metadata": {}, "outputs": [], "source": [ "A2, cache = forward_propagation(X, parameters)" ] }, { "cell_type": "markdown", "id": "8d8dd4b3", "metadata": {}, "source": [ "Looking at the values of the predicted Y, it is clear by comparision that this equals the real output Y." ] }, { "cell_type": "code", "execution_count": 13, "id": "4820f628", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The predicted values for Y are \n", " [[1. 0. 0. 0. 0. 0. 0. 0.]\n", " [0. 1. 0. 0. 0. 0. 0. 0.]\n", " [0. 0. 1. 0. 0. 0. 0. 0.]\n", " [0. 0. 0. 1. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 1. 0. 0. 0.]\n", " [0. 0. 0. 0. 0. 1. 0. 0.]\n", " [0. 0. 0. 0. 0. 0. 1. 0.]\n", " [0. 0. 0. 0. 0. 0. 0. 1.]]\n", "The true values for Y are \n", " 1 2 3 4 5 6 7 8\n", "v_1 1 0 0 0 0 0 0 0\n", "v_2 0 1 0 0 0 0 0 0\n", "v_3 0 0 1 0 0 0 0 0\n", "v_4 0 0 0 1 0 0 0 0\n", "v_5 0 0 0 0 1 0 0 0\n", "v_6 0 0 0 0 0 1 0 0\n", "v_7 0 0 0 0 0 0 1 0\n", "v_8 0 0 0 0 0 0 0 1\n" ] } ], "source": [ "predictions = prediction(parameters, X)\n", "print('The predicted values for Y are \\n' , predictions)\n", "print('The true values for Y are \\n' , Y)" ] }, { "cell_type": "code", "execution_count": 14, "id": "b112bb95", "metadata": {}, "outputs": [], "source": [ "w1 = parameters['W1']\n", "b1 = parameters['b1']\n", "w2 = parameters['W2']\n", "b2 = parameters['b2']" ] }, { "cell_type": "markdown", "id": "2faae651", "metadata": {}, "source": [ "The hidden values are shown below, as tanh was used as the activation function, the range of values are from -1 to 1. These are then normalised and rounded to the range of 0 to 1. Looking back, the sigmoid activation function would have been more suitable to this problem as it outputs out vaues from 0 to 1 thus no normalisation would have been required. 
" ] }, { "cell_type": "code", "execution_count": 15, "id": "7bec756a", "metadata": {}, "outputs": [], "source": [ "hidden_values = cache['A1'].T" ] }, { "cell_type": "code", "execution_count": 16, "id": "706dcbd8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.99490787, -0.04302783, -0.92818738],\n", " [-0.90009014, -0.93967381, 0.98884459],\n", " [ 0.02941738, 0.99386195, -0.94039297],\n", " [ 0.98758792, 0.97646282, 0.94002666],\n", " [-0.06999129, -0.99436434, -0.91497241],\n", " [ 0.97236054, -0.97592586, 0.98942764],\n", " [-0.82942252, 0.97808555, 0.99074451],\n", " [-0.99540778, 0.04510161, -0.78967455]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hidden_values" ] }, { "cell_type": "code", "execution_count": 17, "id": "3bcd0675", "metadata": {}, "outputs": [], "source": [ "def NormalizeData(data):\n", " return (data - np.min(data)) / (np.max(data) - np.min(data))" ] }, { "cell_type": "code", "execution_count": 18, "id": "e4ef91c8", "metadata": {}, "outputs": [], "source": [ "normalised = NormalizeData(hidden_values)\n", "normalised_hidden_values = preprocessing.minmax_scale(hidden_values, feature_range=(0, 1), axis=0, copy=True)\n", "hidden_values_2 = [np.round(x,3) for x in normalised_hidden_values]\n", "hidden_values_rounded = [np.round(x) for x in normalised_hidden_values]" ] }, { "cell_type": "markdown", "id": "dac577d2", "metadata": {}, "source": [ "Below, the hidden values are seen first rounded to 3 decimal places and then rounded to 0 or 1. " ] }, { "cell_type": "code", "execution_count": 24, "id": "9230757c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The hidden values when normalised and rounded to 3 decimal places are\n" ] }, { "data": { "text/plain": [ "[array([1. , 0.478, 0.006]),\n", " array([0.048, 0.028, 0.999]),\n", " array([0.515, 1. , 0. ]),\n", " array([0.996, 0.991, 0.974]),\n", " array([0.465, 0. , 0.013]),\n", " array([0.989, 0.009, 0.999]),\n", " array([0.083, 0.992, 1. ]),\n", " array([0. , 0.523, 0.078])]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('The hidden values when normalised and rounded to 3 decimal places are')\n", "hidden_values_2" ] }, { "cell_type": "code", "execution_count": 26, "id": "a3449082", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The hidden values when normalised and rounded to 0 decimal places are\n" ] }, { "data": { "text/plain": [ "[array([1., 0., 0.]),\n", " array([0., 0., 1.]),\n", " array([1., 1., 0.]),\n", " array([1., 1., 1.]),\n", " array([0., 0., 0.]),\n", " array([1., 0., 1.]),\n", " array([0., 1., 1.]),\n", " array([0., 1., 0.])]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('The hidden values when normalised and rounded to 0 decimal places are')\n", "hidden_values_rounded" ] }, { "cell_type": "markdown", "id": "7de42758", "metadata": {}, "source": [ "It can be seen that each of the 8 vectors in the input are uniquely represented by a hidden layer value which consists of 3 binary values. 
  {
   "cell_type": "markdown",
   "id": "7de42758",
   "metadata": {},
   "source": [
    "It can be seen that each of the 8 input vectors is uniquely represented by a hidden-layer value consisting of 3 binary values. This shows how backpropagation can capture properties of the input units that are not explicit in the input representation.\n",
    "\n",
    "(1,0,0,0,0,0,0,0) is represented by (1,0,0) \\\n",
    "(0,1,0,0,0,0,0,0) is represented by (0,0,1) \\\n",
    "(0,0,1,0,0,0,0,0) is represented by (1,1,0) \\\n",
    "(0,0,0,1,0,0,0,0) is represented by (1,1,1) \\\n",
    "(0,0,0,0,1,0,0,0) is represented by (0,0,0) \\\n",
    "(0,0,0,0,0,1,0,0) is represented by (1,0,1) \\\n",
    "(0,0,0,0,0,0,1,0) is represented by (0,1,1) \\\n",
    "(0,0,0,0,0,0,0,1) is represented by (0,1,0) \n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}