{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "In this small experiment we fit a linear regression model in 3 cases and look at the bias and variance we obtain in each one. The 3 cases are:\n",
        "\n",
        "* data with no noise\n",
        "* data with a little noise\n",
        "* data with no relation between X and y\n",
        "\n",
        "As a simple proxy, \"bias\" is measured as the mean squared error on the test set, and \"variance\" as the variance of the model's predictions over the test points."
      ],
      "metadata": {
        "id": "NjjaAH_-aXIo"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# First case: data with no noise"
      ],
      "metadata": {
        "id": "o8gYORtVbTXO"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 58,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 304
        },
        "id": "5PK0fCwyQwsd",
        "outputId": "1fb56bef-21ec-4f4e-b205-e5df37ee458a"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Bias: 1.0845392999330622e-32\n",
            "Variance: 0.04953967957663739\n"
          ]
        }
      ],
      "source": [
        "# import the necessary libraries\n",
        "from sklearn.linear_model import LinearRegression\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.metrics import mean_squared_error\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# data generation (X and y match perfectly)\n",
        "X = np.random.rand(100, 1)\n",
        "y = X\n",
        "\n",
        "# split the data into a training set and a test set\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n",
        "\n",
        "# train the linear regression model\n",
        "reg = LinearRegression().fit(X_train, y_train)\n",
        "\n",
        "# predict on the test set\n",
        "y_pred = reg.predict(X_test)\n",
        "\n",
        "# mean squared error on the test set\n",
        "mse = mean_squared_error(y_test, y_pred)\n",
        "\n",
        "# \"bias\": the test-set squared error (the same quantity as mse)\n",
        "bias = np.mean((y_test - y_pred)**2)\n",
        "# \"variance\": the spread of the predictions over the test points\n",
        "variance = np.var(y_pred)\n",
        "\n",
        "# plot the test points and the fitted line\n",
        "plt.scatter(X_test, y_test, label=\"True value\")\n",
        "plt.plot(X_test, y_pred, label=\"Prediction\")\n",
        "plt.legend()\n",
        "plt.show()\n",
        "\n",
        "# print the results\n",
        "print(\"Bias:\", bias)\n",
        "print(\"Variance:\", variance)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "We can see that the bias is essentially zero (about 1e-32, which is just floating-point rounding error). This makes sense since X and y have a perfect linear relationship, so the fitted line passes through every test point. The variance (about 0.05) only reflects the spread of the test points, because the predictions coincide with `y_test`."
      ],
      "metadata": {
        "id": "VogMeevOc2I3"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Second case: data with a little noise"
      ],
      "metadata": {
        "id": "W9kkXchUfTrt"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# import the necessary libraries\n",
        "from sklearn.linear_model import LinearRegression\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.metrics import mean_squared_error\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# data generation: y is X plus uniform noise in [0, 0.5)\n",
        "X = np.random.rand(100, 1)\n",
        "y = X + (np.random.rand(100, 1)/2)\n",
        "\n",
        "# split the data into a training set and a test set\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n",
        "\n",
        "# train the linear regression model\n",
        "reg = LinearRegression().fit(X_train, y_train)\n",
        "\n",
        "# predict on the test set\n",
        "y_pred = reg.predict(X_test)\n",
        "\n",
        "# mean squared error on the test set\n",
        "mse = mean_squared_error(y_test, y_pred)\n",
        "\n",
        "# \"bias\": the test-set squared error (the same quantity as mse)\n",
        "bias = np.mean((y_test - y_pred)**2)\n",
        "# \"variance\": the spread of the predictions over the test points\n",
        "variance = np.var(y_pred)\n",
        "\n",
        "# plot the test points and the fitted line\n",
        "plt.scatter(X_test, y_test, label=\"True value\")\n",
        "plt.plot(X_test, y_pred, label=\"Prediction\")\n",
        "plt.legend()\n",
        "plt.show()\n",
        "\n",
        "# print the results\n",
        "print(\"Bias:\", bias)\n",
        "print(\"Variance:\", variance)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 300
        },
        "id": "ulA3oNBmdZcF",
        "outputId": "878b5719-1580-4c1a-e514-cbb47c360c44"
      },
      "execution_count": 57,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Bias: 0.022212543739010212\n",
            "Variance: 0.08728939726794703\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this case we get a higher bias, because the predictions no longer match the true values exactly: the noise added to y cannot be explained by X."
      ],
      "metadata": {
        "id": "VDZ69tAJgv4D"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Third case: data with no relation between X and y"
      ],
      "metadata": {
        "id": "G8Ue_MJ1gopb"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# import the necessary libraries\n",
        "from sklearn.linear_model import LinearRegression\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.metrics import mean_squared_error\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# data generation (X and y are independent, so there is no relation between them)\n",
        "X = np.random.rand(100, 1)\n",
        "y = np.random.rand(100, 1)\n",
        "\n",
        "# split the data into a training set and a test set\n",
        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n",
        "\n",
        "# train the linear regression model\n",
        "reg = LinearRegression().fit(X_train, y_train)\n",
        "\n",
        "# predict on the test set\n",
        "y_pred = reg.predict(X_test)\n",
        "\n",
        "# mean squared error on the test set\n",
        "mse = mean_squared_error(y_test, y_pred)\n",
        "\n",
        "# \"bias\": the test-set squared error (the same quantity as mse)\n",
        "bias = np.mean((y_test - y_pred)**2)\n",
        "# \"variance\": the spread of the predictions over the test points\n",
        "variance = np.var(y_pred)\n",
        "\n",
        "# plot the test points and the fitted line\n",
        "plt.scatter(X_test, y_test, label=\"True value\")\n",
        "plt.plot(X_test, y_pred, label=\"Prediction\")\n",
        "plt.legend()\n",
        "plt.show()\n",
        "\n",
        "# print the results\n",
        "print(\"Bias:\", bias)\n",
        "print(\"Variance:\", variance)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 300
        },
        "id": "Tn_GKvZmduQV",
        "outputId": "874bafc9-9592-4265-8ad0-b141330735ff"
      },
      "execution_count": 56,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Bias: 0.07132226716060397\n",
            "Variance: 0.0018924772855634625\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this case we get the highest bias but a small variance. Since y does not depend on X, the fitted line is nearly flat (close to the mean of `y_train`), so the predictions are all very similar to each other while remaining far from the individual true values."
      ],
      "metadata": {
        "id": "GpxLQkg2hExr"
      }
    }
  ]
}