{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "**Gradient Boosting regression**" ], "metadata": { "id": "nxmvYLqvDyLo" } }, { "cell_type": "code", "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from sklearn import datasets, ensemble\n", "from sklearn.inspection import permutation_importance\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.model_selection import train_test_split" ], "metadata": { "id": "2-XgHpkGDzXw" }, "execution_count": 1, "outputs": [] }, { "cell_type": "markdown", "source": [ "Loading the data." ], "metadata": { "id": "wLXLkDTZEA3D" } }, { "cell_type": "code", "source": [ "diabetes = datasets.load_diabetes()\n", "X, y = diabetes.data, diabetes.target" ], "metadata": { "id": "E8-kgh31EEJ8" }, "execution_count": 2, "outputs": [] }, { "cell_type": "markdown", "source": [ "Split the data set into training and test sets (90:10) and set up the parameters of the regression model." ], "metadata": { "id": "qJTLKcx3EL_l" } }, { "cell_type": "code", "source": [ "X_train, X_test, y_train, y_test = train_test_split(\n", "    X, y, test_size=0.1, random_state=13\n", ")\n", "\n", "params = {\n", "    \"n_estimators\": 500,\n", "    \"max_depth\": 4,\n", "    \"min_samples_split\": 5,\n", "    \"learning_rate\": 0.01,\n", "    \"loss\": \"squared_error\",\n", "}" ], "metadata": { "id": "Fr_19Lc8ENsc" }, "execution_count": 3, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now we instantiate the gradient boosting regressor and fit it to the training data. Let’s also look at the mean squared error on the test data."
], "metadata": { "id": "AMAZXfIHEYQ6" } }, { "cell_type": "code", "source": [ "reg = ensemble.GradientBoostingRegressor(**params)\n", "reg.fit(X_train, y_train)\n", "\n", "mse = mean_squared_error(y_test, reg.predict(X_test))\n", "print(\"MSE on test set: {:.4f}\".format(mse))\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "F1qcrQaeEdC5", "outputId": "c09d6794-0d10-4056-ab51-f51f01e31f92" }, "execution_count": 8, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "MSE on test set: 3025.2378\n" ] } ] }, { "cell_type": "markdown", "source": [ "**Plot training deviance**" ], "metadata": { "id": "NfaAkYReEhcn" } }, { "cell_type": "markdown", "source": [ "Finally, we visualize the results. We first compute the test set deviance at each boosting iteration, then plot it against the boosting iterations alongside the training set deviance.\n" ], "metadata": { "id": "2sZhUqydElZw" } }, { "cell_type": "code", "source": [ "# Test set MSE after each boosting stage\n", "test_score = np.zeros((params[\"n_estimators\"],), dtype=np.float64)\n", "for i, y_pred in enumerate(reg.staged_predict(X_test)):\n", "    test_score[i] = mean_squared_error(y_test, y_pred)\n", "\n", "fig = plt.figure(figsize=(6, 6))\n", "plt.subplot(1, 1, 1)\n", "plt.title(\"Deviance Model\")\n", "plt.plot(\n", "    np.arange(params[\"n_estimators\"]) + 1,\n", "    reg.train_score_,\n", "    \"b-\",\n", "    label=\"Training Set Deviance\",\n", ")\n", "plt.plot(\n", "    np.arange(params[\"n_estimators\"]) + 1, test_score, \"r-\", label=\"Test Set Deviance\"\n", ")\n", "plt.legend(loc=\"upper right\")\n", "plt.xlabel(\"Boosting Iterations\")\n", "plt.ylabel(\"Deviance\")\n", "fig.tight_layout()\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 441 }, "id": "Sm7zU63nEmO1", "outputId": "0011ea37-394f-48ad-e1fe-de1216d710dd" }, "execution_count": 9, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] } ] }