{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification on Wine Dataset\n", "\n", "## IMPORTANT: make sure to rerun all the code from the beginning to obtain the results for the final version of your notebook, since this is the way we will do it before evaluating your notebook!!!\n", "\n", "### Dataset description\n", "\n", "We will be working with a dataset on wines from the UCI machine learning repository\n", "(http://archive.ics.uci.edu/ml/datasets/Wine). It contains data for 178 instances.\n", "The dataset contains the results of a chemical analysis of wines grown in the same region\n", "in Italy but derived from three different cultivars. The analysis determined the\n", "quantities of 13 constituents found in each of the three types of wine.\n", "\n", "### The features in the dataset are:\n", "\n", "- Alcohol\n", "- Malic acid\n", "- Ash\n", "- Alcalinity of ash\n", "- Magnesium\n", "- Total phenols\n", "- Flavanoids\n", "- Nonflavanoid phenols\n", "- Proanthocyanins\n", "- Color intensity\n", "- Hue\n", "- OD280/OD315 of diluted wines\n", "- Proline\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first import all the packages we need." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "\n", "import numpy as np\n", "import scipy as sp\n", "from scipy import stats\n", "from sklearn import datasets\n", "from sklearn import linear_model\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Perceptron\n", "We will implement the perceptron and use it to learn a halfspace with 0-1 loss." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Set the random seed to your ID (matricola)."
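, "\n",
"\n",
"Why re-running from the top matters: every call to `np.random.permutation` advances NumPy's global random state, so the permutation drawn again in the Logistic Regression part is, in general, different from the first one even though the seed is set only once. Re-seeding restarts the same sequence. A minimal sketch of this behaviour (the seed `4` simply mirrors `IDnumber` in the next cell):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"np.random.seed(4)\n",
"first = np.random.permutation(10)\n",
"second = np.random.permutation(10)  # drawn from the already advanced state\n",
"\n",
"np.random.seed(4)  # re-seeding restarts the same sequence of draws\n",
"first_again = np.random.permutation(10)\n",
"\n",
"print((first == first_again).all())  # True: same seed, same first draw\n",
"```"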
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "IDnumber = 4 #COMPLETE\n", "np.random.seed(IDnumber)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the dataset from scikit learn, apply a random permutation to it, and then split it into a training set and a test set (50%-50%)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Load the dataset from scikit learn\n", "wine = datasets.load_wine()\n", "\n", "m = wine.data.shape[0]\n", "permutation = np.random.permutation(m)\n", "\n", "X = wine.data[permutation]\n", "Y = wine.target[permutation]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]\n", " [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]\n", " [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]\n", " ...\n", " [1.327e+01 4.280e+00 2.260e+00 ... 5.900e-01 1.560e+00 8.350e+02]\n", " [1.317e+01 2.590e+00 2.370e+00 ... 6.000e-01 1.620e+00 8.400e+02]\n", " [1.413e+01 4.100e+00 2.740e+00 ... 6.100e-01 1.600e+00 5.600e+02]]\n" ] }, { "data": { "text/plain": [ "178" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(wine.data)\n", "wine.data.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to classify class \"1\" vs the other two classes (0 and 2). We relabel the other classes (0 and 2) as \"-1\" so that we can use the labels directly with the perceptron."
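, "\n",
"\n",
"The relabelling can also be written without an explicit loop. A minimal sketch using `np.where` on a few toy labels (not the wine data), assuming the labels are `0`, `1`, `2` as in the dataset:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"y = np.array([0, 1, 2, 1, 0, 2])  # toy labels, not taken from the dataset\n",
"# keep class 1 as +1 and map every other class to -1 (returns a new array)\n",
"y_pm = np.where(y == 1, 1, -1)\n",
"print(y_pm.tolist())  # [-1, 1, -1, 1, -1, -1]\n",
"```\n",
"\n",
"Unlike the loop, `np.where` does not modify `y` in place."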
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "#let's relabel classes 0 and 2 as -1\n", "\n", "for i in range(len(Y)):\n", "    if Y[i] != 1:\n", "        Y[i] = -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Divide the data into a training set and a test set (50% of the data each). **Note**: we do not normalize the features since it is not needed for this dataset and task." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1 1 -1 -1 1 -1 1 1 -1 -1 -1 -1 1 -1 1 1 1 -1 1 -1 1 1 -1 -1\n", " 1 -1 -1 -1 -1 1 -1 1 -1 1 -1 1 -1 -1 1 -1 1 -1 -1 -1 -1 1 -1 1\n", " -1 1 -1 1 1 -1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 1\n", " 1 -1 1 1 -1 1 -1 -1 1 -1 1 -1 1 -1 -1 -1 1]\n" ] } ], "source": [ "#Divide in training and test: make sure that your training set\n", "#contains at least 10 elements from class 1 and at least 10 elements\n", "#from class -1! 
If it does not, modify the code so as to apply more random\n", "#permutations (or the same permutation multiple times) until this happens.\n", "\n", "#m_training needs to be the number of samples in the training set\n", "# m_training = #COMPLETE\n", "\n", "#m_test needs to be the number of samples in the test set\n", "# m_test = #COMPLETE\n", "\n", "#X_training = instances for training set\n", "# X_training = #COMPLETE\n", "#Y_training = labels for the training set\n", "# Y_training = #COMPLETE\n", "\n", "#X_test = instances for test set\n", "# X_test = #COMPLETE\n", "#Y_test = labels for the test set\n", "# Y_test = #COMPLETE\n", "\n", "# stratify=Y keeps the proportion of each class the same in both splits\n", "X_training, X_test, Y_training, Y_test = train_test_split(X, Y, test_size=0.50, random_state=12, stratify=Y)\n", "m_training = X_training.shape[0]\n", "m_test = X_test.shape[0]\n", "\n", "print(Y_training) #to make sure that Y_training contains both 1 and -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Now add a 1 in front of each sample so that we can use a vector to describe all the coefficients of the model. You can use the function $hstack$ in $numpy$." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "#add a 1 to each sample\n", "train_ones = np.ones((m_training,1))\n", "test_ones = np.ones((m_test,1))\n", "\n", "X_training = np.hstack((train_ones, X_training)) #COMPLETE\n", "X_test = np.hstack((test_ones, X_test)) #COMPLETE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Now complete the function *perceptron*. Since the perceptron does not terminate if the data is not linearly separable, your implementation should return the desired output (see below) if it reaches the termination condition seen in class or if a maximum number of iterations has already been run, where 1 iteration corresponds to 1 update of the perceptron weights. 
If the perceptron returns because the maximum number of iterations has been reached, you should return an appropriate model.\n", "\n", "The input parameters to pass are:\n", "- $X$: the matrix of input features, one row for each sample\n", "- $Y$: the vector of labels for the input features matrix X\n", "- $max\\_num\\_iterations$: the maximum number of iterations for running the perceptron\n", "\n", "The output values are:\n", "- $best\\_w$: the vector with the coefficients of the best model\n", "- $best\\_error$: the *fraction* of misclassified samples for the best model" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "from sklearn.linear_model import Perceptron\n", "from sklearn.metrics import accuracy_score\n", "\n", "def perceptron(X, Y, max_num_iterations):\n", "    # NOTE: this delegates to scikit-learn's Perceptron. Its max_iter counts epochs\n", "    # (full passes over the training data), not single weight updates as defined above,\n", "    # and with the default tol=1e-3 it may stop before max_iter once the loss stops improving.\n", "    p = Perceptron(max_iter=max_num_iterations, random_state=5)\n", "    p.fit(X, Y)\n", "    best_w = p.coef_\n", "    predict_train = p.predict(X)\n", "    training_score = accuracy_score(Y, predict_train)\n", "    best_error = 1 - training_score\n", "\n", "    # the fitted model p is also returned, so that later cells can call p.predict\n", "    return best_w, best_error, p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we use the perceptron implementation above to learn a model from the training data with 100 iterations, and print the error of the best model found."
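, "\n",
"\n",
"The `perceptron` cell above delegates to scikit-learn. For comparison, here is a minimal from-scratch sketch of what the TO DO describes, assuming labels in $\\{-1, +1\\}$, a leading column of ones in `X`, and that the \"appropriate model\" is the weight vector with the lowest training error seen so far (the *pocket* variant). The name `perceptron_pocket` is hypothetical, and the toy data is made up for illustration:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def perceptron_pocket(X, Y, max_num_iterations):\n",
"    w = np.zeros(X.shape[1])\n",
"    # a sample counts as a mistake when Y[i] * <w, X[i]> <= 0\n",
"    best_w, best_error = w.copy(), np.mean(Y * (X @ w) <= 0)\n",
"    for _ in range(max_num_iterations):\n",
"        mistakes = np.flatnonzero(Y * (X @ w) <= 0)\n",
"        if mistakes.size == 0:  # termination: every sample is classified correctly\n",
"            return w, 0.0\n",
"        i = mistakes[0]\n",
"        w = w + Y[i] * X[i]  # one iteration = one weight update\n",
"        error = np.mean(Y * (X @ w) <= 0)\n",
"        if error < best_error:  # keep the best model seen so far\n",
"            best_w, best_error = w.copy(), error\n",
"    return best_w, best_error\n",
"\n",
"# toy, linearly separable data: the first column is the constant 1\n",
"X = np.array([[1, 2.0], [1, 1.0], [1, -1.0], [1, -2.0]])\n",
"Y = np.array([1, 1, -1, -1])\n",
"w, err = perceptron_pocket(X, Y, 100)\n",
"print(w, err)\n",
"```\n",
"\n",
"With this variant, test predictions would be computed as `np.sign(X_test @ w)` instead of calling `p.predict`."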
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "jupyter": { "outputs_hidden": true }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training error with 100 iterations: 0.2471910112359551\n" ] } ], "source": [ "#now run the perceptron for 100 iterations\n", "w_found, training_error, p = perceptron(X_training,Y_training, 100)\n", "print(\"Training error with 100 iterations: \"+str(training_error))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Use the best model $w\\_found$ to predict the labels for the test dataset and print the fraction of misclassified samples in the test set (that is an estimate of the true loss)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Estimated true loss with 100 iterations:0.29213483146067415\n" ] } ], "source": [ "#now use the model found (p, whose weights are w_found) to make predictions on the test dataset\n", "\n", "#num_errors = number of errors in the test set\n", "num_errors = 0.\n", "\n", "#ADD CODE!\n", "prediction = p.predict(X_test)\n", "# count the test samples whose predicted label differs from the true label\n", "num_errors = np.sum(prediction != Y_test)\n", "\n", "true_loss_estimate = num_errors/m_test\n", "\n", "#NOTE: you can avoid using num_errors if you prefer, as long as true_loss_estimate is correct\n", "print(\"Estimated true loss with 100 iterations:\"+str(true_loss_estimate))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO**: [Answer the following] What relation do you observe between the training error and the (estimated) true loss? Is this what you expected? Explain what you observe and why it does or does not conform to your expectations. 
[Write the answer in this cell]\n", "\n", "**ANSWER**: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Copy the code from the last 2 cells above into the cell below and repeat the training with 10000 iterations." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training error with 10000 iterations: 0.2471910112359551\n", "Estimated true loss with 10000 iterations:0.29213483146067415\n" ] } ], "source": [ "#now run the perceptron for 10000 iterations here!\n", "\n", "#ADD CODE!\n", "w_found, training_error, p = perceptron(X_training,Y_training, 10000)\n", "#training_error = error on the training set\n", "print(\"Training error with 10000 iterations: \"+str(training_error))\n", "\n", "#num_errors = number of errors in the test set\n", "num_errors = 0.\n", "\n", "#ADD CODE!\n", "prediction = p.predict(X_test)\n", "# count the test samples whose predicted label differs from the true label\n", "num_errors = np.sum(prediction != Y_test)\n", "\n", "true_loss_estimate = num_errors/m_test\n", "\n", "#NOTE: you can avoid using num_errors if you prefer, as long as true_loss_estimate is correct\n", "print(\"Estimated true loss with 10000 iterations:\"+str(true_loss_estimate))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** [Answer the following] What changes in the training error and in the test error (in terms of fraction of misclassified samples)? Explain what you observe. [Write the answer in this cell]\n", "\n", "**ANSWER**: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression\n", "Now we use logistic regression, as implemented in Scikit-learn, to predict labels. We first do it for 2 classes and then for 3 classes. We will also plot the decision region of logistic regression.\n", "\n", "We first load the dataset again."
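, "\n",
"\n",
"The TO DO comments ask to re-permute the data until the training half contains at least 10 samples of each class. The notebook uses `train_test_split(..., stratify=Y)` instead, which, given the class sizes here, guarantees this. For reference, a minimal sketch of the re-permutation approach the comments describe; `split_with_min_per_class` is a hypothetical helper and `max_tries` is an added safeguard against looping forever:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def split_with_min_per_class(X, Y, min_count=10, max_tries=100):\n",
"    m = X.shape[0]\n",
"    m_training = m // 2  # 50%-50% split\n",
"    for _ in range(max_tries):\n",
"        perm = np.random.permutation(m)  # uses (and advances) the global seed\n",
"        X_p, Y_p = X[perm], Y[perm]\n",
"        Y_training = Y_p[:m_training]\n",
"        # accept the split only if every class has enough training samples\n",
"        if all(np.sum(Y_training == c) >= min_count for c in np.unique(Y)):\n",
"            return X_p[:m_training], Y_training, X_p[m_training:], Y_p[m_training:]\n",
"    raise RuntimeError('no split with enough samples per class was found')\n",
"\n",
"# toy data: 40 samples, 12 labelled +1 and 28 labelled -1\n",
"np.random.seed(0)\n",
"X = np.arange(80).reshape(40, 2)\n",
"Y = np.array([1] * 12 + [-1] * 28)\n",
"X_training, Y_training, X_test, Y_test = split_with_min_per_class(X, Y, min_count=3)\n",
"print(X_training.shape, X_test.shape)  # (20, 2) (20, 2)\n",
"```"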
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Load the dataset from scikit learn\n", "wine = datasets.load_wine()\n", "\n", "m = wine.data.shape[0]\n", "permutation = np.random.permutation(m)\n", "\n", "X = wine.data[permutation]\n", "Y = wine.target[permutation]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** As for the previous part, divide the data into a training set and a test set (50%-50%) and relabel classes 0 and 2 as -1. Here there is no need to add a 1 at the beginning of each row, since the function we will use adds the intercept term automatically." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "#Divide in training and test: make sure that your training set\n", "#contains at least 10 elements from class 1 and at least 10 elements\n", "#from class -1! If it does not, modify the code so as to apply more random\n", "#permutations (or the same permutation multiple times) until this happens.\n", "#IMPORTANT: do not change the random seed.\n", "\n", "# m_training = #COMPLETE\n", "# m_test = #COMPLETE\n", "\n", "# X_training = #COMPLETE\n", "# Y_training = #COMPLETE\n", "\n", "# X_test = #COMPLETE\n", "# Y_test = #COMPLETE\n", "\n", "\n", "\n", "#let's relabel classes 0 and 2 as -1\n", "\n", "for i in range(len(Y)):\n", "    if Y[i] != 1:\n", "        Y[i] = -1\n", "\n", "X_training, X_test, Y_training, Y_test = train_test_split(X, Y, test_size=0.50, random_state=12, stratify=Y)\n", "\n", "m_training = X_training.shape[0]\n", "m_test = X_test.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To define a logistic regression model in Scikit-learn use the instruction\n", "\n", "$linear\\_model.LogisticRegression(C=1e5)$\n", "\n", "($C$ is a parameter related to *regularization*, a technique that\n", "we will see later in the course. 
Setting it to a high value is almost\n", "equivalent to ignoring regularization, so the instruction above corresponds to the\n", "logistic regression you have seen in class.)\n", "\n", "To learn the model you need to use the $fit(...)$ method, and to predict you need to use the $predict(...)$ method. See the Scikit-learn documentation for how to use them.\n", "\n", "**TO DO** Define the logistic regression model, then learn the model using the training set and predict on the test set. Then print the fraction of misclassified samples in the training set and in the test set." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Error rate on training set: 0.0337078651685393\n", "Error rate on test set: 0.0561797752808989\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\gagan\\anaconda3\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):\n", "STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n", "\n", "Increase the number of iterations (max_iter) or scale the data as shown in:\n", " https://scikit-learn.org/stable/modules/preprocessing.html\n", "Please also refer to the documentation for alternative solver options:\n", " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", " n_iter_i = _check_optimize_result(\n" ] } ], "source": [ "#part on logistic regression for 2 classes\n", "# NOTE: C is left at its default (1.0) here, not the C=1e5 suggested above\n", "logreg = linear_model.LogisticRegression(max_iter=50) #COMPLETE\n", "\n", "#learn from training set\n", "\n", "#ADD CODE!\n", "logreg.fit(X_training,Y_training)\n", "\n", "\n", "#predict on training set\n", "\n", "#ADD CODE!\n", "prediction_train = logreg.predict(X_training)\n", "error_rate_training = 1 - accuracy_score(Y_training, prediction_train)\n", "#print the error rate = fraction of misclassified samples\n", "print(\"Error rate on training set: \"+str(error_rate_training))\n", "\n", "#predict on test set\n", "\n", "#ADD CODE!\n", "prediction_test = logreg.predict(X_test)\n", "error_rate_test = 1 - accuracy_score(Y_test, prediction_test)\n", "\n", "#print the error rate = fraction of misclassified samples\n", "\n", "print(\"Error rate on test set: \"+str(error_rate_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we do logistic regression for classification with 3 classes.\n", "\n", "**TO DO** First: let's load the data once again (with the same permutation from before)."
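, "\n",
"\n",
"To check the \"at least 10 elements from each of the 3 classes\" requirement after splitting, counting the labels with `np.bincount` is enough. A minimal sketch on toy labels (the threshold of 2 is only for this small example; the notebook asks for 10):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"Y_training = np.array([1, 2, 1, 0, 1, 2, 0, 1])  # toy labels in {0, 1, 2}\n",
"counts = np.bincount(Y_training, minlength=3)  # counts[c] = number of samples of class c\n",
"print(counts.tolist())  # [2, 4, 2]\n",
"print(bool((counts >= 2).all()))  # True\n",
"```"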
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 1, 0, 1, 2, 0, 1, 1, 0, 2, 2, 1, 1, 0, 1, 2, 2, 1, 0, 0, 0,\n", " 0, 0, 0, 1, 0, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 1, 0, 1, 0, 2, 1,\n", " 0, 1, 0, 0, 1, 2, 0, 1, 2, 2, 2, 1, 0, 2, 0, 2, 2, 1, 1, 0, 1, 0,\n", " 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 2, 2, 1, 0, 0, 0, 0, 2, 2, 0, 0, 0,\n", " 1])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#part on logistic regression for 3 classes\n", "\n", "#Divide in training and test: make sure that your training set\n", "#contains at least 10 elements from each of the 3 classes!\n", "#If it does not, modify the code so as to apply more random\n", "#permutations (or the same permutation multiple times) until this happens.\n", "#IMPORTANT: do not change the random seed.\n", "X = wine.data[permutation]\n", "Y = wine.target[permutation]\n", "\n", "# X_training = #COMPLETE\n", "# Y_training = #COMPLETE\n", "\n", "# X_test = #COMPLETE\n", "# Y_test = #COMPLETE\n", "X_training, X_test, Y_training, Y_test = train_test_split(X, Y, test_size=0.50, random_state=12, stratify=Y)\n", "Y_training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Now perform logistic regression (instructions as before) for 3 classes, learning a model from the training set and predicting on the test set. Print the fraction of misclassified samples on the training set and the fraction of misclassified samples on the test set."
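, "\n",
"\n",
"The runs in this notebook use `max_iter=50` and emit a `ConvergenceWarning`, whose message itself suggests scaling the data or increasing `max_iter`. A minimal sketch of both remedies together with the `C=1e5` setting recommended above; standardizing the features with `StandardScaler` is an added assumption (the notebook deliberately does not normalize), and the split below uses `random_state=12` without the notebook's permutation, so its numbers are not comparable one-to-one:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn import datasets, linear_model\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"wine = datasets.load_wine()\n",
"X_training, X_test, Y_training, Y_test = train_test_split(\n",
"    wine.data, wine.target, test_size=0.5, random_state=12, stratify=wine.target)\n",
"\n",
"# standardize each feature, then fit an (almost) unregularized logistic regression\n",
"logreg = make_pipeline(StandardScaler(), linear_model.LogisticRegression(C=1e5, max_iter=1000))\n",
"logreg.fit(X_training, Y_training)\n",
"\n",
"# error rate = fraction of misclassified samples\n",
"error_rate_training = np.mean(logreg.predict(X_training) != Y_training)\n",
"error_rate_test = np.mean(logreg.predict(X_test) != Y_test)\n",
"print('Error rate on training set:', error_rate_training)\n",
"print('Error rate on test set:', error_rate_test)\n",
"```\n",
"\n",
"The three-class case needs no extra code: with the default `lbfgs` solver, recent scikit-learn versions fit a single multinomial model when there are more than two classes."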
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Error rate on training set: 0.0561797752808989\n", "Error rate on test set: 0.101123595505618\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\taran\\anaconda3\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n", "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", "\n", "Increase the number of iterations (max_iter) or scale the data as shown in:\n", " https://scikit-learn.org/stable/modules/preprocessing.html\n", "Please also refer to the documentation for alternative solver options:\n", " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", " n_iter_i = _check_optimize_result(\n" ] } ], "source": [ "#part on logistic regression for 3 classes\n", "# NOTE: C is left at its default (1.0) here, not the C=1e5 suggested above\n", "logreg = linear_model.LogisticRegression(max_iter=50) #COMPLETE\n", "\n", "#learn from training set\n", "\n", "#ADD CODE!\n", "logreg.fit(X_training,Y_training)\n", "\n", "#predict on training set\n", "\n", "#ADD CODE!\n", "prediction_train = logreg.predict(X_training)\n", "error_rate_training = 1 - accuracy_score(Y_training, prediction_train)\n", "#print the error rate = fraction of misclassified samples\n", "print(\"Error rate on training set: \"+str(error_rate_training))\n", "\n", "#predict on test set\n", "\n", "#ADD CODE!\n", "prediction_test = logreg.predict(X_test)\n", "error_rate_test = 1 - accuracy_score(Y_test, prediction_test)\n", "\n", "#print the error rate = fraction of misclassified samples\n", "print(\"Error rate on test set: \"+str(error_rate_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TO DO** Now pick two features and restrict the dataset to include only those two features, whose indices are specified in the $features$ list below. Then split into training and test sets."
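, "\n",
"\n",
"Restricting the data to a subset of columns uses NumPy fancy indexing, `X[:, features]`, which keeps all rows and only the listed columns, in the given order. A minimal sketch on a toy matrix:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"X = np.arange(12).reshape(3, 4)  # toy data: 3 samples, 4 features\n",
"features = [0, 2]  # indices of the columns to keep\n",
"X_red = X[:, features]  # shape (3, 2)\n",
"print(X_red.tolist())  # [[0, 2], [4, 6], [8, 10]]\n",
"```"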
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "#to make the plot we need to reduce the data to 2D, so we choose two features\n", "\n", "features_list = ['Alcohol',\n", "'Malic acid',\n", "'Ash',\n", "'Alcalinity of ash',\n", "'Magnesium',\n", "'Total phenols',\n", "'Flavanoids',\n", "'Nonflavanoid phenols',\n", "'Proanthocyanins',\n", "'Color intensity',\n", "'Hue',\n", "'OD280/OD315 of diluted wines',\n", "'Proline']\n", "labels_list = ['class_0', 'class_1', 'class_2']\n", "\n", "index_feature1 = 0 # Alcohol #COMPLETE\n", "index_feature2 = 5 # Total phenols #COMPLETE\n", "features = [index_feature1, index_feature2]\n", "\n", "feature_name0 = features_list[features[0]]\n", "feature_name1 = features_list[features[1]]\n", "\n", "#X_red is X reduced to include only the 2 features of\n", "#indices index_feature1 and index_feature2\n", "X_red = X[:,features]\n", "\n", "# X_red_training = #COMPLETE\n", "# Y_training = #COMPLETE\n", "\n", "# X_red_test = #COMPLETE\n", "# Y_test = #COMPLETE\n", "\n", "X_red_training, X_red_test, Y_training, Y_test = train_test_split(X_red, Y, test_size=0.50, random_state=12, stratify=Y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now learn a model using the training data." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "data": { "text/plain": [ "0.8314606741573034" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#ADD CODE! 
(only for learning from training data)\n", "logreg = linear_model.LogisticRegression(max_iter=50) #COMPLETE\n", "logreg.fit(X_red_training,Y_training)\n", "logreg.score(X_red_training,Y_training)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If everything is ok, the code below uses the model in $logreg$ to plot the decision region for the two features chosen above, with colors denoting the predicted value. It also plots the points (with correct labels) in the training set. It makes a similar plot for the test set." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ ":14: MatplotlibDeprecationWarning: shading='flat' when X and Y have the same dimensions as C is deprecated since 3.3. Either specify the corners of the quadrilaterals with X and Y, or pass shading='auto', 'nearest' or 'gouraud', or set rcParams['pcolor.shading']. This will become an error two minor releases later.\n", " plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ ":32: MatplotlibDeprecationWarning: shading='flat' when X and Y have the same dimensions as C is deprecated since 3.3. Either specify the corners of the quadrilaterals with X and Y, or pass shading='auto', 'nearest' or 'gouraud', or set rcParams['pcolor.shading']. This will become an error two minor releases later.\n", " plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAADPCAYAAAAzmacdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAASLUlEQVR4nO3dfYwc9X3H8ffXe5wN9+ADTKERFg4PIaE8CCtV2kKJnTSoVAkOhgYwTQItBIggQRWtUCO1IUqkJoVWl5JghzQpYLCTBpJQWlGaYidFTUR5KA8NFlUKlKQ0qfHT+eqc7fO3f+ysPbe3Ozu7O7Pz9HlJ1t3O7sz+1rrP/B5n1twdESm/BVkXQEQGQ2EXqQiFXaQiFHaRilDYRSpCYRepCIVdpCIU9oIys92hfwfMbE/o8RU9HG+zmV2dQjmvNLPHkz6udG8o6wJIb9x9tPG7mb0CXO3u38muRJJ3qtlLxswWmNktZvYjM3vDzL5uZkcFzy0ys/XB9h1m9q9mdqyZfQb4deCOoGVwR4vjttw3eG6xmf2Vmb1uZj8xs0+bWc3M3gasBX41OO6OAf5XSBOFvXw+BrwfeCfwJmA78IXguQ8Di4GlwNHAdcAed/8E8M/ADe4+6u43tDhuy32D5+4G9gMnA2cD51NvabwYvO77wXEnEv2k0hWFvXyuBT7h7j929xngk8AlZjYE7KMe1JPdfdbdn3L3XTGP23LfoHa/ALjJ3afd/WfAXwCXJf3BpD/qs5fPCcA3zexAaNsscCxwL/WaeaOZTQDrqZ8Y9sU4bst9g/c7DHjdzBqvXQC81vcnkUSpZi+f14AL3H0i9G+Ru//E3fe5+63ufhrwa8B7gQ8F+0Ve/hix72vADLAk9H7j7v5LcY4rg6Owl89a4DNmdgKAmR1jZquC31ea2RlmVgN2UW+azwb7/RQ4sd1B2+3r7q8DjwK3m9l4MEB4kpm9M3Tc481sOIXPKl1Q2MtnEngIeNTMpoAfAO8InjsO+Ab1sL4IfJd6c7yx3yVmtt3MPt/iuFH7fggYBn5IfUDwG8AvBs89Bvw78D9mtjWhzyg9MN28QqQaVLOLVITCLlIRCrtIRSjsIhWhsItURCor6JYsPtyXHTeexqGlybbasVkXQXLk5Ref3+rux7R6LpWwLztunCfWXZ7GoaWFry2+OesiSE6sWb701XbPqRkvUhEKu0hFKOwlcOnO27IughSAwi5SEQp7Sah2l04UdpGKUNhFKkJhLxE15SWKwi5SEQq7SEUo7CIVobCLVITCXjIapJN2FHaRilDYS0i1u7SisItUhL7rTaTJNStOZ3rXznnbR8YXc9fmFzIoUTIU9pK6dOdtuoNNj6Z37eT+p+d/L+Wa5UszKE1y1IwXqQiFvcQ0UCdhCrtIRSjsIhWhATqRJiPji1sOxo2ML86gNMlR2EtOo/LdK/L0WhQ140UqQmGvAI3KCyjsIpWhsItUhMJeEWrKi8IuUh
EKe4Wodq82hV2kIhR2kYpQ2CtGTfnqUthFKkJr4yuoKOvly3p7qKwo7ClYcuFatk/NzNt+5NhCtj50XQYlKqay3h4qKwp7CrZPzTC76ePzttdWTmZQGpE69dkrSgN11aOaXaRJWccKFPYKK8pA3aCVdaxAYZfciro9VFlr3zQp7Ck4cmxhy8G4I8cWZlCa4ooK7ZrlS0tZ+6ZJYU9BkabX1JSvDo3Gi1SEanZR7d5Et5IWqYiyDvAp7FJIZa1909Qx7Gb2OeDTwB7gEeAs4CZ3X59y2UTaynvtm8epwTg1+/nu/odmdhHwY+C3gU2Awi7SRh4X5sQZjT8s+PlbwAZ335ZieSQjWitffnFq9r81sy3Um/EfNbNjgJ+nWywRSVrHsLv7LWb2WWCXu8+a2TSwKv2iSdby0u/spRx5KXuetA27ma1usS388ME0CiTZaZ5vz0u/M6oc4bKEg5yXsudJVM3+vojnHIVdcmRBrcbM9K6si3FQHqcG24bd3a8aZEEkH4q2mq5Re69ZvpQDGZclLI9dhY6j8Wa22Mz+3MyeDP7dbmZauSBSMHGm3r4CTAEfCP7tAr6aZqFEunHNitMP/h7ux4e3S7ypt5Pc/eLQ41vN7N9SKk8hJXk32TzcmbbRlO+33xlnRDzOazqVI+4AXmOfqo7Uxwn7HjM7190fBzCzc6jPuUsgybvJJnmsfk8c/f7hxxkRj/OacDna3bSinXbHruJIfZywXwfcE/TTDdgGXJlmoSQZ/Zw44g7URdWSaWhXy/er0eRv1RIoS20fZ1HNs8BZZjYePM7P/IZkbtDz2a2Cl8R7VWFePs5VbwuBi4FlwFBjYY27fyrVkokkoF3Lo4riNOO/DewEngLmdwBFMhY1gNeqxi5Tbd2NOGE/3t1/M/WSFFiSd5PN051p+11gE2c0v91rFtRqB7d36jd3ugtt3HKVXZyw/4uZneHuz6demoJKckosyWNlfeKIM7AVZ6Q96WDetfmFee9VhfDHCfu5wJVm9jL1ZrwB7u5nploy6VsSJ45OtXse14D3oiyfI0qcsF+QeilEEtJqQG7N8qV9dQXKIs7U26tmdi5wirt/Nbh5xWj6RZOsHVqUM7crMDY+xrrNPwTyN2XVzeWwVRNn6u1PgLcDp1JfE38Y9fvPnZNu0colD8tgu9Xvar7mGvWa897G9O7d8143MjrKXd97seOx5uzTw2KXblbelVGcZvxFwNnA0wDu/t9mNpZqqUooyWWwRXH/06/NXRq7ezffvvyt8163asMWILrf3BzsKgyoJS1O2Pe6u5uZA5jZSMplkopqF+gq9KcHIc4lrl83s3XAhJldA3wHuCvdYkmRXbviNOBQWBv95aGatd1Hl6OmL84A3W1m9h7q17GfCvyxu/9j6iXLuaz64Hnr+7drenfbZUlqSWtSU2hlvAw21tc/BeEufMCTDEpWffCk3vfoC7/Ejqn5VypPjB3OGw99BGi/KGds/NCQTeMPP7xIJcv+dFJBzNssQxLijMavBj4L/AL1BTWNRTXjKZftoKRCmuUg2VDNWr5Pc9N2UDX3jqk9Hf+YvU0v70CL7XGXoDYG4+bsOzraepR+fDF7pneXfrHLoMSp2T8HvM/do+dGUlSGkeyxI4ZbhnjsiOE5j6M+66A/b5wTQkPzstcovRxT+hcn7D/NMuhlkUStPLvp47k+wYX7ueFyhrsGkp04XxLxpJl9DfgWoUtc3V33jZc5ovq5tZWTjI2PcYAFapZnJO6XRPwfcH7oceW/JCKrK8qyvpKtV+GTQKuR7uldO7lmxem5abqX8cKYSn1JRJJByWqJa+N9aysnW/bt46oNDbX8Y64NxZqg6UsRRrrzctJJUpzR+BOpXwnxK9Rr9O8DN7n7yymX7aCkQprXNehhg6q5Z/fvj1y6CvW+dqsATowdXrhvjpF4A3T3A1+gvkYe4DJgI/COtArVrAghTUr4s9ZWTjK2sMbUzCzbp2bmnAQG0WzXoFq5xA
m7ufu9ocfrzeyGtAokc61ffcq8bas2bMnlCbCM/dwyiRP2TWZ2C/Xa3IFLgb8zs6MA3H1biuWTAiljP7dM4oT90uDntU3bf5d6+E9MtEQyx+88+B9MzczO277kwrW5rN3jUAsgG3EuhHnzIApSRINY2jo1M9vVirrGe3cq28TIcMulqxMjw/O2ARz93jvZMb23aetkrBtPNFMLIBvpz7OUWNrLeCdGhlsE7JCo9+5Utjcevr6rsuyY3ttx9F7yTWHvQbjWbB4hT7Jp/cbD1+d6eWzDyJ+9J9brpv+g8BdOFprC3oMyXJiTBZ0UshW1Nn551I7u/nTyxSmfvN1sogh0UkhHVM1+e8RzDrwr4bKUUr+tgHYr6qJu8VQVOil0J2pt/MpBFqQsaisnE1nd1qlFsOTCtZHLapNedttu9H50OM5tDLMV96TQraKdRGL12c3sdOA0YFFjm7vfk1ahiqyfi1PCOrUIOnUBku4i3H2hllM0K1rLIu6XRKygHva/p/51UI8DlQ171peZ1lZOUjOY9fnPTYwMdz2t1snDX/qHRI9XNXk5KcSp2S8BzgKecferzOxY4Muplirn2i1a2T41M5CVbY071gxi3ltBH5y0Twpxwr7H3Q+Y2X4zGwd+RgWWyHbqM8cdeMu6FdAPBT2feh2DiBP2J81sgvoXQzwF7Aae6OndCiSpufSiTq8p6OUTZ238R4Nf15rZI8C4uz+XbrHksNqCeSeWxmxb2ot3FPRyijNA90/u/m4Ad3+leZukY9/sgTkti9rKSWadg/30tNakK+jlFbWCbhFwBLDEzI6k/uUQAOPAmwZQtkpoNzbQvGjmyLGFc143Orygq6vW4lDQyy2qZr8WuIl6sMNLY3dRv01VpSU18BZ3bGDrQ9fN2XbfxW85+PuqDVv6nt/PU9CveOAldu89MG/76PCCOZ9buhO1gm4SmDSzG939LwdYplzo9HVNRR14ayVPQQfYvfdA22lFnQh6F2c0fp2ZfQw4L3i8GVjn7vtSK1UOxP26pqLLW9A7iToRSLQ4Yf8icFjwE+CDwJ3A1WkVKg/yUHO3Go1Psp9etKBLf6IG6IbcfT/wy+5+Vuipx8zs2fSLJkmts2+ln6CrKV1MUTX7E8ByYNbMTnL3H8HBL42YfwdE6UkWK+z6rdHVlC6mqLA35n5upn476f8MHi8DSvfVUFmJ213o5SYYrW8SeagGzmsN3W5acXR4QcvySjxRYT/GzH4/+H0dUAOmqV/mejawKeWySUgvy3c73SQyrzV01InmigdeKux19VmLCnsNGOVQDU/wGGAstRJV1KBvX7V6Yz0wzcEJh6bdPeuzvEuOxgR6FxX21939UwMrSQ4NMoCDvolleOltWDj8Ufes71Veuw5VEKfPXllJBrBsN55s1ZSOU+HntetQBVFh14UuCRp0zZ32HLoCWzxRy2X1hY05EneKLhzyqItlor5ppt3inbRd8cBLasqnSF8SURC9NPWjLpaJalE8eFm6l9G2o2m1dCnsOZH24pqxhbU5x29Xe4dH45v3aRiqmaa/CkhhjzDI1W39DtJ1WkCzfvUpkfuv2rBlXj98/epT5m1ftWELD3zg1Jb7x2mCa8FMdhT2CP0GsHkEvnHiGKoZ+4P7QCd14ijKt6zed/FbWp5YIH9lLRuFPUVRI/BJXeSiK9ckLoW9wJIMelTzunkhTON1UQth2i2eaczFN79XzdTnT5vCXlBxg95uMG1sYW1OPz6qv91Lsztq8Uy77Zp2S5fCXkDd1Oj7Zz3VxTzhWj4tWmKbDIW9YNoFvV0zPG2taumkaYltMhT2FCU5ddepNm9XwykQ0qCwpyipC1yqOuIebr6HT1pqvvdGYc9Yu8Uwja9ezkPQo0bqu90nzsq9BjXfk6WwZyxqMUwSQW+35HVsYS32MXqpRVXz5o/CXnKdlskW1aoNWzQv3yWFXQppELMAZaNTo0hFqGaX3OplYFDaU9gzNjEyrD/oNjTIlyyFPW
N5mV6T8lP1kTEFXQZFYc+Qgi6DpLBnREGXQVPYM6CgSxYU9gFT0CUrCvsAKeiSJYV9QBR0yZrCPgAKuuSBwp4yBV3yQmFPkYIueaKwp0RBl7zR2viEKeSSV6rZE6SgS54p7AlR0CXvFPYEKOhSBAp7nxR0KQqFvQ8KuhSJRuN7oJBLEalmF6kIhV2kIhT2LqkJL0WlsHdBQZci0wBdDAq5lIFq9g4UdCkLhT2Cgi5lorC3oaBL2SjsLSjoUkYaoAtRyKXMzN2TP6jZ/wKvJn5gEenkBHc/ptUTqYRdRPJHfXaRilDYRSpCYS8hM7vIzNzM3ho8XmZmL/R4rFfMbEkXr7/SzO7o5b0kXQp7OV0OPA5clnVBJD8U9pIxs1HgHOD3aBF2M6uZ2W1m9ryZPWdmNwbb321mzwTbv2JmC0O73WhmTwfPNVoLR5nZt4Jj/MDMzhzE55PeKezl837gEXd/CdhmZsubnv8I8GbgbHc/E7jPzBYBfw1c6u5nUF9/cX1on63uvhy4E7g52HYr8ExwjD8C7knp80hCFPbyuRzYGPy+MXgc9hvAWnffD+Du24BTgZeDEwTA3cB5oX0eDH4+BSwLfj8XuDc4xmPA0Wa2OLmPIUnTCroSMbOjgXcBp5uZAzXAgS+GXxZso2lblJng5yyH/mZa7aNFGzmmmr1cLgHucfcT3H2Zuy8FXgaOD73mUeA6MxuCet8b2AIsM7OTg9d8EPhuh/f6HnBFcIwV1Jv6u5L6IJI8hb1cLge+2bTtAep96oYvA/8FPGdmzwJr3P3nwFXA35jZ88ABYG2H9/ok8HYzew74U+DD/Rdf0qTlsiIVoZpdpCIUdpGKUNhFKkJhF6kIhV2kIhR2kYpQ2EUqQmEXqYj/BxTpz8T4yrpPAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot the decision boundary. For that, we will assign a color to each\n", "# point in the mesh [x_min, x_max]x[y_min, y_max].\n", "h = .02 # step size in the mesh\n", "x_min, x_max = X_red[:, 0].min() - .5, X_red[:, 0].max() + .5\n", "y_min, y_max = X_red[:, 1].min() - .5, X_red[:, 1].max() + .5\n", "xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n", "\n", "Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])\n", "\n", "# Put the result into a color plot\n", "Z = Z.reshape(xx.shape)\n", "\n", "plt.figure(1, figsize=(4, 3))\n", "plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)\n", "\n", "# Plot also the training points\n", "plt.scatter(X_red_training[:, 0], X_red_training[:, 1], c=Y_training, edgecolors='k', cmap=plt.cm.Paired)\n", "plt.xlabel(feature_name0)\n", "plt.ylabel(feature_name1)\n", "\n", "plt.xlim(xx.min(), xx.max())\n", "plt.ylim(yy.min(), yy.max())\n", "plt.xticks(())\n", "plt.yticks(())\n", "plt.title('Training set')\n", "\n", "plt.show()\n", "\n", "# Put the result into a color plot\n", "Z = Z.reshape(xx.shape)\n", "plt.figure(1, figsize=(4, 3))\n", "plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)\n", "\n", "# Plot also the test points \n", "plt.scatter(X_red_test[:, 0], X_red_test[:, 1], c=Y_test, edgecolors='k', cmap=plt.cm.Paired, marker='s')\n", "plt.xlabel(feature_name0)\n", "plt.ylabel(feature_name1)\n", "\n", "plt.xlim(xx.min(), xx.max())\n", "plt.ylim(yy.min(), yy.max())\n", "plt.xticks(())\n", "plt.yticks(())\n", "plt.title('Test set')\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", 
"name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }