{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Character Classification with Neural Networks\n",
    "\n",
    "In this notebook we are going to use the Neural Networks for image classification. We are going to use the same dataset of the lab on SVM: Kuzushiji-MNIST or K-MNIST for short (https://github.com/rois-codh/kmnist) a dataset of traditional japanese handwritten kana.\n",
    "\n",
    "The dataset labels are the following:\n",
    "\n",
    "| Label | Hiragana Character | Romanji (Pronunciation) |\n",
    "| :-: | :-: | :-: |\n",
    "|   0   | お | o |\n",
    "| 1 | き | ki |\n",
    "| 2 | す | su |\n",
    "| 3 | つ | tsu |\n",
    "| 4 | な | na |\n",
    "| 5 | は | ha |\n",
    "| 6 | ま | ma |\n",
    "| 7 | や | ya |\n",
    "| 8 | れ | re |\n",
    "| 9 | を | wo |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#load the required packages and check Scikit-learn version\n",
    "\n",
    "%matplotlib inline  \n",
    "\n",
    "import numpy as np\n",
    "import scipy as sp\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import sklearn\n",
    "print ('scikit-learn version: ', sklearn.__version__)\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.svm import SVC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# helper function to load KMNIST dataset from disk\n",
    "def load_mnist(path, kind='train'):\n",
    "    import os\n",
    "    import gzip\n",
    "    import numpy as np\n",
    "    labels_path = os.path.join(path, 'K%s-labels-idx1-ubyte.gz' % kind)\n",
    "    images_path = os.path.join(path, 'K%s-images-idx3-ubyte.gz' % kind)\n",
    "    with gzip.open(labels_path, 'rb') as lbpath:\n",
    "        labels = np.frombuffer(lbpath.read(), dtype=np.uint8,offset=8)\n",
    "    with gzip.open(images_path, 'rb') as imgpath:\n",
    "        images = np.frombuffer(imgpath.read(), dtype=np.uint8,offset=16).reshape(len(labels), 784)\n",
    "    return images, labels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TODO \n",
    "Set as seed for the random generator your Student ID (you can use your \"numero di matricola\"). Try to change the seed to see the impact of the randomization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ID = 1234 #Your_ID\n",
    "np.random.seed(ID)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#load the MNIST dataset and let's normalize the features so that each value is in [0,1]\n",
    "X, y = load_mnist(\"data\")\n",
    "print(\"Number of samples in the K-MNIST dataset:\", X.shape[0])\n",
    "# rescale the data\n",
    "X = X / 255.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now split into training and test. We start with a small training set of 600 samples to reduce computation time while 4000 samples will be used for testing. Make sure that each label is present at least 10 times in train and test set frequencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#random permute the data and split into training and test taking the first 600\n",
    "#data samples as training and 4000 as test set\n",
    "permutation = np.random.permutation(X.shape[0])\n",
    "\n",
    "X = X[permutation]\n",
    "y = y[permutation]\n",
    "\n",
    "m_training = 600\n",
    "m_test = 4000\n",
    "\n",
    "X_train, X_test = X[:m_training], X[m_training:m_training+m_test]\n",
    "y_train, y_test = y[:m_training], y[m_training:m_training+m_test]\n",
    "\n",
    "labels, freqs = np.unique(y_train, return_counts=True)\n",
    "print(\"Labels in training dataset: \", labels)\n",
    "print(\"Frequencies in training dataset: \", freqs)\n",
    "\n",
    "labelsT, freqsT = np.unique(y_test, return_counts=True)\n",
    "print(\"Labels in test set: \", labels)\n",
    "print(\"Frequencies in test set: \", freqs)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#function for plotting a image and printing the corresponding label\n",
    "def plot_input(X_matrix, labels, index):\n",
    "    print(\"INPUT:\")\n",
    "    plt.imshow(\n",
    "        X_matrix[index].reshape(28,28),\n",
    "        cmap          = plt.cm.gray_r,\n",
    "        interpolation = \"nearest\"\n",
    "    )\n",
    "    plt.show()\n",
    "    print(\"LABEL: %i\"%labels[index])\n",
    "    return"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#let's try the plotting function\n",
    "plot_input(X_train,y_train,10)\n",
    "plot_input(X_test,y_test,100)\n",
    "plot_input(X_test,y_test,1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 1\n",
    "\n",
    "Now use a feed-forward Neural Network for prediction. Use the multi-layer perceptron classifier, with the following parameters: max_iter=100, alpha=1e-4, solver='sgd', tol=1e-4, learning_rate_init=.1, random_state=ID (this last parameter ensures the run is the same even if you run it more than once). The alpha parameter is the regularization term.\n",
    "\n",
    "Then, using the default activation function, pick four or five architectures to consider, with different numbers of hidden layers and different sizes. It is not necessary to create huge neural networks, you can limit to 3 layers and, for each layer, its maximum size can be of 50. Evaluate the architectures you chose using GridSearchCV with cv=5.\n",
    "\n",
    "You can reduce the number of iterations if the running time is too long on your computer.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# these are sample values but feel free to change them as you like, try to experiment with different sizes!!\n",
    "parameters = {'hidden_layer_sizes': [(10,), (20,), (40,), (20,20,), (40,20,10) ]}\n",
    "\n",
    "mlp = MLPClassifier(max_iter=100, alpha=1e-4, solver='sgd',\n",
    "                    tol=1e-4, random_state=ID,\n",
    "                    learning_rate_init=.1)\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print ('RESULTS FOR NN\\n')\n",
    "\n",
    "print(\"Best parameters set found:\")\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print(\"Score with best parameters:\")\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print(\"\\nAll scores on the grid:\")\n",
    "#ADD YOUR CODE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 2\n",
    "\n",
    "Now try also different batch sizes, while keeping the best NN architecture you have found above. Remember that the batch size was previously set to the default value, i.e., min(200, n_samples). \n",
    "Recall that a batch size of 1 corresponds to baseline SGD, while using all the 480 training samples (there are 600 samples but in cross validation with 5 folders we use 1/5 of them for validation at each round) corresponds to standard GD and using a different mini-batch size lies in the middle between the two extreme cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# these are sample values corresponding to baseline SGD, a reasonable mini-batch size and standard GD\n",
    "# again feel free to change them as you like, try to experiment with different batch sizes!!\n",
    "parameters = {'batch_size': [1, 32, 480]}\n",
    "\n",
    "# need to specify that you would like to use the standard k-fold split otherwise sklearn create splits of different sizes\n",
    "kf = sklearn.model_selection.KFold(n_splits=5)\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "# recall to use cv=kf to use the k-fold subdivision seen in the lectures\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "\n",
    "print ('RESULTS FOR NN\\n')\n",
    "\n",
    "print(\"Best parameters set found:\")\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print(\"Score with best parameters:\")\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print(\"\\nAll scores on the grid:\")\n",
    "#ADD YOUR CODE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### QUESTION 1\n",
    "\n",
    "What do you observe for different architectures and batch sizes? How do the number of layers and their sizes affect the performances? What do you observe for different batch sizes, in particular what happens to the training convergence for different batch sizes (notice that the algorithm could not converge for some batch sizes)?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [ANSWER TO QUESTION 1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TODO 3:\n",
    "\n",
    "Plot the train and test accuracies as a function of the number of learnable parameters in your neural network. Print also the computation time for the various configurations you try (the code for getting the computation time is already provided). You can use 100 iterations (if you get a warning on convergence not reached it is not an issue for this lab)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from functools import reduce\n",
    "\n",
    "# Function to compute the number of learnable parameters of a mlp given the size of its hidden layers\n",
    "def param_count(hl_size):\n",
    "    tot = 0\n",
    "    input_size, output_size = X_train.shape[1], len(labels)\n",
    "    tot += (input_size+1)*hl_size[0]\n",
    "    for i in range(1,len(hl_size)):\n",
    "        tot += (hl_size[i-1]+1)*hl_size[i]\n",
    "    tot += (hl_size[-1]+1)*output_size\n",
    "    return tot\n",
    "\n",
    "hl_sizes = [(10,), (20,), (40,), (20,20,), (40,20,10)]\n",
    "hl_labels = [param_count(t) for t in hl_sizes]\n",
    "\n",
    "ti = time.time()\n",
    "train_acc_list, test_acc_list = [], []\n",
    "for hl_size in hl_sizes:\n",
    "    print('Training MLP of size {} ...'.format(hl_size))\n",
    "    mlp = #ADD YOUR CODE\n",
    "    \n",
    "    #ADD YOUR CODE\n",
    "    \n",
    "    train_acc_list.append(mlp.score(X_train, y_train))\n",
    "    test_acc_list.append(mlp.score(X_test, y_test))\n",
    "    print('Done, training time: {:.2f} sec\\n'.format(time.time()-ti))\n",
    "    ti = time.time()\n",
    "\n",
    "fig, ax = plt.subplots(1,2, figsize=(15,5))\n",
    "\n",
    "\n",
    "ax[0].plot(train_acc_list)\n",
    "ax[0].set_xlabel('Number of learnable params')\n",
    "ax[0].set_title('Train accuracy')\n",
    "ax[0].set_xticks(np.arange(0,len(hl_labels)))\n",
    "ax[0].set_xticklabels(hl_labels)\n",
    "ax[0].grid(True)\n",
    "\n",
    "ax[1].plot(test_acc_list)\n",
    "ax[1].set_xlabel('Number of learnable params')\n",
    "ax[1].set_title('Test accuracy')\n",
    "ax[1].set_xticks(np.arange(0,len(hl_labels)))\n",
    "ax[1].set_xticklabels(hl_labels)\n",
    "ax[1].grid(True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 2:\n",
    "\n",
    "Comment about the training and test accuracies referring to the discussion on underfitting and overfitting we did in the course"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [ANSWER TO QUESTION 2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 4\n",
    "\n",
    "Now try also to use different learning rates, while keeping the best NN architecture and batch size you have found above. Plot the learning curves (i.e., the variation of the loss over the steps, you can get it from the loss_curve_ object of sklearn) for the different values of the learning rate. Try to run each training for 100 iterations. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import operator\n",
    "\n",
    "lr_list = [0.0002, 0.002, 0.02, 0.2]\n",
    "scores = {}\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "\n",
    "print ('RESULTS FOR NN\\n')\n",
    "\n",
    "print(\"Best parameters set found:\")\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print(\"Score with best parameters:\")\n",
    "#ADD YOUR CODE\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### QUESTION 3\n",
    "\n",
    "Comment about the learning curves (i.e. the variation of the loss over the steps). How does the curve changes for different learning rates in terms of stability and speed of convergence ?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [ANSWER TO QUESTION 3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 5\n",
    "\n",
    "Now get training and test error for a NN with best parameters (architecture, batch size and learning rate) from above. Plot the learning curve also for this case (you can run the training for 500 iterations)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#get training and test error for the best NN model from CV\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print ('\\nRESULTS FOR BEST NN\\n')\n",
    "\n",
    "print (\"Best NN training error: %f\" % training_error)\n",
    "print (\"Best NN test error: %f\" % test_error)\n",
    "\n",
    "#ADD YOUR CODE FOR PLOTTING"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## More data \n",
    "Now let's do the same but using 4000 (or less if it takes too long on your machine) data points for training. Use the same NN architecture as before, but you can try more if you like and have a powerful computer!!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = X[permutation]\n",
    "y = y[permutation]\n",
    "\n",
    "m_training = 4000\n",
    "\n",
    "X_train, X_test = X[:m_training], X[m_training:]\n",
    "y_train, y_test = y[:m_training], y[m_training:]\n",
    "\n",
    "labels, freqs = np.unique(y_train, return_counts=True)\n",
    "print(\"Labels in training dataset: \", labels)\n",
    "print(\"Frequencies in training dataset: \", freqs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 6\n",
    "\n",
    "Now train the NNs with the added data points using the optimum parameters found above. Eventually, feel free to try different architectures if you like. We suggest that you use 'verbose=True' so have an idea of how long it takes to run 1 iteration (eventually reduce also the number of iterations to 50)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use best architecture and params from before\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print ('\\nRESULTS FOR NN\\n')\n",
    "\n",
    "#get training and test error for the NN\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print (\"NN training error: %f\" % training_error)\n",
    "print (\"NN test error: %f\" % test_error)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QUESTION 4\n",
    "Compare the train and test error you got with a large number of samples with the best one you obtained with only 600 data points. Comment about the results you obtained."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### [ANSWER TO QUESTION 4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 7\n",
    "\n",
    "Plot an example that was missclassified by NN with m=600 training data points and it is now instead correctly classified by NN with m=4000 training data points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NN_prediction = #ADD YOUR CODE\n",
    "large_NN_prediction = #ADD YOUR CODE\n",
    "\n",
    "#ADD YOUR CODE\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 8\n",
    "\n",
    "Let's plot the weigths of the multi-layer perceptron classifier, for the best NN we get with 600 data points and with 4000 data points. The code is already provided, just fix variable names (e.g., replace mlp , mlp_large with your estimators) in order to have it working with your implementation\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Weights with 600 data points:\")\n",
    "\n",
    "fig, axes = plt.subplots(4, 4)\n",
    "vmin, vmax = mlp.coefs_[0].min(), mlp.coefs_[0].max()\n",
    "for coef, ax in zip(mlp.coefs_[0].T, axes.ravel()):\n",
    "    ax.matshow(coef.reshape(28, 28), cmap=plt.cm.gray, vmin=.5 * vmin,\n",
    "               vmax=.5 * vmax)\n",
    "    ax.set_xticks(())\n",
    "    ax.set_yticks(())\n",
    "\n",
    "plt.show()\n",
    "\n",
    "print(\"Weights with 4000 data points:\")\n",
    "\n",
    "fig, axes = plt.subplots(4, 4)\n",
    "vmin, vmax = mlp_large.coefs_[0].min(), mlp_large.coefs_[0].max()\n",
    "for coef, ax in zip(mlp_large.coefs_[0].T, axes.ravel()):\n",
    "    ax.matshow(coef.reshape(28, 28), cmap=plt.cm.gray, vmin=.5 * vmin,\n",
    "               vmax=.5 * vmax)\n",
    "    ax.set_xticks(())\n",
    "    ax.set_yticks(())\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QUESTION 5\n",
    "\n",
    "Describe what do you observe by looking at the weights."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### [ANSWER TO QUESTION 5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO 9\n",
    "\n",
    "Take the best SVM model and its parameters, you found in the last notebook. Fit it on a few data points and compute its training and test scores. Then fit also a logistic regression model with C=1. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m_training = 5000\n",
    "\n",
    "X_train, X_test = X[:m_training], X[m_training:2*m_training]\n",
    "y_train, y_test = y[:m_training], y[m_training:2*m_training]\n",
    "\n",
    "# use best parameters found in the SVM notebook, create SVM and perform fitting\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print ('RESULTS FOR SVM')\n",
    "\n",
    "SVM_training_error =  #ADD YOUR CODE\n",
    "\n",
    "print(\"Training score SVM:\")\n",
    "print(SVM_training_error)\n",
    "\n",
    "SVM_test_error = #ADD YOUR CODE\n",
    "print(\"Test score SVM:\")\n",
    "print(SVM_test_error)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import linear_model\n",
    "\n",
    "regL2 = #ADD YOUR CODE\n",
    "\n",
    "# you can re-use your code from Lab 2\n",
    "\n",
    "#ADD YOUR CODE\n",
    "\n",
    "print ('\\nRESULTS FOR LOGISTIC REGRESSION WITH REGULARIZATION')\n",
    "\n",
    "training_error =  #ADD YOUR CODE\n",
    "test_error =  #ADD YOUR CODE\n",
    "\n",
    "print (\"Training error (reg): %f\" % training_error)\n",
    "print (\"Test error (reg): %f\" % test_error)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QUESTION 6\n",
    "Compare the results of Logistic Regression, SVM and NN. Which one achieve the best results? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### [ANSWER TO QUESTION 6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QUESTION 7\n",
    "\n",
    "What are the different ways in which you can improve the results obtained for NN? List and justify some of them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### [ANSWER TO QUESTION 7]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}