{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we cover some fundamentals of Python and of the related packages (NumPy, scikit-learn, etc.)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This will be executed\n" ] } ], "source": [ "# a comment starts with a hash symbol (#):\n", "print(\"This will be executed\")\n", "#print(\"This will not\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Numbers" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "x = 1 # notice that we didn't need to declare the type: it is determined automatically from the value\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Python 3, the \"/\" operator always returns a float" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.5\n" ] } ], "source": [ "x = 3.0\n", "y = 2. 
# the 0 can be omitted: 2.0 == 2.\n", "print(x/y)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "value of x: 4 type of x: <class 'int'>\n", "value of z: 2.0 type of z: <class 'float'>\n" ] } ], "source": [ "x = 4 # int\n", "y = 2 # int\n", "z = x/y\n", "print('value of x:', x, 'type of x:', type(x))\n", "print('value of z:', z, 'type of z:', type(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"//\" operator performs floor division: the result is rounded down, and it is an int only when both operands are ints (note that in Python 2 the \"/\" operator already performs floor division on integers, unlike in Python 3)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "value of z: 1 type of z: <class 'int'>\n", "value of z: 1.0 type of z: <class 'float'>\n" ] } ], "source": [ "x = 3\n", "y = 2\n", "z = x // y\n", "print('value of z:', z, 'type of z:', type(z))\n", "\n", "x = 3.0\n", "y = 2.0\n", "z = x // y\n", "print('value of z:', z, 'type of z:', type(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list is ordered, and elements can be added and removed" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[20, 2, -6]\n", "-6\n" ] } ], "source": [ "a = [20, 2, -6]\n", "print(a)\n", "print(a[2]) # notice that indices start from 0 (unlike Matlab)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "will\n" ] } ], "source": [ "b = ['I','will','pass','ML']\n", "print(b[1])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "[]\n", "['I', 'will', 'pass', 'ML']\n", "['I', 'will', 'pass']\n", "<class 'list'>\n" ] } ], "source": [ "b = []\n", "print(b)\n", "b = list()\n", "print(b)\n", 
"b.append('I')\n", "b.append('will')\n", "b.append('pass')\n", "b.append('ML')\n", "print(b)\n", "b.remove('ML')\n", "print(b)\n", "print(type(b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "range(start,end) creates an array with all the numbers from (start) to (end-1)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "range(1, 5)\n", "\n", "[1, 2, 3, 4]\n" ] } ], "source": [ "c = range(1,5) # same as for(int i=1; i<5; i++) in java/c\n", "print(c)\n", "print(type(c))\n", "print(list(c))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\":\" indexing similar to Matlab (but notice that ending index value is not included and array indexing starts from 0)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "d = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n", "d[0:10] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n", "d[:] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n", "d[1:10] = [1, 2, 3, 4, 5, 6, 7, 8, 9]\n", "d[0:9] = [0, 1, 2, 3, 4, 5, 6, 7, 8] <- last element is (end-1) !\n", "d[3:] = [3, 4, 5, 6, 7, 8, 9]\n", "d[:7] = [0, 1, 2, 3, 4, 5, 6]\n", "d[:-2] = [0, 1, 2, 3, 4, 5, 6, 7]\n", "d[2] = 2\n", "d[2:3] = [2]\n" ] } ], "source": [ "d = [0,1,2,3,4,5,6,7,8,9]\n", "print('d =', d)\n", "print('d[0:10] =', d[0:10])\n", "print('d[:] =', d[:])\n", "print('d[1:10] =', d[1:10])\n", "print('d[0:9] =', d[0:9], ' <- last element is (end-1) !')\n", "print('d[3:] =', d[3:])\n", "print('d[:7] =', d[:7])\n", "print('d[:-2] =', d[:-2]) # shorthand for d[:len(d)-2]\n", "print('d[2] =', d[2]) # notice that this is a number\n", "print('d[2:3] =', d[2:3]) # while this is a list of 1 element" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# If-then-else" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If-then-else have a quite standard behavior. 
Notice how indentation is used in place of the braces of other programming languages\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x > 2\n", "The value of x is: 3\n", "The value of x is: 3\n" ] } ], "source": [ "x = 3\n", "if x > 2:\n", "    print('x > 2')\n", "else:\n", "    print('x <= 2')\n", "print(\"The value of x is: \" + str(x)) # + concatenates strings\n", "print(\"The value of x is:\", x) # print automatically adds a space between arguments" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "value of c: Good Morning !!!\n" ] } ], "source": [ "c = 'Good Morning !!!'\n", "if c != \"hello\":\n", "    print(\"value of c: \" + c)\n", "else:\n", "    print(\"c has value hello\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x is between 0 and 10 (excluded)\n" ] } ], "source": [ "x = 9.9\n", "if (x > 0.0) and (x < 10.0):\n", "    print(\"x is between 0 and 10 (excluded)\")\n", "else:\n", "    print(\"x <= 0 or x >= 10\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Need to study for ML!!!\n" ] } ], "source": [ "grade = 27.2\n", "course = \"Computer Vision\"\n", "if (grade > 26) and (not (course != \"ML\")):\n", "    print(\"Everything is good\")\n", "else:\n", "    print(\"Need to study for ML!!!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Iterating over elements" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I\n", "will\n", "pass\n", "ML\n" ] } ], "source": [ "x = ['I','will','pass','ML']\n", "for elem in x:\n", "    print(elem)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "Number of elements: 4\n", "I\n", "will\n", "pass\n", "ML\n" ] } ], "source": [ "x = list(['I','will','pass','ML'])\n", "print('Number of elements: ' , len(x))\n", "for i in range(len(x)):\n", " print(x[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Importing packages" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import scipy as sp\n", "import numpy as np\n", "import sklearn as sl\n", "#import sklearnex as sl # optimized version for intel processors (MUCH faster)\n", " # can be installed via 'conda install mkl sklearn -c intel'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Arrays in numpy" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a:\n", "[1. 2. 3.]\n", "a.shape = (3,)\n", "\n", "b:\n", "[[1 2 3]\n", " [4 5 6]]\n", "b.shape = (2, 3)\n", "\n", "c:\n", "[[0. 0.]\n", " [0. 0.]\n", " [0. 0.]]\n", "c.shape (3, 2)\n", "\n", "c1:\n", "[[1. 2. 3.]]\n", "c1.shape = (1, 3) <- notice the difference with the first example (a) !!\n", "\n", "d:\n", "[[1. 1. 1.]\n", " [1. 1. 1.]]\n", "\n", "e:\n", "[[1. 0. 0. 0.]\n", " [0. 1. 0. 0.]\n", " [0. 0. 1. 0.]\n", " [0. 0. 0. 
1.]]\n", "\n", "f:\n", "[[0.2978403 0.86118454 0.97853982 0.90239901]\n", " [0.61399409 0.96707241 0.566517 0.3994777 ]]\n", "\n" ] } ], "source": [ "# create a vector from the given values\n", "a = np.array([1.0, 2.0, 3.0])\n", "print('a:')\n", "print(a)\n", "print('a.shape =', a.shape, end='\\n\\n')\n", "\n", "# create a matrix from the given values\n", "b = np.array([[1, 2, 3], [4, 5, 6]])\n", "print('b:')\n", "print(b)\n", "print('b.shape =', b.shape, end='\\n\\n')\n", "\n", "# create a matrix of 0's of the given size\n", "c = np.zeros((3,2))\n", "print('c:')\n", "print(c)\n", "print('c.shape', c.shape, end='\\n\\n')\n", "\n", "# create a matrix of size 1x3 (different from a 1D array of 3 elements!)\n", "c1 = np.zeros((1,3))\n", "c1[0,:] = [1, 2, 3]\n", "print('c1:')\n", "print(c1)\n", "print('c1.shape =', c1.shape, ' <- notice the difference with the first example (a) !!', end='\\n\\n')\n", "\n", "# create a matrix of 1's of the given size\n", "d = np.ones((2,3))\n", "print('d:')\n", "print(d, end='\\n\\n')\n", "\n", "# create an identity matrix of the given size\n", "e = np.eye(4)\n", "print('e:')\n", "print(e, end='\\n\\n')\n", "\n", "# create a random matrix of the given size (uniform values in [0,1), i.e. 1 is excluded)\n", "f = np.random.random((2,4))\n", "print('f:')\n", "print(f, end='\\n\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Array indexing in numpy" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1 5 7 9]\n", " [ 2 6 8 10]]\n" ] } ], "source": [ "e = np.array([[1,5,7,9],[2,6,8,10]])\n", "print(e[:,:])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1 5 7 9]\n", " [ 2 6 8 10]]\n" ] } ], "source": [ "print(e[:,0:4])" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5 6]\n" ] } ], "source": [ "print(e[:,1])" ] }, { "cell_type": 
"code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[5 7]\n", " [6 8]]\n" ] } ], "source": [ "print(e[:,1:3])" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "print(e[:,1:1])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[5 7]\n", " [6 8]]\n" ] } ], "source": [ "print(e[:,1:-1])" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5 7 9]\n" ] } ], "source": [ "print(e[0,1:])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 5 7]\n" ] } ], "source": [ "print(e[0,0:3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading from CSV file [1/2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_name = \"data/marks.csv\"\n", "infile = open(file_name,'r')\n", "line_c = 0\n", "for line in infile:\n", " if line_c <=4: # read only the first 4 lines\n", " line = line.strip() #strip removes whitespaces and newlines characters\n", " print(\"Line:\\n\"+line) # newline removed ^\n", " v = line.split(',') # split breaks up the string in chunks delimited by the argument\n", " print(\"List: \"+str(v))\n", " print(\"Elements in list:\")\n", " for i in range(len(v)):\n", " print(v[i].strip(), end=' ') #strip removes whitespaces\n", " print('\\n')\n", " line_c += 1\n", "infile.close() # remember to close the file when not used anymore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Automatic file.close() – 'with' environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes one does not need the complexity allowed by f = open(filename) ... 
f.close().\\\n", "In those cases we can use the **with** statement, which closes the file automatically." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_name = \"data/marks.csv\"\n", "line_c = 0\n", "with open(file_name,'r') as infile: # equivalent to infile = open(file_name,'r')\n", "    for line in infile:\n", "        if line_c < 4: # process only the first 4 lines\n", "            line = line.strip() # strip removes leading/trailing whitespace, including newline characters\n", "            print(\"Line:\\n\"+line) # the trailing newline was removed by strip above\n", "            v = line.split(',') # split breaks up the string into chunks delimited by the argument\n", "            print(\"List: \"+str(v))\n", "            print(\"Elements in list:\")\n", "            for i in range(len(v)):\n", "                print(v[i].strip(), end=' ') # strip removes surrounding whitespace\n", "            print('\\n')\n", "        line_c += 1\n", "# infile.close() is executed automatically when we exit the indented block" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Writing to file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "infile_name = \"data/marks.csv\"\n", "infile = open(infile_name,'r') # open the input file defined just above\n", "line_c = 0\n", "outfile_name = \"tmp.txt\"\n", "outfile = open(outfile_name,'w') # 'w' creates the file, or overwrites it if it already exists\n", "for line in infile:\n", "    if line_c <= 4: # write only the first 5 lines (line_c = 0, ..., 4)\n", "        outfile.write(\"Line:\\n\"+line+\"\\n\")\n", "        v = line.split(',')\n", "        outfile.write(\"List: \"+str(v)+\"\\n\")\n", "        outfile.write(\"Elements in list:\\n\")\n", "        for i in range(len(v)):\n", "            outfile.write(v[i].strip()+\"\\n\")\n", "    line_c += 1\n", "infile.close()\n", "outfile.write(str(10.)) # write accepts only strings, so numbers must be converted\n", "outfile.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading from CSV file [2/2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "\n", "filename = \"data/marks.csv\"\n", "\n", "lines = csv.reader(open(filename, newline=''), delimiter=',')\n", "print('type(lines) = ', type(lines))\n", "\n", "for line in lines:\n", "    print(line)" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "\n", "filename = \"data/marks.csv\"\n", "lines = csv.reader(open(filename, newline=''), delimiter=',')\n", "\n", "dataset = list(lines)\n", "for i in range(len(dataset)):\n", " dataset[i] = [float(x) for x in dataset[i]]\n", "print(dataset)\n", "print('Number of students:', len(dataset), end='\\n\\n')\n", "# you can convert lists to numpy for automatic print formatting:\n", "print(np.array(dataset))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def print_value(x):\n", " print(str(x))\n", "\n", "print_value(10)\n", "print_value('hello')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sign(value):\n", " if value > 0:\n", " return 1.\n", " elif value < 0: # this means elseif, allows to construct checks with multiple cases without annidation\n", " return -1.\n", " else:\n", " return 0.\n", "\n", "print(sign(10.2))\n", "print(sign(-0.6))\n", "print(sign(0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment\n", "\n", "1) Load the provided .csv file with the used car data\n", "\n", "2) Use a linear regression to estimate the car prices from the year, kilometers or engine power. You can make a simple 1D regression from each one of the parameters independently (as an optional task you can also try a 2D or 3D regression combining multiple cues)\n", "\n", "3) Firstly perform the estimation using the scipy linregress function (or alternatively you can use the sklearn.linear_model.LinearRegression class).
\n", "NB: check the documentation of the two methods!! In particular be aware of the number of outputs (in case use \"_\" to avoid the return of a specific output).\n", "\n", "4) Have a look at the correlation coefficient to see which of the 3 features works better\n", "\n", "5) Then implement the least square algorithm: you should get exactly the same solution of linregress !\n", "\n", "6) Plot the data and the lines representing the output of the linregress and least square algorithms" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }