Exercise lecture 12 - Encoding range-type variables

by JUAN BAEZA RUIZ-HENESTROSA

In this discussion I will try to answer, with a justification, one of the questions proposed in Lecture 12:

"Think about how range-type variables might be represented. Is it reasonable to represent them as any numeric variables or is it better to have ad-hoc encoding?"

Let's say we are working with a numerical variable that takes values in a range (for example, the number of days per year that it rains in a city, the amount of a certain nutrient per 100 g of a food product, or the height of a person, if we set a maximum and a minimum height). In this case, if we work with the "raw" data, the value of an instance doesn't tell us much about where it stands relative to the other values this feature can take (is it small or large for this feature?). For that reason, when working with range-type variables, it may be more useful for interpretation to scale the values by the range of that particular feature:

h(x_ij) = (x_ij − x_min,j) / (x_max,j − x_min,j)

Doing this, when we look at an instance's value for the feature, we can immediately tell how large it is relative to the values this feature can take, which makes the data much easier to interpret.
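
To make the scaling concrete, here is a minimal Python sketch of the formula above. The function name min_max_scale, its optional lower and upper arguments, and the rainy-days numbers are my own illustrative assumptions, not something given in the lecture. When the bounds of the range are known in advance (as for days per year), they can be passed explicitly; otherwise the sketch falls back to the minimum and maximum observed in the data.

```python
def min_max_scale(values, lower=None, upper=None):
    """Scale values with h(x) = (x - x_min) / (x_max - x_min).

    lower/upper are the known bounds of the range; if omitted,
    the observed minimum and maximum of the data are used instead.
    """
    lo = min(values) if lower is None else lower
    hi = max(values) if upper is None else upper
    if hi == lo:
        # The formula divides by (x_max - x_min), so a zero-width
        # range has no defined scaling.
        raise ValueError("range has zero width; scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical data: rainy days per year in five cities.
rainy_days = [30, 120, 75, 200, 365]

# The range of this feature is known: 0 to 365 days.
scaled = min_max_scale(rainy_days, lower=0, upper=365)
```

With known bounds, a scaled value near 0 means "close to the minimum of the range" and a value near 1 means "close to the maximum", for any feature. If the bounds are taken from the data instead, the scaled values describe the position within the observed sample only.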


If my answer is not correct or someone has a different approach, I would really like to hear about it.