Why we shouldn't initialize the weights to zero

by JUAN BAEZA RUIZ-HENESTROSA

Let's show that if the Perceptron algorithm is initialized with the null vector, then the learning rate η does not affect learning.

Suppose we take the null vector as the initialization of w.

On the first iteration of the Perceptron algorithm we take a pair (x_1, t_1) from the training set. We may assume that its output o_1 differs from the target t_1, since otherwise no update is made. Then the update is:

w ← w + η(t_1 − o_1)x_1 = η(t_1 − o_1)x_1    (since w is the null vector)
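
For concreteness, here is a minimal sketch of this update step in Python (the helper names sign and perceptron_step are my own, not part of the argument; the rule used is the usual w ← w + η(t − o)x):

import numpy as np

def sign(z):
    # Threshold unit: +1 for non-negative activation, -1 otherwise.
    return 1 if z >= 0 else -1

def perceptron_step(w, x, t, eta):
    # One Perceptron update. If the prediction o equals the target t,
    # then t - o = 0 and w is unchanged; otherwise w moves by
    # eta * (t - o) * x.
    o = sign(np.dot(w, x))
    return w + eta * (t - o) * x

Starting from w = 0, the first call returns exactly η(t_1 − o_1)x_1, as above.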

After some iterations of the algorithm we will have a weight vector of the form:

w = η(t_1 − o_1)x_1 + η(t_2 − o_2)x_2 + ... + η(t_r − o_r)x_r 

Which can be rewritten as:

w = η[(t_1 − o_1)x_1 + (t_2 − o_2)x_2 + ... + (t_r − o_r)x_r] 

And here we can see that the learning rate (η) does not affect learning: it only changes the size of the weight vector, not its direction. (Note that the outputs o_1, ..., o_r themselves do not depend on η either: sign ignores positive scaling, so the same sequence of mistakes is made for every η > 0.) A change of size alone makes no difference when calculating the output for a data point x (since η > 0):

o = sign( x . w ) = sign( η x . [(t_1 − o_1)x_1 + (t_2 − o_2)x_2 + ... + (t_r − o_r)x_r] ) =

                  = sign( x . [(t_1 − o_1)x_1 + (t_2 − o_2)x_2 + ... + (t_r − o_r)x_r] )

And here we can see clearly that the output does not depend on η.
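
As a quick sanity check, here is a small, self-contained Python sketch (the helper names and the toy linearly separable data are invented for illustration) that trains the Perceptron from w = 0 with two different learning rates and verifies that the resulting weight vectors differ only by a positive factor, so the predictions coincide:

import numpy as np

def sign(z):
    # Vectorised threshold: +1 for non-negative entries, -1 otherwise.
    return np.where(z >= 0, 1, -1)

def train_perceptron(X, t, eta, epochs=10):
    w = np.zeros(X.shape[1])          # null-vector initialization
    for _ in range(epochs):
        for xi, ti in zip(X, t):
            o = sign(np.dot(w, xi))
            w = w + eta * (ti - o) * xi
    return w

# Toy linearly separable data (illustrative only).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])

w_small = train_perceptron(X, t, eta=0.1)
w_large = train_perceptron(X, t, eta=10.0)

print(np.allclose(w_large, (10.0 / 0.1) * w_small))          # True: parallel weight vectors
print(np.array_equal(sign(X @ w_small), sign(X @ w_large)))  # True: identical outputs

Both runs make exactly the same sequence of mistakes, so the two weight vectors are parallel and every prediction sign(x . w) is identical.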