Internship+Thesis proposals

by Francesco Rinaldi

Dear Students,


Please find below some internship+thesis proposals at Unipd.


Cheers,

     Francesco Rinaldi



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Word embeddings are vector representations of words designed to preserve semantic similarity. They are the state-of-the-art representation for machine learning algorithms applied to natural language. They are produced by specific learning algorithms, usually trained on large corpora, and consequently inherit the biases of the corpora they have been trained on. The goal of the project is to devise an efficient algorithm to compare two different word embeddings in order to automatically highlight the biases they were subject to. Specifically, we look for an alignment between the two vector spaces, corresponding to the two word embeddings, that minimises the difference between the stable words, i.e. the ones that have not changed between the two embeddings, thus highlighting the ones that have changed. We do have a first version of the algorithm, but further study is needed to improve it or to devise a better one.
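
To make the alignment idea concrete, here is a minimal sketch in Python. It is not the algorithm mentioned above: it assumes the alignment is restricted to an orthogonal map (the classical orthogonal Procrustes problem) and that the set of stable words is already known, whereas in the project identifying that set is part of the problem. The function names and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np


def procrustes_align(X, Y, stable_idx):
    """Orthogonal W minimising ||X[stable_idx] @ W - Y[stable_idx]||_F."""
    A, B = X[stable_idx], Y[stable_idx]
    # Closed-form solution of the orthogonal Procrustes problem:
    # if A^T B = U S V^T, the minimiser is W = U V^T.
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def cosine_displacement(X, Y, W):
    """Per-word cosine distance between the aligned X and Y."""
    XA = X @ W
    num = np.sum(XA * Y, axis=1)
    den = np.linalg.norm(XA, axis=1) * np.linalg.norm(Y, axis=1)
    return 1.0 - num / den


# Synthetic data (not real embeddings): Y is a rotated copy of X,
# except for the first 5 "words", which are replaced by random vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
Y = X @ Q
Y[:5] = rng.normal(size=(5, 8))

stable_idx = np.arange(5, 50)  # assumed known here; unknown in practice
W = procrustes_align(X, Y, stable_idx)
d = cosine_displacement(X, Y, W)
changed = np.argsort(-d)[:5]  # words with the largest displacement
```

In this toy setting the stable words line up almost exactly after alignment, so the words with the largest displacement are the ones that were deliberately changed. With real corpora the stable set is not given and the embeddings are noisy, which is where the modelling work of the project comes in.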

We envision the project would require the following:

1. improving the modelling and implementation of the alignment algorithm on a single CPU machine;
2. investigating whether an online version of the algorithm is possible;
3. an efficient implementation of the alignment algorithm that runs over multiple cores and/or machines;
4. implementing an online version of the algorithm, if the investigation in item 2 shows that one is possible;
5. a series of analyses on different pairs of corpora to highlight different kinds of biases and possibly automatically identify them.
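
As a rough illustration of what items 2-4 could look like in the simplest case, the sketch below assumes an orthogonal (Procrustes-style) alignment over a fixed set of stable word pairs. Under that assumption the solution depends on the data only through the matrix A^T B, which is a sum over word pairs and can therefore be accumulated chunk by chunk or merged across workers. Whether the actual algorithm admits a similar decomposition is exactly what the project should investigate; the class name and the data are hypothetical.

```python
import numpy as np


class StreamingProcrustes:
    """Accumulates A^T B over chunks of stable-word vector pairs."""

    def __init__(self, dim):
        self.M = np.zeros((dim, dim))

    def update(self, A_chunk, B_chunk):
        # Rows of A_chunk and B_chunk are the two embeddings of the same words.
        self.M += A_chunk.T @ B_chunk

    def merge(self, other):
        # Partial sums from different cores or machines simply add up.
        self.M += other.M

    def rotation(self):
        # Orthogonal Procrustes solution from the accumulated matrix.
        U, _, Vt = np.linalg.svd(self.M)
        return U @ Vt


# Synthetic data: B is a rotated copy of A, so the rotation is recoverable.
rng = np.random.default_rng(1)
A = rng.normal(size=(1000, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
B = A @ Q

# Two hypothetical workers, each streaming over its half in chunks of 100.
workers = [StreamingProcrustes(16), StreamingProcrustes(16)]
for w, starts in zip(workers, (range(0, 500, 100), range(500, 1000, 100))):
    for s in starts:
        w.update(A[s:s + 100], B[s:s + 100])
workers[0].merge(workers[1])
W_stream = workers[0].rotation()
```

The design choice here is to keep only a dim-by-dim accumulator per worker, so memory does not grow with the vocabulary. This covers only the alignment step for a given stable set; selecting the stable words online is a separate and harder question.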

The project could be split into multiple theses, for example one thesis for 1-3 and another for 3-5.

A good knowledge of optimisation and coding is required for 1-2; very good programming skills are required for 3-4.

The theses will be under the supervision of Prof. Rinaldi and Prof. Da San Martino, respectively.

For further info write an email to rinaldi@math.unipd.it and/or giovanni.dasanmartino@unipd.it