Project part II: retrieval-augmented generation


by Giorgio Satta -

The second part of the project for this year will be on retrieval-augmented generation (RAG), a technique that combines the power of neural methods for information retrieval with generative models to build virtual assistants for domain-specific applications. RAG was also briefly presented in slides 105-108 of the lecture on LLMs (06_large_language_models).

For the development of your project you can look at the example of a small-scale RAG system presented in the first part (stop at timestamp 37:27) of this video.

The slides and the notebook of the presentation are also attached to this post.

You are required to go through the following steps.

  1. Choose a domain of interest for your assistant; this can be anything you like, for example culinary art, rock music, etc. Find a large enough textual dataset on the web for your domain, clean up the data, split long documents into chunks, and create your document repository.

  2. Choose a sentence embedding model for the documents in your repository and for the queries. Then choose a library for setting up your vector store: essentially a vector database plus a retrieval model based on vector similarity, which ranks documents by their relevance to the input query and selects the N-best documents.

  3. Create a template that takes as input the query text and the N-best documents, and produces a suitable prompt as output. Choose an LLM that has been instruction-tuned as a chatbot. The model is then run on your prompt to generate the answer to your query.
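A minimal sketch of the chunking in step 1, assuming simple character-based splitting with overlap (real projects may prefer token- or sentence-based splitting; the sizes here are illustrative):

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split a long document into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks

doc = "word " * 300            # a 1500-character toy document
chunks = split_into_chunks(doc)
print(len(chunks))             # 4 chunks of at most 500 characters
```

The overlap means the last 50 characters of each chunk reappear at the start of the next one, so a sentence cut at a boundary is still retrievable from at least one chunk.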
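The ranking in step 2 is based on vector similarity; here it is sketched in plain Python with toy vectors. In a real project the vectors come from your sentence embedding model and the search is handled by the vector store library, which indexes the vectors for fast lookup:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n_documents(query_vec, doc_vecs, n=3):
    """Return the indices of the N documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:n]

# Toy 4-dimensional "embeddings" standing in for real model output.
docs = [[1.0, 0.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0]]
query = [1.0, 0.05, 0.0, 0.0]
best = top_n_documents(query, docs, n=2)   # [0, 1]
```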
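The prompt template in step 3 can be as simple as string formatting; the wording below is illustrative, not prescribed, and choosing it well is exactly the prompt engineering you are asked to document:

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question, retrieved_docs):
    """Fill the template with the query text and the N-best documents."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt("What goes into carbonara?",
                      ["Carbonara is a Roman pasta dish.",
                       "It uses guanciale, eggs, and pecorino."])
```

The resulting string is what you pass to the chat LLM for inference.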

For step 3 you can work with a 7B- or 13B-parameter model. For 13B models, you need to use special libraries for model quantization. These libraries reduce the size of LLMs by lowering the precision of their weights, which is needed to fit within the amount of RAM usually available on Google Colab. A short video about parameter quantization will soon be posted on the course homepage.
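For illustration, 4-bit loading with the bitsandbytes backend of the Hugging Face transformers library looks roughly like this (the model name is just an example of a 13B chat model, and this configuration requires a GPU runtime):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization: weights are stored at reduced precision,
# so a 13B model fits in the memory of a free Colab GPU runtime.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_name = "meta-llama/Llama-2-13b-chat-hf"  # example; use the model of your choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
```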

Use the LangChain library to combine all of your sources of computation or knowledge. A short video on how to use the LangChain library will soon be posted on the course homepage.
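Conceptually, the "chain" that LangChain builds for you is a composition of the three steps above; here is a plain-Python sketch with stub components standing in for the real vector store and LLM (actual LangChain usage is covered in the video):

```python
def rag_pipeline(question, retrieve, build_prompt, generate):
    """Chain retrieval -> prompt construction -> generation."""
    docs = retrieve(question)              # step 2: N-best documents
    prompt = build_prompt(question, docs)  # step 3: fill the template
    return generate(prompt)               # step 3: LLM inference

# Stub components standing in for the vector store and the chat LLM.
retrieve = lambda q: ["A document about " + q]
build_prompt = lambda q, docs: f"Context: {docs[0]}\nQuestion: {q}\nAnswer:"
generate = lambda prompt: "stub answer"

answer = rag_pipeline("rock music", retrieve, build_prompt, generate)
```

LangChain provides exactly this glue, plus ready-made wrappers for embedding models, vector stores, prompt templates, and LLMs.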

Your project should be presented as a notebook, to be uploaded to the Google shared folder; see the 'Project registration' post in this forum. Use a file name that includes your full names as a prefix. The notebook should be organized as follows.

  • Student names, badge numbers, and master's program should be reported at the top of the notebook.

  • Domain and dataset should be described in a dedicated section. Dataset profiling (a summary of your dataset through descriptive statistics) is also required.

  • The main libraries used for sentence embedding, the vector store, LLM quantization, etc. should all be introduced and briefly documented.

  • The prompt engineering step should be carefully described in a dedicated section. Design choices should be discussed, including those that were discarded because of low-quality results.

  • For this task we carry out human-based qualitative evaluation. This means that you should test your system on a few questions, analyse possible errors and inaccuracies, and discuss possible ways of improving your system.

  • If you use solutions that have been inspired by similar projects publicly available on the web, you should add a proper acknowledgement section in your notebook.

Finally, sometime later you will also be asked to upload a code-only version of your project as a .py file to the course Moodle page. This will be used to detect plagiarism. Detailed instructions will be provided.

If you have any questions about the second part of the project, please post a message in this thread.

In reply to Giorgio Satta

Re: Project part II: retrieval-augmented generation

by Giorgio Satta -

Dear All, for the second part of the project you may consider using the LangChain library to combine all of your sources of computation or knowledge, but this is not mandatory. The open lab session III is a short introduction to LangChain.

If you work with a 13B LLM (chat version), you need to use special libraries for model quantization. In practice, this is explained in the RAG project video that I have recommended, at timestamps 27:00--28:00.

In reply to Giorgio Satta

Re: Project part II: retrieval-augmented generation

by ANDREA GHIOTTO -
Good morning,
I would like to ask, to be sure I understood correctly, whether the description of our project (domain, dataset, vector store, models, tests, etc.) needs to be done in the notebook or whether we are required to write a separate report.
Thank you and have a good day!