Hi,
I am working on the second assignment and am having trouble building the FAISS index for the document retrieval part.
Specifically, the problem is the index build time. I am working with a data collection with the following statistics (after applying RecursiveCharacterTextSplitter):
Total number of words: 116472266
Number of documents: 466443
From what I have seen online, this is not even a large dataset (<500k documents), yet the index takes a very long time to build. Because of timeout errors in Colab, I had to switch to the DEI cluster to get enough runtime, and after 25 hours it is still building the index.
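For scale, here is a rough back-of-the-envelope calculation from the numbers above. It is only a sketch: the 25 hours is a lower bound because the build has not finished, so the throughput it gives is a best case, and the variable names are just mine.

```python
# Figures copied from the statistics above.
total_words = 116_472_266
num_chunks = 466_443

# Elapsed time so far; the build has not finished, so this is a lower bound.
elapsed_hours = 25
elapsed_seconds = elapsed_hours * 3600

avg_words_per_chunk = total_words / num_chunks

# Best-case throughput: pretends every chunk had already been processed.
max_chunks_per_second = num_chunks / elapsed_seconds

print(f"average words per chunk: {avg_words_per_chunk:.1f}")
print(f"chunks per second (at best): {max_chunks_per_second:.2f}")
```

So each chunk is roughly 250 words, and the whole pipeline is getting through at most about 5 chunks per second.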
I am using faiss-cpu (faiss-gpu didn't show any significant improvement) and invoking it quite literally as done here.
Am I doing something wrong? Perhaps the dataset is too large? What am I missing?
I understand that the build time depends on how much data I am feeding FAISS, but from what I've read online I did not expect it to take more than a day. If anyone could help, I would really appreciate it!