Hallucinations in the tutorial #8247
-
Hi there, I was working through the tutorials and particularly enjoyed the
But the correct measurements should be:
I understand that this tutorial is intended to provide an overview of the general concept and uses a specific embedder along with a relatively small LLM. That said, this is a fairly simple task. My question is: What strategies can be employed to prevent such errors? Could there be some form of double-checking implemented?
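To make the "double-checking" idea more concrete, here is the kind of post-hoc check I have in mind. This is only a rough sketch; check_faithfulness and the generate callable are hypothetical and not part of the tutorial:

# Hypothetical post-hoc check: ask an LLM whether the generated answer
# is actually supported by the retrieved documents.
def check_faithfulness(generate, documents, answer):
    # generate: any callable that maps a prompt string to a reply string,
    # e.g. a thin wrapper around the generator already used in the pipeline.
    context = "\n".join(doc.content for doc in documents)
    prompt = (
        "Context:\n" + context + "\n\n"
        "Answer:\n" + answer + "\n\n"
        "Does the context support every statement in the answer? Reply YES or NO."
    )
    return generate(prompt).strip().upper().startswith("YES")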
-
Hello @feinmann and thank you for providing feedback on the tutorial! We could add a note to the tutorial about hallucinations @bilgeyucel @TuanaCelik

If I understand your example correctly, the retriever retrieved the correct documents and provided them in the prompt to the LLM, but the LLM then hallucinated. A first countermeasure is a better model: as you already pointed out, HuggingFaceH4/zephyr-7b-beta is a relatively small model and was chosen here for that reason. A second countermeasure that I assume could help is to reduce the number of retrieved documents. For that you just need to change one line:

# replace
pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
# with
pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=3))

By default, 10 documents are added to the prompt, but that many documents might be confusing for the LLM. If the top 3 are not relevant, the next step would be to improve the retriever or add an additional reranker.
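In case you want to try the reranker route, here is a rough sketch of how it could look. It assumes the component names from the tutorial (text_embedder, retriever, prompt_builder) and uses TransformersSimilarityRanker with its default cross-encoder model, so treat it as a starting point rather than a tested drop-in:

from haystack.components.rankers import TransformersSimilarityRanker

# Let the retriever return its default 10 candidates and add a cross-encoder
# ranker that keeps only the 3 most relevant ones for the prompt.
pipe.add_component("ranker", TransformersSimilarityRanker(top_k=3))

# Replace the existing retriever -> prompt_builder connection with these two:
pipe.connect("retriever.documents", "ranker.documents")
pipe.connect("ranker.documents", "prompt_builder.documents")

# At run time the ranker also needs the query, e.g.:
# pipe.run({"text_embedder": {"text": question},
#           "ranker": {"query": question},
#           "prompt_builder": {"question": question}})

The idea is the same as lowering top_k on the retriever, but a cross-encoder ranker is usually better at putting the truly relevant documents first.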