Replies: 1 comment
-
Hi @jackbravo, interesting thought. However, this touches a lot of domains in terms of how best to replace these, so I would say you could do this as a preprocessing step and then provide the resulting markdown document to MarkdownSplitter to be chunked.
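For illustration, here is a rough sketch of what such a preprocessing step might look like in Python, using only the standard library. The function name `strip_images_and_links` and both regexes are hypothetical and not part of any library. It assumes inline-style images and links only; reference-style links, URLs containing parentheses, and link syntax inside code blocks are not handled.

```python
import re

# Hypothetical patterns for inline Markdown syntax only:
#   image: ![alt](url "optional title")
#   link:  [text](url "optional title")
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\(\s*[^)\s]*(?:\s+"([^"]*)")?\s*\)')
LINK_RE = re.compile(r'(?<!!)\[([^\]]*)\]\(\s*[^)\s]*(?:\s+"[^"]*")?\s*\)')


def strip_images_and_links(markdown: str) -> str:
    """Replace images with their alt text (or title) and links with their text."""

    def image_repl(match: re.Match) -> str:
        alt, title = match.group(1), match.group(2)
        # Prefer the alt text, fall back to the title, else drop the image.
        return alt or title or ""

    # Images go first so that linked images ([![alt](img)](url)) collapse
    # to a plain link, which the second pass then reduces to its text.
    text = IMAGE_RE.sub(image_repl, markdown)
    return LINK_RE.sub(r"\1", text)


if __name__ == "__main__":
    sample = 'See ![A cat](cat.png) and [the docs](https://example.com/docs).'
    print(strip_images_and_links(sample))
```

The cleaned string could then be handed to MarkdownSplitter as usual. Regexes are a blunt tool here; a real Markdown parser would be more robust if your documents use reference-style links or nested brackets.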
-
It would be useful to remove images and/or links while preserving their alt text, titles, or descriptions in the text. URLs often add little value when generating embedding vectors and retrieving top results.
While working on markdown datasets, I've found documents that are full of images which, when they have no description, add very little value to the content.
Hehe, perhaps a better idea would be a Markdown pre-processor that fills empty image descriptions with useful content 🤷‍♂️ :-p