Replies: 3 comments 5 replies
-
Update: here's my entire project for a bit more context: https://github.com/superfly/llm-describer. There's not much there yet since it's a demo I'm putting together for a blog post, but describer.go contains the code of interest.
-
I believe that if you use an agent with memory, you can solve your problem by keeping the image description in context. Alternatively, you can persist the description somewhere and inject it into the prompt using a PromptTemplate.
-
Hey folks, I'm new to LLM development, so I hope I'm on the right path. I'm working on a small image-describing service based on Ollama. I have the base case working for simply describing the image, but now I want to add the ability to ask follow-up questions, and I feel like I've hit a wall.
First, I wasn't clear on whether chains or agents were the right tool here. It looks like agents can have memory, but for this case they might be overkill. Is that correct? I don't need any other tools for this, just image descriptions.
My service has a web frontend, and I'm caching user questions and follow-ups. When the user asks a follow-up question, I create a chain. What I'm not sure about is how to add the image binary data either to the memory or to the request alongside the memory. Is this possible?
For reference, here's what I have so far. It uses PocketBase for persistence, but it should be pretty obvious what's going on:
Thanks.