
Commit

add new use case, update blog (#2721)
colegottdank authored Oct 4, 2024
1 parent 20eb858 commit 8781c06
Showing 4 changed files with 318 additions and 26 deletions.
8 changes: 4 additions & 4 deletions bifrost/app/blog/blogs/replaying-llm-sessions/metadata.json
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"title": "Replaying LLM Sessions for Iterative AI Agent Improvement",
"title1": "Replaying LLM Sessions for Iterative AI Agent Improvement",
"title2": "Replaying LLM Sessions for Iterative AI Agent Improvement",
"description": "Learn how to enhance your AI agents by replaying and modifying LLM sessions using Helicone. Apply changes directly to real user interactions to gain authentic context, reveal hidden effects, and accelerate iteration.",
"title": "Optimizing AI Agents: How Replaying LLM Sessions Enhances Performance",
"title1": "Optimizing AI Agents: How Replaying LLM Sessions Enhances Performance",
"title2": "Optimizing AI Agents: How Replaying LLM Sessions Enhances Performance",
"description": "Learn how to optimize your AI agents by replaying LLM sessions using Helicone. Enhance performance, uncover hidden issues, and accelerate AI agent development with this comprehensive guide.",
"images": "/static/blog/replaying-llm-sessions/sessions.webp",
"time": "15 minute read",
"author": "Cole Gottdank",
51 changes: 30 additions & 21 deletions bifrost/app/blog/blogs/replaying-llm-sessions/src.mdx
@@ -1,35 +1,42 @@
![sessions](/static/blog/replaying-llm-sessions/sessions.webp)
Experimenting with prompts in isolation **limits your understanding**. To truly grasp how a prompt change impacts an entire session, you need to **apply changes directly to real user interactions**. **<span style={{color: '#0ea5e9'}}>Replaying LLM sessions with Helicone unlocks this capability</span>**, providing insights unattainable through isolated testing.
![Optimizing AI Agents](/static/blog/replaying-llm-sessions/sessions.webp)

**Why is this powerful?**
Are you looking to **<span style={{color: '#0ea5e9'}}>optimize your AI agents</span>** and enhance their performance? Understanding how changes impact your AI agents in real-world interactions is crucial. By **<span style={{color: '#0ea5e9'}}>replaying LLM sessions</span>** with Helicone, you can directly apply modifications to actual AI agent sessions, providing valuable insights that traditional isolated testing may miss.

- **Authentic Context**: By leveraging actual production data, you see how changes affect real user experiences.
- **Unveiling Hidden Effects**: Discover unintended consequences that only emerge over full sessions.
- **Accelerated Iteration**: Automate testing with real inputs, streamlining your optimization process.
**Why Replay LLM Sessions for AI Agents?**

**<span style={{color: '#0ea5e9'}}>Helicone empowers you to replay any complex session</span>**—a capability no other platform offers. Due to our adaptability, more mature product teams often build bespoke solutions atop Helicone to store, aggregate, and analyze their AI workflows, enhancing performance with genuine user data without reinventing the wheel.
- **Deep Insights into Agent Behavior**: See how your AI agents perform in authentic scenarios using production data.
- **Uncover Hidden Issues**: Identify and address problems that only arise during full session interactions.
- **Accelerate Development**: Streamline your AI agent development process by testing changes efficiently.

In this guide, we'll **<span style={{color: '#0ea5e9'}}>demonstrate how to leverage Helicone to replay LLM sessions</span>**. You'll learn how to set up an initial session, query session data, and replay sessions with modifications. We'll also share tips on customizing this approach for your unique needs.
In this guide, we'll show you **<span style={{color: '#0ea5e9'}}>how to optimize your AI agents by replaying LLM sessions with Helicone</span>**, providing step-by-step instructions and best practices.

---

## Overview of the Replay Process With Helicone
## What is an AI Agent?

The process of replaying LLM sessions with Helicone involves three main steps:
An **<span style={{color: '#0ea5e9'}}>AI agent</span>** is a software entity that performs tasks on behalf of users with some degree of autonomy, using AI techniques. Optimizing these agents ensures they deliver accurate, efficient, and reliable outcomes.

1. **<span style={{color: '#0ea5e9'}}>Setting Up the Initial Session</span>**: Instrument your LLM calls to include Helicone session metadata so that they can be tracked and logged.
2. **<span style={{color: '#0ea5e9'}}>Querying Helicone for Session Data</span>**: Use Helicone's API to retrieve the logs of past sessions that you want to replay.
3. **<span style={{color: '#0ea5e9'}}>Replaying the Session with Modifications</span>**: Programmatically modify the retrieved session data as needed and send requests to the LLM to observe the effects.
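The heart of step 3 can be sketched as a small pure transform: sort the logged requests chronologically, then apply your modification to each. The field name `request_created_at` here is an assumption about the query response shape; adjust it to the payload you actually receive.

```javascript
// Sketch of step 3's core: order the logged requests chronologically,
// then apply a caller-supplied modification to each one.
// `request_created_at` is an assumed field name from the query response.
function prepareReplay(requests, modify) {
  return [...requests]
    .sort(
      (a, b) => new Date(a.request_created_at) - new Date(b.request_created_at)
    )
    .map(modify);
}
```

Keeping this step pure makes it easy to unit-test your modifications before resending anything to the LLM.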
---

## Why Optimize AI Agents by Replaying LLM Sessions?

Let's explore each of these steps in detail by following an example.
Replaying LLM sessions allows you to:

## Example: AI Debate Application
- **Test Modifications Safely**: Experiment with changes without affecting live users.
- **Understand Contextual Performance**: See how adjustments impact the agent's behavior over entire sessions.
- **Improve User Experience**: Deliver more accurate and helpful interactions to users.

---

## Step-by-Step Guide to Enhancing AI Agent Performance

### Example Application: AI Debate

We'll walk through an example of a debate session between a user and an assistant. After each argument, an impartial assistant scores it from 1 to 10.

### Step 1: Setting Up the Initial Session
### Step 1: Setting Up Your AI Agent with Helicone

Before you can replay sessions, you need to log them properly in Helicone. By adding **<span style={{color: '#0ea5e9'}}>only 3 headers</span>** to your LLM API requests, you can tag and group them into sessions.
Instrument your AI agent’s LLM calls to include Helicone session metadata for tracking and logging.
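As a minimal sketch, the three session headers can be grouped in a small helper. The header names come from Helicone's Sessions feature; the helper itself is only illustrative.

```javascript
// Sketch: the three Helicone session headers as a reusable helper.
// The header names are Helicone's; the helper function is illustrative.
function sessionHeaders(sessionId, sessionName, sessionPath) {
  return {
    "Helicone-Session-Id": sessionId,
    "Helicone-Session-Name": sessionName,
    "Helicone-Session-Path": sessionPath,
  };
}
```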

#### Instrumenting Your LLM Calls

@@ -146,7 +153,9 @@ _Go fullscreen for the best experience._

_Read more about how to implement Helicone sessions [here](https://docs.helicone.ai/features/sessions)._

### Step 2: Querying the Session Data from Helicone
### Step 2: Retrieving Session Data

Use Helicone's API to fetch session data for analysis.
```javascript
const response = await fetch("https://api.helicone.ai/v1/request/query", {
@@ -170,9 +179,9 @@ const data = await response.json();
Read more about Helicone's API [here](https://docs.helicone.ai/rest/request/post-v1requestquery).

### Step 3: Processing and Modifying the Session Data
### Step 3: Replaying and Modifying Sessions

Now that you have the session data, you'll need to process it.
Modify session data to test improvements.

1. **Parse and sort the requests**

@@ -294,7 +303,7 @@ _Alternatively, as described above, you can manually modify the prompts after re
### Conclusion
By replaying and modifying LLM sessions with Helicone, you gain deeper insights into how changes affect the entire workflow. This method provides context-rich, real-world data that leads to more effective optimizations and a comprehensive understanding of your AI's behavior.
By focusing on **replaying LLM sessions**, you can significantly **enhance the performance of your AI agents**. Helicone provides the tools necessary to make this process efficient and effective, leading to better user experiences and more robust AI applications.
---
3 changes: 2 additions & 1 deletion docs/mint.json
@@ -256,7 +256,8 @@
"use-cases/experiments",
"use-cases/enable-stream-usage",
"use-cases/resell-a-model",
"use-cases/bill-by-usage"
"use-cases/bill-by-usage",
"use-cases/replay-session"
]
},
{
282 changes: 282 additions & 0 deletions docs/use-cases/replay-session.mdx
@@ -0,0 +1,282 @@
---
title: "Replaying LLM Sessions"
sidebarTitle: "Replay Sessions"
description: "Learn how to replay and modify LLM sessions using Helicone to optimize your AI agents and improve their performance."
"twitter:title": "Replaying LLM Sessions - Helicone OSS LLM Observability"
---

import QuestionsSection from "/snippets/questions-section.mdx";

Understanding how changes impact your AI agents in real-world interactions is crucial. By **replaying LLM sessions** with Helicone, you can apply modifications to actual AI agent sessions, providing valuable insights that traditional isolated testing may miss.

## Use Cases

- **Optimize AI Agents**: Enhance agent performance by testing modifications on real session data.
- **Debug Complex Interactions**: Identify issues that only arise during full session interactions.
- **Accelerate Development**: Streamline your AI agent development process by efficiently testing changes.

<Steps>
<Step title="Record Sessions with Helicone Metadata">

Instrument your AI agent’s LLM calls to include Helicone session metadata for tracking and logging.

**Example: Setting Up Session Metadata**

````javascript Setting Up Session Metadata
const { Configuration, OpenAIApi } = require("openai");
const { randomUUID } = require("crypto");

// Generate unique session identifiers
const sessionId = randomUUID();
const sessionName = "AI Debate";
const sessionPath = "/debate/climate-change";

// Initialize OpenAI client with Helicone baseURL and auth header
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
  basePath: "https://oai.helicone.ai/v1",
  baseOptions: {
    headers: {
      "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    },
  },
});
const openai = new OpenAIApi(configuration);
````

**Include the Helicone session headers in your requests:**

````javascript Including Helicone Session Headers
const completionParams = {
  model: "gpt-3.5-turbo",
  messages: conversation,
};

const response = await openai.createChatCompletion(completionParams, {
  headers: {
    "Helicone-Session-Id": sessionId,
    "Helicone-Session-Name": sessionName,
    "Helicone-Session-Path": sessionPath,
    "Helicone-Prompt-Id": "assistant-response",
  },
});
````

**Initialize the conversation with the assistant:**

````javascript Initializing Conversation
const topic = "The impact of climate change on global economies";

const conversation = [
  {
    role: "system",
    content:
      "You're an AI debate assistant. Engage with the user by presenting arguments for or against the topic. Keep responses concise and insightful.",
  },
  {
    role: "assistant",
    content: `Welcome to our debate! Today's topic is: "${topic}". I will argue in favor, and you will argue against. Please present your opening argument.`,
  },
];
````

**Loop through the debate turns:**

````javascript Looping Through Debate Turns
const MAX_TURNS = 3;
let turn = 1;

// Simulated user arguments, declared once so each call to
// getUserArgument() advances to the next one
const userArguments = [
  "I believe climate change is a natural cycle and not significantly influenced by human activities.",
  "Economic resources should focus on immediate human needs rather than combating climate change.",
  "Strict environmental regulations can hinder economic growth and affect employment rates.",
];

while (turn <= MAX_TURNS) {
  // Get user's argument (simulate user input)
  const userArgument = await getUserArgument();
  conversation.push({ role: "user", content: userArgument });

  // Assistant responds with a counter-argument
  const assistantResponse = await generateAssistantResponse(
    conversation,
    sessionId,
    sessionName,
    sessionPath
  );
  conversation.push(assistantResponse);

  turn++;
}

// Function to simulate user input
async function getUserArgument() {
  // Return the next argument in order
  return userArguments.shift();
}

// Function to generate assistant's response
async function generateAssistantResponse(
  conversation,
  sessionId,
  sessionName,
  sessionPath
) {
  const completionParams = {
    model: "gpt-3.5-turbo",
    messages: conversation,
  };

  const response = await openai.createChatCompletion(completionParams, {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Name": sessionName,
      "Helicone-Session-Path": sessionPath,
      "Helicone-Prompt-Id": "assistant-response",
    },
  });

  const assistantMessage = response.data.choices[0].message;
  return assistantMessage;
}
````

**After setting up and running your session, you can view it in the Helicone dashboard:**

<Frame>
<video width="100%" controls>
<source
src="https://marketing-assets-helicone.s3.us-west-2.amazonaws.com/session_debate.mp4"
type="video/mp4"
/>
Your browser does not support the video tag.
</video>
</Frame>

*Go fullscreen for the best experience.*

</Step>

<Step title="Retrieve Session Data">

Use Helicone's [Request API](/rest/request/post-v1requestquery) to fetch session data.

**Example: Querying Session Data**
````bash Querying Session Data
curl --request POST \
  --url https://api.helicone.ai/v1/request/query \
  --header 'Content-Type: application/json' \
  --header 'authorization: Bearer sk-<your-helicone-api-key>' \
  --data '{
    "limit": 100,
    "offset": 0,
    "sort_by": {
      "key": "request_created_at",
      "direction": "asc"
    },
    "filter": {
      "properties": {
        "Helicone-Session-Id": {
          "equals": "<session-id>"
        }
      }
    }
  }'
````
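The same query body can also be built programmatically. This sketch mirrors the JSON in the curl example above; only the helper function name is ours.

```javascript
// Sketch: build the request-query body for one session id,
// mirroring the JSON in the curl example above.
function sessionQuery(sessionId, limit = 100) {
  return {
    limit,
    offset: 0,
    sort_by: { key: "request_created_at", direction: "asc" },
    filter: {
      properties: {
        "Helicone-Session-Id": { equals: sessionId },
      },
    },
  };
}
```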
</Step>
<Step title="Modify and Replay the Session">
Retrieve the original requests, apply modifications, and resend them to observe the impact.

**Example: Modifying Requests and Replaying**
````javascript Modifying Requests and Replaying
const fetch = require("node-fetch");
const { randomUUID } = require("crypto");

const HELICONE_API_KEY = process.env.HELICONE_API_KEY;
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const REPLAY_SESSION_ID = randomUUID();

async function replaySession(requests) {
  for (const request of requests) {
    const modifiedRequest = modifyRequestBody(request);
    await sendRequest(modifiedRequest);
  }
}

function modifyRequestBody(request) {
  // Implement modifications to the request body as needed
  // For example, enhancing the system prompt for better responses
  if (request.prompt_id === "assistant-response") {
    const systemMessage = request.body.messages.find(
      (msg) => msg.role === "system"
    );
    if (systemMessage) {
      systemMessage.content +=
        " Take the persona of a field expert and provide more persuasive arguments.";
    }
  }
  return request;
}

async function sendRequest(modifiedRequest) {
  const { body, request_path, path, prompt_id } = modifiedRequest;

  const response = await fetch(request_path, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`,
      "Helicone-Auth": `Bearer ${HELICONE_API_KEY}`,
      "Helicone-Session-Id": REPLAY_SESSION_ID,
      "Helicone-Session-Name": "Replayed Session",
      "Helicone-Session-Path": path,
      "Helicone-Prompt-Id": prompt_id,
    },
    body: JSON.stringify(body),
  });

  const data = await response.json();
  // Handle the response as needed
}
````
**Note:** In the `modifyRequestBody` function, we're enhancing the assistant's system prompt to make the responses more persuasive by taking the persona of a field expert.
</Step>
<Step title="Analyze the Replayed Session">
After replaying, use Helicone's dashboard to compare the original and modified sessions to evaluate improvements.

<Frame>
<video width="100%" controls>
<source
src="https://marketing-assets-helicone.s3.us-west-2.amazonaws.com/session_debate_replay.mp4"
type="video/mp4"
/>
Your browser does not support the video tag.
</video>
</Frame>

*Go fullscreen for the best experience.*

</Step>
</Steps>

## Additional Tips

- **Version Your Prompts**: Use Helicone's [Prompt Versioning](/features/prompts) to manage different prompt versions and see which yields the best results.
- **Use Evaluations**: Utilize Helicone's [Evaluation Features](/features/evaluation) to score and compare responses.

## Conclusion

By replaying LLM sessions with Helicone, you can effectively **optimize your AI agents**, leading to improved performance and better user experiences.

<QuestionsSection />
