Skip to content

Commit

Permalink
refine intent decomposition prompt (#185)
Browse files Browse the repository at this point in the history
  • Loading branch information
IANTHEREAL authored Jul 24, 2024
1 parent e861176 commit ab08cbe
Show file tree
Hide file tree
Showing 2 changed files with 28 additions and 24 deletions.
27 changes: 13 additions & 14 deletions backend/app/rag/knowledge_graph/intent.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,20 +30,19 @@ class DecomposedFactors(BaseModel):
class DecomposeQuery(dspy.Signature):
"""You are a knowledge base graph expert and are very good at building knowledge graphs. Now you are assigned to extract the most critical concepts and their relationships from the query. Step-by-Step Analysis:
1. Extract Meaningful user intents and questions:
- Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns/questions/problems/use cases, etc.
- Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.
2. Establish Relationships:
- Ensure that the source entity and target entities in each relationship are present in the list of extracted entities.
- Carefully examine the text to identify all relationships between clearly-related entities, ensuring each relationship is correctly captured with accurate details about the interactions.
- Clearly define the relationships, ensuring accurate directionality that reflects the logical or functional dependencies among entities. \
This means identifying which entity is the source, which is the target, and what the nature of their relationship is (e.g., $source_entity depends on $target_entity for $relationship).
## Instructions:
- Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.
- Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.
"""
1. Extract Meaningful user intents and questions:
- Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns/questions/problems/use cases, etc.
- Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.
2. Establish Relationships to describe the user's intents:
- Define relationships that accurately represent the user's query intent and information needs.
- Format each relationship as: (Source Entity) - [Relationship] -> (Target Entity), where the relationship describes what the user wants to know about the connection between these entities.
## Instructions:
- Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.
- Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.
- Ensure that the relationships and intents are grounded and factual, based on the information provided in the query.
"""

query: str = dspy.InputField(
desc="The query text to extract the most critical concepts and their relationships from the query."
Expand Down
25 changes: 15 additions & 10 deletions backend/dspy_compiled_program/decompose_query_program
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,13 @@
"train": [],
"demos": [
{
"augmented": true,
"query": "Chat2query is returning an error message saying \"Query timeout expired\". Additionally, I am unable to locate this SQL query in the slow query log.",
"factors": "{\"relationships\":[{\"source_entity\":\"Chat2query\",\"target_entity\":\"Error Message\",\"relationship_desc\":\"Chat2query is returning an error message saying 'Query timeout expired'.\",\"reasoning\":\"The main problem the user is facing.\"},{\"source_entity\":\"SQL Query\",\"target_entity\":\"Slow Query Log\",\"relationship_desc\":\"The reason why not to locate the SQL query in the slow query log.\",\"reasoning\":\"The secondary problem the user is facing.\"}]}"
"factors": "```json\n{\n \"relationships\": [\n {\n \"source_entity\": \"Chat2query\",\n \"target_entity\": \"Error Message\",\n \"relationship_desc\": \"Chat2query is returning an error message saying 'Query timeout expired'.\",\n \"reasoning\": \"The main problem the user is facing.\"\n },\n {\n \"source_entity\": \"SQL Query\",\n \"target_entity\": \"Slow Query Log\",\n \"relationship_desc\": \"The user is unable to locate the SQL query in the slow query log.\",\n \"reasoning\": \"The secondary issue the user is facing.\"\n }\n ]\n}\n```"
},
{
"query": "Hi, how do u setup tidb on debian vps?",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB Cluster\",\"target_entity\":\"Debian VPS\",\"relationship_desc\":\"How to deploy a TiDB Cluster on a Debian VPS? Should I use TiUP or TiDB Operator?\",\"reasoning\":\"The main question the user is asking.\"}]}"
},
{
"query": "We are new to TiDB and don't quite understand the potential impact on our application architecture. We are using TiDB for audit logs and continue to direct traffic to TiDB. We noticed a sudden jump ID from 1 to 30,001. Are there any impacts? Do we need to address this? If we have 100 connections from several applications, what will happen? In summary, what should we do for Auto Increment or do nothing?",
Expand All @@ -17,31 +22,31 @@
"factors": "{\"relationships\":[{\"source_entity\":\"App Containers\",\"target_entity\":\"TiDB Database\",\"relationship_desc\":\"How to solve the connection issue between the app containers and the TiDB database?\",\"reasoning\":\"The main problem the user is facing.\"},{\"source_entity\":\"Connectivity Issue\",\"target_entity\":\"Cluster Status\",\"relationship_desc\":\"The connectivity issue exists despite the cluster status showing 'available'.\",\"reasoning\":\"The discrepancy the user is concerned about.\"}]}"
},
{
"query": "Hi, how do u setup tidb on debian vps?",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB Cluster\",\"target_entity\":\"Debian VPS\",\"relationship_desc\":\"How to deploy a TiDB Cluster on a Debian VPS? Should I use TiUP or TiDB Operator?\",\"reasoning\":\"The main question the user is asking.\"}]}"
"query": "I am current using tidb serverless, but as my product grows, I really need a dalicated cluster. Is there a solution helps finish the migration?",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB Serverless\",\"target_entity\":\"Dedicated Cluster\",\"relationship_desc\":\"How to migrate from TiDB serverless to TiDB dedicated cluster?\",\"reasoning\":\"The main concern of the user.\"}]}"
},
{
"query": "I'm attempting to download a specific backup from the database hosted on TiDB Cloud. So far, I've tried accessing the backup through SQL queries, but haven't found a way to execute this operation.\n\nThe instructions provided suggested using SSH to transfer the backup, however, I don't have SSH access to the server where the backups are stored.\n\nI would like to request guidance on how I can proceed to download this backup without direct access to the server. Is there an alternative or different method that I can use to obtain the desired backup?\n\nThank you in advance for any assistance or guidance you can provide on this matter.",
"factors": "{\"relationships\":[{\"source_entity\":\"Backup data\",\"target_entity\":\"TiDB Cloud\",\"relationship_desc\":\"How to download a specific backup from TiDB Cloud?\",\"reasoning\":\"The main question the user is asking\"},{\"source_entity\":\"Backup SQL\",\"target_entity\":\"Backup data\",\"relationship_desc\":\"I can't find a way to execute Backup SQL queries to download the backup.\",\"reasoning\":\"The problem the user is facing\"}]}"
"query": "Please speak Chinese",
"factors": "{\"relationships\":[{\"source_entity\":\"User\",\"target_entity\":\"Language\",\"relationship_desc\":\"The user is requesting to communicate in Chinese.\",\"reasoning\":\"the main concern of the user\"}]}"
},
{
"query": "I am designing a table based on TiDB's TTL feature, but when I try to create the table using a cluster created with Serverless, I get a `'TTL' is not supported on TiDB Serverless` error.\n\nI plan to use Dedicated on my production environment and Serverless on my development environment, so it would be helpful if the TTL feature could be used in a Serverless environment.\n\nI've read the documentation that says Serverless will support TTL features in the future, but is there a specific timeline for this?\n\nAlso, is it possible to prevent TTL syntax from causing errors in Serverless?",
"factors": "{\"relationships\":[{\"source_entity\":\"TTL Feature\",\"target_entity\":\"TiDB Serverless\",\"relationship_desc\":\"The TTL feature is not currently supported in TiDB Serverless.\",\"reasoning\":\"The problem the user is facing.\"},{\"source_entity\":\"TTL Feature\",\"target_entity\":\"Roadmap Support Timeline\",\"relationship_desc\":\"What's the roadmap timeline on when the TTL feature will be supported in TiDB Serverless.\",\"reasoning\":\"The main question the user is asking.\"},{\"source_entity\":\"TTL SQL Syntax\",\"target_entity\":\"Workaround for SQL Syntax Error\",\"relationship_desc\":\"Workaround to prevent TTL feature SQL syntax from causing errors in TiDB Serverless.\",\"reasoning\":\"The secondary question the user is asking.\"}]}"
"query": "I'm attempting to download a specific backup from the database hosted on TiDB Cloud. So far, I've tried accessing the backup through SQL queries, but haven't found a way to execute this operation.\n\nThe instructions provided suggested using SSH to transfer the backup, however, I don't have SSH access to the server where the backups are stored.\n\nI would like to request guidance on how I can proceed to download this backup without direct access to the server. Is there an alternative or different method that I can use to obtain the desired backup?\n\nThank you in advance for any assistance or guidance you can provide on this matter.",
"factors": "{\"relationships\":[{\"source_entity\":\"Backup data\",\"target_entity\":\"TiDB Cloud\",\"relationship_desc\":\"How to download a specific backup from TiDB Cloud?\",\"reasoning\":\"The main question the user is asking\"},{\"source_entity\":\"Backup SQL\",\"target_entity\":\"Backup data\",\"relationship_desc\":\"I can't find a way to execute Backup SQL queries to download the backup.\",\"reasoning\":\"The problem the user is facing\"}]}"
},
{
"query": "Upgrade TiDB Serverless to 7.4 or latest for enhanced MySQL 8.0 compatibility",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB 7.4 or Latest version\",\"target_entity\":\"MySQL 8.0 Compatibility\",\"relationship_desc\":\"TiDB 7.4 or the latest version enhances compatibility with MySQL 8.0\",\"reasoning\":\"The reasoning why user wants to upgrade TiDB Serverless to 7.4 or latest for enhanced MySQL 8.0 compatibility\"},{\"source_entity\":\"TiDB Serverless\",\"target_entity\":\"Upgrade\",\"relationship_desc\":\"How to upgrade TiDB Serverless?\",\"reasoning\":\"The basic question what the user itentionally asked.\"}]}"
},
{
"query": "I am current using tidb serverless, but as my product grows, I really need a dalicated cluster. Is there a solution helps finish the migration?",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB Serverless\",\"target_entity\":\"Dedicated Cluster\",\"relationship_desc\":\"How to migrate from TiDB serverless to TiDB dedicated cluster?\",\"reasoning\":\"The main concern of the user.\"}]}"
"query": "I am designing a table based on TiDB's TTL feature, but when I try to create the table using a cluster created with Serverless, I get a `'TTL' is not supported on TiDB Serverless` error.\n\nI plan to use Dedicated on my production environment and Serverless on my development environment, so it would be helpful if the TTL feature could be used in a Serverless environment.\n\nI've read the documentation that says Serverless will support TTL features in the future, but is there a specific timeline for this?\n\nAlso, is it possible to prevent TTL syntax from causing errors in Serverless?",
"factors": "{\"relationships\":[{\"source_entity\":\"TTL Feature\",\"target_entity\":\"TiDB Serverless\",\"relationship_desc\":\"The TTL feature is not currently supported in TiDB Serverless.\",\"reasoning\":\"The problem the user is facing.\"},{\"source_entity\":\"TTL Feature\",\"target_entity\":\"Roadmap Support Timeline\",\"relationship_desc\":\"What's the roadmap timeline on when the TTL feature will be supported in TiDB Serverless.\",\"reasoning\":\"The main question the user is asking.\"},{\"source_entity\":\"TTL SQL Syntax\",\"target_entity\":\"Workaround for SQL Syntax Error\",\"relationship_desc\":\"Workaround to prevent TTL feature SQL syntax from causing errors in TiDB Serverless.\",\"reasoning\":\"The secondary question the user is asking.\"}]}"
},
{
"query": "tidb lighting to sync to serverless cluster,but the load command and the tidb-lighting tools dont have the tls config like --ssl-ca or --ca. so i can not sync to the full back data to the serverless",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB Lighting\",\"target_entity\":\"Serverless Cluster\",\"relationship_desc\":\"Sync data to a serverless cluster using TiDB Lighting.\",\"reasoning\":\"The user case what the user wants to achieve\"},{\"source_entity\":\"Load Command and TiDB Lighting Tools\",\"target_entity\":\"TLS Configuration\",\"relationship_desc\":\"How to configure TLS for TiDB Lightning?\",\"reasoning\":\"The basic question what the user itentionally asked.\"},{\"source_entity\":\"Lack of TLS Configuration\",\"target_entity\":\"Sync Issue\",\"relationship_desc\":\"The sync issue is caused by the lack of TLS configuration options for TiDB Lightning.\",\"reasoning\":\"The problem that the user is facing.\"}]}"
}
],
"signature_instructions": "You are a knowledge base graph expert and are very good at building knowledge graphs. Now you are assigned to extract the most critical concepts and their relationships from the query. Step-by-Step Analysis:\n\n1. Extract Meaningful user intents and questions:\n - Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns\/questions\/problems\/use cases, etc.\n - Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.\n2. Establish Relationships:\n - Ensure that the source entity and target entities in each relationship are present in the list of extracted entities.\n - Carefully examine the text to identify all relationships between clearly-related entities, ensuring each relationship is correctly captured with accurate details about the interactions.\n - Clearly define the relationships, ensuring accurate directionality that reflects the logical or functional dependencies among entities. This means identifying which entity is the source, which is the target, and what the nature of their relationship is (e.g., $source_entity depends on $target_entity for $relationship).\n\n## Instructions:\n\n- Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.\n- Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.",
"signature_instructions": "You are a knowledge base graph expert and are very good at building knowledge graphs. Now you are assigned to extract the most critical concepts and their relationships from the query. Step-by-Step Analysis:\n\n1. Extract Meaningful user intents and questions:\n - Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns\/questions\/problems\/use cases, etc.\n - Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.\n2. Establish Relationships to describe the user's intents:\n - Define relationships that accurately represent the user's query intent and information needs.\n - Format each relationship as: (Source Entity) - [Relationship] -> (Target Entity), where the relationship describes what the user wants to know about the connection between these entities.\n\n## Instructions:\n\n- Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.\n- Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.\n- Ensure that the relationships and intents are grounded and factual, based on the information provided in the query.",
"signature_prefix": "Factors:"
}
}

0 comments on commit ab08cbe

Please sign in to comment.