Merge branch 'main' of https://github.com/pingcap/tidb.ai into frontend/feat-llm-settings
IANTHEREAL committed Jul 24, 2024
2 parents c55336a + 87bb970 commit 5426e5f
Showing 4 changed files with 41 additions and 24 deletions.
7 changes: 6 additions & 1 deletion backend/app/rag/default_prompt.py
@@ -65,8 +65,13 @@
---------------------
- Given a conversation (between Human and Assistant) and a follow-up message from the Human, use the context from the previous conversation to rewrite the follow-up message into a standalone, detailed question (Note: The language should be consistent with the follow up message from Human). Ensure the refined question captures all relevant context and is written in a way that maximizes the effectiveness of a vector search to retrieve precise and comprehensive information.
+ Given a conversation (between Human and Assistant) and a follow-up message from the Human, use the context from the previous conversation to rewrite the follow-up message into a standalone, detailed question (Note: The language should be consistent with the follow-up message from Human). Ensure the refined question captures all relevant context and is written in a way that maximizes the effectiveness of a vector search to retrieve precise and comprehensive information.
+ Key considerations:
+ 1. Focus on the latest query from the Human, ensuring it is given the most weight.
+ 2. Utilize knowledge graph and the history messages to provide relevant context and background information.
+ 3. Ensure the refined question is suitable for vector search by emphasizing specific and relevant terms.
+ 4. Ensure the refined question is grounded and factual, directly based on the user's follow-up question.
Example:
25 changes: 12 additions & 13 deletions backend/app/rag/knowledge_graph/intent.py
@@ -30,20 +30,19 @@ class DecomposedFactors(BaseModel):
class DecomposeQuery(dspy.Signature):
"""You are a knowledge base graph expert and are very good at building knowledge graphs. Now you are assigned to extract the most critical concepts and their relationships from the query. Step-by-Step Analysis:
- 1. Extract Meaningful user intents and questions:
- - Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns/questions/problems/use cases, etc.
- - Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.
- 2. Establish Relationships:
- - Ensure that the source entity and target entities in each relationship are present in the list of extracted entities.
- - Carefully examine the text to identify all relationships between clearly-related entities, ensuring each relationship is correctly captured with accurate details about the interactions.
- - Clearly define the relationships, ensuring accurate directionality that reflects the logical or functional dependencies among entities. \
- This means identifying which entity is the source, which is the target, and what the nature of their relationship is (e.g., $source_entity depends on $target_entity for $relationship).
+ 1. Extract Meaningful user intents and questions:
+ - Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns/questions/problems/use cases, etc.
+ - Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.
+ 2. Establish Relationships to describe the user's intents:
+ - Define relationships that accurately represent the user's query intent and information needs.
+ - Format each relationship as: (Source Entity) - [Relationship] -> (Target Entity), where the relationship describes what the user wants to know about the connection between these entities.
- ## Instructions:
+ ## Instructions:
- - Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.
- - Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.
- """
+ - Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.
+ - Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.
+ - Ensure that the relationships and intents are grounded and factual, based on the information provided in the query.
+ """

query: str = dspy.InputField(
desc="The query text to extract the most critical concepts and their relationships from the query."
@@ -57,7 +56,7 @@ class DecomposeQueryModule(dspy.Module):
     def __init__(self, dspy_lm: dspy.LM):
         super().__init__()
         self.dspy_lm = dspy_lm
-        self.prog = TypedChainOfThought(DecomposeQuery)
+        self.prog = TypedPredictor(DecomposeQuery)

     def forward(self, query):
         with dspy.settings.context(lm=self.dspy_lm):
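The relationship format named in the revised prompt, (Source Entity) - [Relationship] -> (Target Entity), can be illustrated with a small sketch. This is not code from the commit: the `Relationship` dataclass and the `format_relationship` and `limit_pairs` helpers are hypothetical names, and the fields only mirror the `source_entity`, `target_entity`, and `relationship_desc` keys that appear in the compiled demos.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    # Hypothetical container; field names mirror the keys used in the compiled demos.
    source_entity: str
    target_entity: str
    relationship_desc: str


def format_relationship(rel: Relationship) -> str:
    """Render one relationship as (Source Entity) - [Relationship] -> (Target Entity)."""
    return f"({rel.source_entity}) - [{rel.relationship_desc}] -> ({rel.target_entity})"


def limit_pairs(rels: list[Relationship], max_pairs: int = 3) -> list[Relationship]:
    """Keep at most max_pairs relationships, following the prompt's 'no more than 3 pairs' rule."""
    return rels[:max_pairs]
```

In the actual module the model is asked to respect the three-pair limit itself; truncating afterwards, as `limit_pairs` does, is only one possible safeguard.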
8 changes: 8 additions & 0 deletions backend/app/rag/llm_option.py
@@ -8,6 +8,7 @@ class LLMOption(BaseModel):
     provider: LLMProvider
     default_model: str
     model_description: str
+    default_credentials: str | dict = ""
     credentials_display_name: str
     credentials_description: str
     credentials_type: str = "str"
@@ -21,6 +22,7 @@ class LLMOption(BaseModel):
         credentials_display_name="OpenAI API Key",
         credentials_description="The API key of OpenAI, you can find it in https://platform.openai.com/api-keys",
         credentials_type="str",
+        default_credentials="sk-****",
     ),
     LLMOption(
         provider=LLMProvider.GEMINI,
@@ -29,6 +31,7 @@ class LLMOption(BaseModel):
         credentials_display_name="Google API Key",
         credentials_description="The API key of Google AI Studio, you can find it in https://aistudio.google.com/app/apikey",
         credentials_type="str",
+        default_credentials="AIza****",
     ),
     LLMOption(
         provider=LLMProvider.ANTHROPIC_VERTEX,
@@ -37,5 +40,10 @@ class LLMOption(BaseModel):
         credentials_display_name="Google Credentials JSON",
         credentials_description="The JSON Object of Google Credentials, refer to https://cloud.google.com/docs/authentication/provide-credentials-adc#on-prem",
         credentials_type="dict",
+        default_credentials={
+            "type": "service_account",
+            "project_id": "****",
+            "private_key_id": "****",
+        },
     ),
 ]
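How the new `default_credentials` field is consumed is not shown in this commit. As a rough sketch, assuming a settings form uses it as placeholder text, the masked default could be turned into a display string as below. `LLMOptionSketch` is a hypothetical stdlib stand-in for the pydantic `LLMOption` model, and `credentials_placeholder` is an invented helper.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class LLMOptionSketch:
    # Hypothetical stand-in for the two relevant fields of the pydantic LLMOption model.
    credentials_type: str = "str"
    default_credentials: Union[str, dict] = ""


def credentials_placeholder(option: LLMOptionSketch) -> str:
    """Return text a settings form could show as a placeholder for the credentials field."""
    if option.credentials_type == "dict":
        # Dict-shaped defaults are serialized so they can be displayed in a text area.
        return json.dumps(option.default_credentials, indent=2)
    return str(option.default_credentials)
```

Branching on `credentials_type` rather than on `isinstance` keeps the sketch aligned with the field the model already exposes, though either check would work here.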
25 changes: 15 additions & 10 deletions backend/dspy_compiled_program/decompose_query_program
@@ -5,8 +5,13 @@
"train": [],
"demos": [
{
+ "augmented": true,
"query": "Chat2query is returning an error message saying \"Query timeout expired\". Additionally, I am unable to locate this SQL query in the slow query log.",
- "factors": "{\"relationships\":[{\"source_entity\":\"Chat2query\",\"target_entity\":\"Error Message\",\"relationship_desc\":\"Chat2query is returning an error message saying 'Query timeout expired'.\",\"reasoning\":\"The main problem the user is facing.\"},{\"source_entity\":\"SQL Query\",\"target_entity\":\"Slow Query Log\",\"relationship_desc\":\"The reason why not to locate the SQL query in the slow query log.\",\"reasoning\":\"The secondary problem the user is facing.\"}]}"
+ "factors": "```json\n{\n \"relationships\": [\n {\n \"source_entity\": \"Chat2query\",\n \"target_entity\": \"Error Message\",\n \"relationship_desc\": \"Chat2query is returning an error message saying 'Query timeout expired'.\",\n \"reasoning\": \"The main problem the user is facing.\"\n },\n {\n \"source_entity\": \"SQL Query\",\n \"target_entity\": \"Slow Query Log\",\n \"relationship_desc\": \"The user is unable to locate the SQL query in the slow query log.\",\n \"reasoning\": \"The secondary issue the user is facing.\"\n }\n ]\n}\n```"
},
+ {
+ "query": "Hi, how do u setup tidb on debian vps?",
+ "factors": "{\"relationships\":[{\"source_entity\":\"TiDB Cluster\",\"target_entity\":\"Debian VPS\",\"relationship_desc\":\"How to deploy a TiDB Cluster on a Debian VPS? Should I use TiUP or TiDB Operator?\",\"reasoning\":\"The main question the user is asking.\"}]}"
+ },
{
"query": "We are new to TiDB and don't quite understand the potential impact on our application architecture. We are using TiDB for audit logs and continue to direct traffic to TiDB. We noticed a sudden jump ID from 1 to 30,001. Are there any impacts? Do we need to address this? If we have 100 connections from several applications, what will happen? In summary, what should we do for Auto Increment or do nothing?",
@@ -17,31 +22,31 @@
"factors": "{\"relationships\":[{\"source_entity\":\"App Containers\",\"target_entity\":\"TiDB Database\",\"relationship_desc\":\"How to solve the connection issue between the app containers and the TiDB database?\",\"reasoning\":\"The main problem the user is facing.\"},{\"source_entity\":\"Connectivity Issue\",\"target_entity\":\"Cluster Status\",\"relationship_desc\":\"The connectivity issue exists despite the cluster status showing 'available'.\",\"reasoning\":\"The discrepancy the user is concerned about.\"}]}"
},
{
- "query": "Hi, how do u setup tidb on debian vps?",
- "factors": "{\"relationships\":[{\"source_entity\":\"TiDB Cluster\",\"target_entity\":\"Debian VPS\",\"relationship_desc\":\"How to deploy a TiDB Cluster on a Debian VPS? Should I use TiUP or TiDB Operator?\",\"reasoning\":\"The main question the user is asking.\"}]}"
+ "query": "I am current using tidb serverless, but as my product grows, I really need a dalicated cluster. Is there a solution helps finish the migration?",
+ "factors": "{\"relationships\":[{\"source_entity\":\"TiDB Serverless\",\"target_entity\":\"Dedicated Cluster\",\"relationship_desc\":\"How to migrate from TiDB serverless to TiDB dedicated cluster?\",\"reasoning\":\"The main concern of the user.\"}]}"
},
{
- "query": "I'm attempting to download a specific backup from the database hosted on TiDB Cloud. So far, I've tried accessing the backup through SQL queries, but haven't found a way to execute this operation.\n\nThe instructions provided suggested using SSH to transfer the backup, however, I don't have SSH access to the server where the backups are stored.\n\nI would like to request guidance on how I can proceed to download this backup without direct access to the server. Is there an alternative or different method that I can use to obtain the desired backup?\n\nThank you in advance for any assistance or guidance you can provide on this matter.",
- "factors": "{\"relationships\":[{\"source_entity\":\"Backup data\",\"target_entity\":\"TiDB Cloud\",\"relationship_desc\":\"How to download a specific backup from TiDB Cloud?\",\"reasoning\":\"The main question the user is asking\"},{\"source_entity\":\"Backup SQL\",\"target_entity\":\"Backup data\",\"relationship_desc\":\"I can't find a way to execute Backup SQL queries to download the backup.\",\"reasoning\":\"The problem the user is facing\"}]}"
+ "query": "Please speak Chinese",
+ "factors": "{\"relationships\":[{\"source_entity\":\"User\",\"target_entity\":\"Language\",\"relationship_desc\":\"The user is requesting to communicate in Chinese.\",\"reasoning\":\"the main concern of the user\"}]}"
},
{
- "query": "I am designing a table based on TiDB's TTL feature, but when I try to create the table using a cluster created with Serverless, I get a `'TTL' is not supported on TiDB Serverless` error.\n\nI plan to use Dedicated on my production environment and Serverless on my development environment, so it would be helpful if the TTL feature could be used in a Serverless environment.\n\nI've read the documentation that says Serverless will support TTL features in the future, but is there a specific timeline for this?\n\nAlso, is it possible to prevent TTL syntax from causing errors in Serverless?",
- "factors": "{\"relationships\":[{\"source_entity\":\"TTL Feature\",\"target_entity\":\"TiDB Serverless\",\"relationship_desc\":\"The TTL feature is not currently supported in TiDB Serverless.\",\"reasoning\":\"The problem the user is facing.\"},{\"source_entity\":\"TTL Feature\",\"target_entity\":\"Roadmap Support Timeline\",\"relationship_desc\":\"What's the roadmap timeline on when the TTL feature will be supported in TiDB Serverless.\",\"reasoning\":\"The main question the user is asking.\"},{\"source_entity\":\"TTL SQL Syntax\",\"target_entity\":\"Workaround for SQL Syntax Error\",\"relationship_desc\":\"Workaround to prevent TTL feature SQL syntax from causing errors in TiDB Serverless.\",\"reasoning\":\"The secondary question the user is asking.\"}]}"
+ "query": "I'm attempting to download a specific backup from the database hosted on TiDB Cloud. So far, I've tried accessing the backup through SQL queries, but haven't found a way to execute this operation.\n\nThe instructions provided suggested using SSH to transfer the backup, however, I don't have SSH access to the server where the backups are stored.\n\nI would like to request guidance on how I can proceed to download this backup without direct access to the server. Is there an alternative or different method that I can use to obtain the desired backup?\n\nThank you in advance for any assistance or guidance you can provide on this matter.",
+ "factors": "{\"relationships\":[{\"source_entity\":\"Backup data\",\"target_entity\":\"TiDB Cloud\",\"relationship_desc\":\"How to download a specific backup from TiDB Cloud?\",\"reasoning\":\"The main question the user is asking\"},{\"source_entity\":\"Backup SQL\",\"target_entity\":\"Backup data\",\"relationship_desc\":\"I can't find a way to execute Backup SQL queries to download the backup.\",\"reasoning\":\"The problem the user is facing\"}]}"
},
{
"query": "Upgrade TiDB Serverless to 7.4 or latest for enhanced MySQL 8.0 compatibility",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB 7.4 or Latest version\",\"target_entity\":\"MySQL 8.0 Compatibility\",\"relationship_desc\":\"TiDB 7.4 or the latest version enhances compatibility with MySQL 8.0\",\"reasoning\":\"The reasoning why user wants to upgrade TiDB Serverless to 7.4 or latest for enhanced MySQL 8.0 compatibility\"},{\"source_entity\":\"TiDB Serverless\",\"target_entity\":\"Upgrade\",\"relationship_desc\":\"How to upgrade TiDB Serverless?\",\"reasoning\":\"The basic question what the user itentionally asked.\"}]}"
},
{
- "query": "I am current using tidb serverless, but as my product grows, I really need a dalicated cluster. Is there a solution helps finish the migration?",
- "factors": "{\"relationships\":[{\"source_entity\":\"TiDB Serverless\",\"target_entity\":\"Dedicated Cluster\",\"relationship_desc\":\"How to migrate from TiDB serverless to TiDB dedicated cluster?\",\"reasoning\":\"The main concern of the user.\"}]}"
+ "query": "I am designing a table based on TiDB's TTL feature, but when I try to create the table using a cluster created with Serverless, I get a `'TTL' is not supported on TiDB Serverless` error.\n\nI plan to use Dedicated on my production environment and Serverless on my development environment, so it would be helpful if the TTL feature could be used in a Serverless environment.\n\nI've read the documentation that says Serverless will support TTL features in the future, but is there a specific timeline for this?\n\nAlso, is it possible to prevent TTL syntax from causing errors in Serverless?",
+ "factors": "{\"relationships\":[{\"source_entity\":\"TTL Feature\",\"target_entity\":\"TiDB Serverless\",\"relationship_desc\":\"The TTL feature is not currently supported in TiDB Serverless.\",\"reasoning\":\"The problem the user is facing.\"},{\"source_entity\":\"TTL Feature\",\"target_entity\":\"Roadmap Support Timeline\",\"relationship_desc\":\"What's the roadmap timeline on when the TTL feature will be supported in TiDB Serverless.\",\"reasoning\":\"The main question the user is asking.\"},{\"source_entity\":\"TTL SQL Syntax\",\"target_entity\":\"Workaround for SQL Syntax Error\",\"relationship_desc\":\"Workaround to prevent TTL feature SQL syntax from causing errors in TiDB Serverless.\",\"reasoning\":\"The secondary question the user is asking.\"}]}"
},
{
"query": "tidb lighting to sync to serverless cluster,but the load command and the tidb-lighting tools dont have the tls config like --ssl-ca or --ca. so i can not sync to the full back data to the serverless",
"factors": "{\"relationships\":[{\"source_entity\":\"TiDB Lighting\",\"target_entity\":\"Serverless Cluster\",\"relationship_desc\":\"Sync data to a serverless cluster using TiDB Lighting.\",\"reasoning\":\"The user case what the user wants to achieve\"},{\"source_entity\":\"Load Command and TiDB Lighting Tools\",\"target_entity\":\"TLS Configuration\",\"relationship_desc\":\"How to configure TLS for TiDB Lightning?\",\"reasoning\":\"The basic question what the user itentionally asked.\"},{\"source_entity\":\"Lack of TLS Configuration\",\"target_entity\":\"Sync Issue\",\"relationship_desc\":\"The sync issue is caused by the lack of TLS configuration options for TiDB Lightning.\",\"reasoning\":\"The problem that the user is facing.\"}]}"
}
],
- "signature_instructions": "You are a knowledge base graph expert and are very good at building knowledge graphs. Now you are assigned to extract the most critical concepts and their relationships from the query. Step-by-Step Analysis:\n\n1. Extract Meaningful user intents and questions:\n - Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns\/questions\/problems\/use cases, etc.\n - Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.\n2. Establish Relationships:\n - Ensure that the source entity and target entities in each relationship are present in the list of extracted entities.\n - Carefully examine the text to identify all relationships between clearly-related entities, ensuring each relationship is correctly captured with accurate details about the interactions.\n - Clearly define the relationships, ensuring accurate directionality that reflects the logical or functional dependencies among entities. This means identifying which entity is the source, which is the target, and what the nature of their relationship is (e.g., $source_entity depends on $target_entity for $relationship).\n\n## Instructions:\n\n- Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.\n- Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.",
+ "signature_instructions": "You are a knowledge base graph expert and are very good at building knowledge graphs. Now you are assigned to extract the most critical concepts and their relationships from the query. Step-by-Step Analysis:\n\n1. Extract Meaningful user intents and questions:\n - Identify the question what the user itentionally asked, focusing on the the critial information about user's main concerns\/questions\/problems\/use cases, etc.\n - Make this question simple and clear and ensure that it is directly related to the user's main concerns. Simple and clear question can improve the search accuracy.\n2. Establish Relationships to describe the user's intents:\n - Define relationships that accurately represent the user's query intent and information needs.\n - Format each relationship as: (Source Entity) - [Relationship] -> (Target Entity), where the relationship describes what the user wants to know about the connection between these entities.\n\n## Instructions:\n\n- Limit to no more than 3 pairs. These pairs must accurately reflect the user's real (sub)questions.\n- Ensure that the extracted pairs are of high quality and do not introduce unnecessary search elements.\n- Ensure that the relationships and intents are grounded and factual, based on the information provided in the query.",
"signature_prefix": "Factors:"
}
}
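The compiled demos now mix two shapes for `factors`: a compact JSON string and, in the augmented demo, JSON wrapped in a Markdown `json` code fence. For anyone inspecting this file by hand, a minimal sketch that accepts both shapes could look like the following. `parse_factors` is a hypothetical helper, not something dspy or this repository is known to provide.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence in the source

# Optional fence with an optional "json" language tag around the payload.
_FENCED = re.compile(rf"^{FENCE}(?:json)?\s*\n(.*?)\n{FENCE}$", re.DOTALL)


def parse_factors(raw: str) -> dict:
    """Parse a demo's "factors" string, tolerating a surrounding Markdown code fence."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        # Keep only the JSON payload between the opening and closing fences.
        text = match.group(1)
    return json.loads(text)
```

Both demo shapes yield the same dictionary with a `relationships` list, so downstream code does not need to care which form the compiled program stored.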
