✨ feat(GraphRAG): enhance GraphRAG by graph community summary #1801

Merged on Aug 30, 2024 (108 commits)

Commits
93f2a74
✨ feat(GraphRAG): MSR GraphRAG Implementation Proposal
M1n9X Aug 9, 2024
e22b7ba
add config item and format code
fanzhidongyzby Aug 12, 2024
84092c4
1. add interface: insert_graph and stream_query
KingSkyLi Aug 13, 2024
e952131
extract data structure from graph and create sql orm
M1n9X Aug 13, 2024
7e9ed94
update SQLiteORM with embedded SQLiteConnector
M1n9X Aug 14, 2024
d28aac3
implement global search method
M1n9X Aug 15, 2024
830b8e9
fixed stream query and add stream run;
KingSkyLi Aug 15, 2024
0db297d
fixed test tugraph store;
KingSkyLi Aug 15, 2024
ab9fdf1
optimize apis and CommunityStore
fanzhidongyzby Aug 15, 2024
130672f
add CommunityMetastore
fanzhidongyzby Aug 16, 2024
d4ee4da
add upload_plugin function;
KingSkyLi Aug 16, 2024
bcdfc68
implement CommunityMetastore
fanzhidongyzby Aug 17, 2024
af819ba
add history to extractor and drop support
fanzhidongyzby Aug 18, 2024
1ae198e
add load plugin and run plugin;
KingSkyLi Aug 18, 2024
ca718de
fixed plugin_names upload;
KingSkyLi Aug 18, 2024
22f9cb2
pull up metastore api
fanzhidongyzby Aug 19, 2024
329703b
code review conn_tugraph;
KingSkyLi Aug 19, 2024
7ba2b9a
fixed tugraph_store query and test;
KingSkyLi Aug 19, 2024
62e817d
optimize indexing code and reimplement query logic
M1n9X Aug 19, 2024
86111a7
fixed leiden;
KingSkyLi Aug 19, 2024
3168cc1
fixed;
KingSkyLi Aug 19, 2024
f8cb9bb
update ignore;
KingSkyLi Aug 19, 2024
bd09fe9
update ignore;
KingSkyLi Aug 19, 2024
f6abe2b
fixed;
KingSkyLi Aug 19, 2024
5f4e5ab
fix circular dependency
fanzhidongyzby Aug 19, 2024
bbcf65c
parse response in GraphExtractor
M1n9X Aug 20, 2024
57bb5a2
add leiden.so
KingSkyLi Aug 20, 2024
31a3762
change discover_communities function;
KingSkyLi Aug 20, 2024
cfbb097
fix bugs in naive graphrag
fanzhidongyzby Aug 20, 2024
93442a7
add vertex and edge field "name";
KingSkyLi Aug 20, 2024
be10129
fixed schema;
KingSkyLi Aug 20, 2024
e177b03
fix:community extract bug
Aries-ckt Aug 21, 2024
502c6f9
fixed GraphRag create Knowledge Graph;
KingSkyLi Aug 21, 2024
94c6069
fix runtime bugs
fanzhidongyzby Aug 21, 2024
1b04773
Remove leiden.so
KingSkyLi Aug 22, 2024
d986992
change leiden path;
KingSkyLi Aug 22, 2024
251b129
add graph_name re;
KingSkyLi Aug 22, 2024
0d1f245
fixed bug and todo;
KingSkyLi Aug 22, 2024
ecadfcf
add "dbgpt-tugraph-plugins>=0.1.0rc1" default install;
KingSkyLi Aug 22, 2024
5ea1944
fix configs
fanzhidongyzby Aug 23, 2024
5502e2d
fixed;
KingSkyLi Aug 23, 2024
8466f3d
optimize graph extract prompt and parse method
M1n9X Aug 23, 2024
5ca457f
chore: Add new 'graph_rag' requires
fangyinc Aug 23, 2024
efd6709
add url
fanzhidongyzby Aug 23, 2024
041b87e
fix prompt
fanzhidongyzby Aug 23, 2024
ae20d7a
add OGM layer for graphstore
fanzhidongyzby Aug 23, 2024
bfb9b0b
fixed bug;
KingSkyLi Aug 23, 2024
63cf2e1
fixed bug.
KingSkyLi Aug 23, 2024
a0db3f1
fix runtime bugs
fanzhidongyzby Aug 23, 2024
c09ca68
optimize prompt
fanzhidongyzby Aug 23, 2024
c7c54af
fix ignore
fanzhidongyzby Aug 23, 2024
e907ff5
feat:add truncate and fmt code
Aries-ckt Aug 24, 2024
40fcbd6
optimize graphextractor prompt
fanzhidongyzby Aug 24, 2024
db16082
fix bugs
fanzhidongyzby Aug 24, 2024
01d1448
fix style
fanzhidongyzby Aug 25, 2024
2426bd0
add truncate
fanzhidongyzby Aug 25, 2024
63c363b
add tugraph_store truncate;
KingSkyLi Aug 25, 2024
13a0dde
rename api and optimize code
fanzhidongyzby Aug 25, 2024
93fb90d
change tugraph_store _community_id default value and delete _weight f…
KingSkyLi Aug 25, 2024
c4f1ea4
refactor graph api
fanzhidongyzby Aug 25, 2024
0103bc5
fixed bug;
KingSkyLi Aug 25, 2024
1e56182
opt community summary prompt
fanzhidongyzby Aug 26, 2024
6f99eda
adjust packages
fanzhidongyzby Aug 26, 2024
0683bcc
Fix vector recall issue
fanzhidongyzby Aug 26, 2024
f5483e9
fixed tugraph_store bug and code review query function;
KingSkyLi Aug 26, 2024
1388acd
fixed bug;
KingSkyLi Aug 26, 2024
bfcdf3c
fix bugs
fanzhidongyzby Aug 27, 2024
dd934d3
fixed white_list;
KingSkyLi Aug 27, 2024
e084982
refactor: use @antv/g6 instead of cytoscape
yvonneyx Aug 28, 2024
b6559cb
fixed graphvis api;
KingSkyLi Aug 28, 2024
a9319ff
docker/examples/community_summary/leiden.so
KingSkyLi Aug 28, 2024
47a237e
refactor: web config
yvonneyx Aug 28, 2024
acffef5
feat(web): use communityId generated by backend
yvonneyx Aug 28, 2024
bc23eb2
refactor(web): adjust file structure
yvonneyx Aug 28, 2024
14cbb4a
optimize prompt
fanzhidongyzby Aug 28, 2024
8ee0c0f
fix prompt and QA context
fanzhidongyzby Aug 28, 2024
0223ac9
fix: remove unnecessary file
yvonneyx Aug 29, 2024
abcf3bf
fix(web): modify incorrect route
yvonneyx Aug 29, 2024
7408465
opt prompt on special chars
fanzhidongyzby Aug 29, 2024
7e6d053
refactor(web): enhance graph layout
yvonneyx Aug 29, 2024
77478e4
add test file;
KingSkyLi Aug 29, 2024
f3cab22
style:fmt and fix mypy error.
Aries-ckt Aug 29, 2024
8804c87
Merge branch 'msr_graphrag' of https://github.com/M1n9X/DB-GPT into m…
Aries-ckt Aug 29, 2024
ed6ae26
adjust md files
fanzhidongyzby Aug 29, 2024
b252cc2
style:fmt
Aries-ckt Aug 29, 2024
610ed6a
Merge branch 'msr_graphrag' of https://github.com/M1n9X/DB-GPT into m…
Aries-ckt Aug 29, 2024
1c4557f
change osgraph.md and add tugraph-analytics;
KingSkyLi Aug 29, 2024
6d51806
Merge remote-tracking branch 'origin/main' into msr_v2
Aries-ckt Aug 29, 2024
004003d
fix:solve the main conflict
Aries-ckt Aug 29, 2024
9033f50
remove blank
fanzhidongyzby Aug 29, 2024
0888f17
Delete useless test code;
KingSkyLi Aug 29, 2024
803eb0c
refactor(web): adjust data
yvonneyx Aug 29, 2024
1599b4a
update graphvis api;
KingSkyLi Aug 29, 2024
8f9da4d
update graphvis api;
KingSkyLi Aug 29, 2024
c7dc954
graphvis api delete isolated nodes
KingSkyLi Aug 29, 2024
6bb3ffd
communityid->communityId
KingSkyLi Aug 29, 2024
7764366
chore: Add __init__ for community
fangyinc Aug 30, 2024
07dde3d
fix edge bug
fanzhidongyzby Aug 30, 2024
562f874
style:fmt
Aries-ckt Aug 30, 2024
81bbdc0
fix(web): update variable name
yvonneyx Aug 30, 2024
d583d41
fix code check
fanzhidongyzby Aug 30, 2024
c2065b4
add dbgpt.md and graph_rag_mini.md files;
KingSkyLi Aug 30, 2024
6c7cc58
add graph_rag_summary_example.py;
KingSkyLi Aug 30, 2024
84e0d4e
Refactor GraphRAG tests
fanzhidongyzby Aug 30, 2024
f17c32b
fix:recall_test sync->async
Aries-ckt Aug 30, 2024
1b569c9
add ut
fanzhidongyzby Aug 30, 2024
64dd972
fix:conflict
Aries-ckt Aug 30, 2024
58f5f13
fixed code style;
KingSkyLi Aug 30, 2024
9 changes: 7 additions & 2 deletions .env.template
@@ -71,7 +71,7 @@ EMBEDDING_MODEL=text2vec
 #EMBEDDING_MODEL=bge-large-zh
 KNOWLEDGE_CHUNK_SIZE=500
 KNOWLEDGE_SEARCH_TOP_SIZE=5
-KNOWLEDGE_GRAPH_SEARCH_TOP_SIZE=50
+KNOWLEDGE_GRAPH_SEARCH_TOP_SIZE=200
 ## Maximum number of chunks to load at once, if your single document is too large,
 ## you can set this value to a higher value for better performance.
 ## if out of memory when load large document, you can set this value to a lower value.
@@ -157,6 +157,11 @@ EXECUTE_LOCAL_COMMANDS=False
 #*******************************************************************#
 VECTOR_STORE_TYPE=Chroma
 GRAPH_STORE_TYPE=TuGraph
+GRAPH_COMMUNITY_SUMMARY_ENABLED=True
+KNOWLEDGE_GRAPH_EXTRACT_SEARCH_TOP_SIZE=5
+KNOWLEDGE_GRAPH_EXTRACT_SEARCH_RECALL_SCORE=0.3
+KNOWLEDGE_GRAPH_COMMUNITY_SEARCH_TOP_SIZE=20
+KNOWLEDGE_GRAPH_COMMUNITY_SEARCH_RECALL_SCORE=0.0
 
 ### Chroma vector db config
 #CHROMA_PERSIST_PATH=/root/DB-GPT/pilot/data
@@ -187,7 +192,7 @@ ElasticSearch_PASSWORD={your_password}
 #TUGRAPH_PASSWORD=73@TuGraph
 #TUGRAPH_VERTEX_TYPE=entity
 #TUGRAPH_EDGE_TYPE=relation
-#TUGRAPH_EDGE_NAME_KEY=label
+#TUGRAPH_PLUGIN_NAMES=leiden
 
 #*******************************************************************#
 #** WebServer Language Support **#
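The keys added to the vector/graph store block switch community summaries on and, judging by their names, pair a top-k size with a minimum recall score for two searches: one used during graph extraction and one over community summaries. A minimal sketch of reading those pairs into typed values follows; `parse_env`, `SearchLimits`, and `load_limits` are hypothetical helpers, not DB-GPT's own configuration loader.

```python
from dataclasses import dataclass
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


@dataclass(frozen=True)
class SearchLimits:
    """Hypothetical pairing of a top-k size with a minimum recall score."""

    top_size: int
    recall_score: float


def load_limits(env: Dict[str, str], prefix: str) -> SearchLimits:
    # Reads <prefix>_TOP_SIZE and <prefix>_RECALL_SCORE; both must be present.
    return SearchLimits(
        top_size=int(env[f"{prefix}_TOP_SIZE"]),
        recall_score=float(env[f"{prefix}_RECALL_SCORE"]),
    )


FRAGMENT = """
GRAPH_STORE_TYPE=TuGraph
GRAPH_COMMUNITY_SUMMARY_ENABLED=True
KNOWLEDGE_GRAPH_EXTRACT_SEARCH_TOP_SIZE=5
KNOWLEDGE_GRAPH_EXTRACT_SEARCH_RECALL_SCORE=0.3
KNOWLEDGE_GRAPH_COMMUNITY_SEARCH_TOP_SIZE=20
KNOWLEDGE_GRAPH_COMMUNITY_SEARCH_RECALL_SCORE=0.0
#TUGRAPH_PLUGIN_NAMES=leiden
"""

env = parse_env(FRAGMENT)
extract = load_limits(env, "KNOWLEDGE_GRAPH_EXTRACT_SEARCH")
community = load_limits(env, "KNOWLEDGE_GRAPH_COMMUNITY_SEARCH")
print(extract, community)
```

Commented-out lines such as `#TUGRAPH_PLUGIN_NAMES=leiden` are skipped, so the plugin name stays unset until the `#` is removed.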
3 changes: 1 addition & 2 deletions .gitignore
@@ -4,7 +4,6 @@ __pycache__/
 *$py.class
 
 # C extensions
-*.so
 message/
 dbgpt/util/extensions/
 .env*
@@ -185,4 +184,4 @@ thirdparty
 /examples/**/*.gv
 /examples/**/*.gv.pdf
 /i18n/locales/**/**/*_ai_translated.po
-/i18n/locales/**/**/*~
+/i18n/locales/**/**/*~
3 changes: 3 additions & 0 deletions dbgpt/_private/config.py
@@ -213,6 +213,9 @@ def __init__(self) -> None:
 
         # Vector Store Configuration
         self.VECTOR_STORE_TYPE = os.getenv("VECTOR_STORE_TYPE", "Chroma")
+        self.GRAPH_COMMUNITY_SUMMARY_ENABLED = (
+            os.getenv("GRAPH_COMMUNITY_SUMMARY_ENABLED", "").lower() == "true"
+        )
         self.MILVUS_URL = os.getenv("MILVUS_URL", "127.0.0.1")
         self.MILVUS_PORT = os.getenv("MILVUS_PORT", "19530")
         self.MILVUS_USERNAME = os.getenv("MILVUS_USERNAME", None)
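The added line treats the flag as enabled only when the variable, lower-cased, equals the string `"true"`; an unset variable falls back to `""` and reads as disabled. The sketch below reproduces that expression (the helper name `flag_enabled` is hypothetical) to show which spellings count:

```python
import os


def flag_enabled(name: str) -> bool:
    # Same expression as the added config.py line; unset falls back to "".
    return os.getenv(name, "").lower() == "true"


KEY = "GRAPH_COMMUNITY_SUMMARY_ENABLED"
results = {}
for value in ["True", "true", "TRUE", "1", "yes", "", " true"]:
    os.environ[KEY] = value
    results[value] = flag_enabled(KEY)

os.environ.pop(KEY, None)
results["<unset>"] = flag_enabled(KEY)
print(results)
```

So a setting such as `GRAPH_COMMUNITY_SUMMARY_ENABLED=1` or `=yes` would leave community summaries off, as would a value that still carries surrounding whitespace when it reaches `os.getenv`.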
6 changes: 4 additions & 2 deletions dbgpt/app/knowledge/api.py
@@ -112,13 +112,15 @@ def arguments(space_id: str):
 
 
 @router.post("/knowledge/{space_name}/recall_test")
-def recall_test(
+async def recall_test(
     space_name: str,
     request: DocumentRecallTestRequest,
 ):
     print(f"/knowledge/{space_name}/recall_test params:")
     try:
-        return Result.succ(knowledge_space_service.recall_test(space_name, request))
+        return Result.succ(
+            await knowledge_space_service.recall_test(space_name, request)
+        )
     except Exception as e:
         return Result.failed(code="E000X", msg=f"{space_name} recall_test error {e}")
 
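Because `knowledge_space_service.recall_test` is now a coroutine function, the endpoint has to become `async` and `await` it; calling it without `await` would hand `Result.succ` a coroutine object rather than the retrieved chunks. A self-contained sketch of that pattern, with a hypothetical `FakeKnowledgeService` and a plain dict standing in for `Result`:

```python
import asyncio
import inspect
from typing import Any, Dict, List


class FakeKnowledgeService:
    """Hypothetical stand-in for the knowledge service (not DB-GPT code)."""

    async def recall_test(self, space_name: str, question: str) -> List[str]:
        await asyncio.sleep(0)  # stands in for async retrieval I/O
        return [f"{space_name}: chunk about {question}"]


service = FakeKnowledgeService()


async def recall_test_handler(space_name: str, question: str) -> Dict[str, Any]:
    # Same shape as the new endpoint: await the service, then wrap the result.
    # A plain dict stands in for Result.succ / Result.failed.
    try:
        data = await service.recall_test(space_name, question)
        return {"ok": True, "data": data}
    except Exception as e:
        return {"ok": False, "msg": f"{space_name} recall_test error {e}"}


# Forgetting `await` hands back a coroutine object instead of the chunks.
pending = service.recall_test("demo", "graph")
forgot_await = inspect.iscoroutine(pending)
pending.close()  # silence the "coroutine was never awaited" warning

response = asyncio.run(recall_test_handler("demo", "graph"))
print(forgot_await, response)
```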
19 changes: 11 additions & 8 deletions dbgpt/app/knowledge/service.py
@@ -309,7 +309,7 @@ def get_knowledge_space_by_ids(self, ids):
         """
         return knowledge_space_dao.get_knowledge_space_by_ids(ids)
 
-    def recall_test(
+    async def recall_test(
         self, space_name, doc_recall_test_request: DocumentRecallTestRequest
     ):
         logger.info(f"recall_test {space_name}, {doc_recall_test_request}")
@@ -338,7 +338,7 @@ def recall_test(
             knowledge_space_retriever = KnowledgeSpaceRetriever(
                 space_id=space.id, top_k=top_k
             )
-            chunks = knowledge_space_retriever.retrieve_with_scores(
+            chunks = await knowledge_space_retriever.aretrieve_with_scores(
                 question, score_threshold
             )
             retrievers_end_time = timeit.default_timer()
@@ -646,13 +646,16 @@ def query_graph(self, space_name, limit):
         graph = vector_store_connector.client.query_graph(limit=limit)
         res = {"nodes": [], "edges": []}
         for node in graph.vertices():
-            res["nodes"].append({"vid": node.vid})
-        for edge in graph.edges():
-            res["edges"].append(
+            res["nodes"].append(
                 {
-                    "src": edge.sid,
-                    "dst": edge.tid,
-                    "label": edge.props[graph.edge_label],
+                    "id": node.vid,
+                    "communityId": node.get_prop("_community_id"),
+                    "name": node.vid,
+                    "type": "",
                 }
             )
+        for edge in graph.edges():
+            res["edges"].append(
+                {"source": edge.sid, "target": edge.tid, "name": edge.name, "type": ""}
+            )
         return res
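`query_graph` now emits nodes keyed by `id`, `name`, `type`, and `communityId` (read from the vertex property `_community_id`), and edges keyed by `source`, `target`, `name`, and `type`, which line up with the `id`/`source`/`target` fields that front-end graph libraries such as G6 expect. The sketch below builds the same payload from hypothetical `Vertex` and `Edge` dataclasses; they are not DB-GPT's graph model, only enough structure to run the conversion.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Vertex:
    """Hypothetical minimal vertex: an id plus a property bag."""

    vid: str
    props: Dict[str, Any] = field(default_factory=dict)

    def get_prop(self, key: str) -> Optional[Any]:
        return self.props.get(key)


@dataclass
class Edge:
    """Hypothetical minimal edge: source id, target id, relation name."""

    sid: str
    tid: str
    name: str


def to_vis_payload(
    vertices: List[Vertex], edges: List[Edge]
) -> Dict[str, List[Dict[str, Any]]]:
    """Build node/edge dicts in the shape returned by the new query_graph."""
    res: Dict[str, List[Dict[str, Any]]] = {"nodes": [], "edges": []}
    for node in vertices:
        res["nodes"].append(
            {
                "id": node.vid,
                "communityId": node.get_prop("_community_id"),  # None if unset
                "name": node.vid,
                "type": "",
            }
        )
    for edge in edges:
        res["edges"].append(
            {"source": edge.sid, "target": edge.tid, "name": edge.name, "type": ""}
        )
    return res


payload = to_vis_payload(
    [Vertex("Alice", {"_community_id": "0"}), Vertex("Bob")],
    [Edge("Alice", "Bob", "knows")],
)
print(payload)
```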
28 changes: 21 additions & 7 deletions dbgpt/datasource/conn_tugraph.py
@@ -1,7 +1,7 @@
 """TuGraph Connector."""
 
 import json
-from typing import Dict, List, cast
+from typing import Dict, Generator, List, cast
 
 from .base import BaseConnector
 
@@ -23,11 +23,16 @@ def __init__(self, driver, graph):
     def create_graph(self, graph_name: str) -> None:
         """Create a new graph."""
         # run the query to get vertex labels
-        with self._driver.session(database="default") as session:
-            graph_list = session.run("CALL dbms.graph.listGraphs()").data()
-            exists = any(item["graph_name"] == graph_name for item in graph_list)
-            if not exists:
-                session.run(f"CALL dbms.graph.createGraph('{graph_name}', '', 2048)")
+        try:
+            with self._driver.session(database="default") as session:
+                graph_list = session.run("CALL dbms.graph.listGraphs()").data()
+                exists = any(item["graph_name"] == graph_name for item in graph_list)
+                if not exists:
+                    session.run(
+                        f"CALL dbms.graph.createGraph('{graph_name}', '', 2048)"
+                    )
+        except Exception as e:
+            raise Exception(f"Failed to create graph '{graph_name}': {str(e)}")
 
     def delete_graph(self, graph_name: str) -> None:
         """Delete a graph."""
@@ -89,10 +94,19 @@ def close(self):
         self._driver.close()
 
     def run(self, query: str, fetch: str = "all") -> List:
+        """Run query."""
+        with self._driver.session(database=self._graph) as session:
+            try:
+                result = session.run(query)
+                return list(result)
+            except Exception as e:
+                raise Exception(f"Query execution failed: {e}")
+
+    def run_stream(self, query: str) -> Generator:
         """Run GQL."""
         with self._driver.session(database=self._graph) as session:
             result = session.run(query)
-            return list(result)
+            yield from result
 
     def get_columns(self, table_name: str, table_type: str = "vertex") -> List[Dict]:
         """Get fields about specified graph.
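The new `run_stream` differs from `run` mainly in how long the driver session lives. `run` drains the result into a list inside the `with` block, so the session is already closed when the caller sees any rows; `run_stream` is a generator, so the session opens on the first `next()` and closes only when iteration finishes. The sketch below makes that visible with a hypothetical `FakeDriver` that logs session opens and closes, assuming (as with Bolt-style driver sessions) that a result can only be consumed while its session is open. `MiniConnector` is an illustration, not the real TuGraph connector.

```python
from contextlib import contextmanager
from typing import Dict, Generator, Iterator, List


class FakeSession:
    """Hypothetical session that returns canned records for any query."""

    def __init__(self, log: List[str]):
        self._log = log

    def run(self, query: str) -> Iterator[Dict[str, int]]:
        self._log.append(f"run:{query}")
        return iter([{"n": 1}, {"n": 2}, {"n": 3}])


class FakeDriver:
    """Hypothetical driver that logs when sessions open and close."""

    def __init__(self):
        self.log: List[str] = []

    @contextmanager
    def session(self, database: str):
        self.log.append(f"open:{database}")
        try:
            yield FakeSession(self.log)
        finally:
            self.log.append(f"close:{database}")


class MiniConnector:
    """Sketch of the run/run_stream split."""

    def __init__(self, driver: FakeDriver, graph: str):
        self._driver = driver
        self._graph = graph

    def run(self, query: str) -> List:
        # Materialize every record before the session closes.
        with self._driver.session(database=self._graph) as session:
            try:
                return list(session.run(query))
            except Exception as e:
                raise Exception(f"Query execution failed: {e}") from e

    def run_stream(self, query: str) -> Generator:
        # The session stays open until the generator is exhausted or closed.
        with self._driver.session(database=self._graph) as session:
            yield from session.run(query)


driver = FakeDriver()
conn = MiniConnector(driver, "demo")

rows = conn.run("MATCH (n) RETURN n")
stream = conn.run_stream("MATCH (n) RETURN n")
log_before_iteration = list(driver.log)  # generator body has not started yet
first = next(stream)
log_mid_stream = list(driver.log)  # second session is open, not yet closed
rest = list(stream)
print(rows, first, rest, driver.log)
```

A consumer that stops early should call `close()` on the generator so that the `with` block exits and the session is released.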
12 changes: 6 additions & 6 deletions dbgpt/rag/embedding/__init__.py
@@ -20,19 +20,19 @@
 from .rerank import CrossEncoderRerankEmbeddings, OpenAPIRerankEmbeddings  # noqa: F401
 
 __ALL__ = [
+    "CrossEncoderRerankEmbeddings",
+    "DefaultEmbeddingFactory",
+    "EmbeddingFactory",
     "Embeddings",
     "HuggingFaceBgeEmbeddings",
     "HuggingFaceEmbeddings",
     "HuggingFaceInferenceAPIEmbeddings",
     "HuggingFaceInstructEmbeddings",
     "JinaEmbeddings",
-    "OpenAPIEmbeddings",
     "OllamaEmbeddings",
-    "DefaultEmbeddingFactory",
-    "EmbeddingFactory",
-    "WrappedEmbeddingFactory",
-    "TongYiEmbeddings",
-    "CrossEncoderRerankEmbeddings",
+    "OpenAPIEmbeddings",
     "OpenAPIRerankEmbeddings",
     "QianFanEmbeddings",
+    "TongYiEmbeddings",
+    "WrappedEmbeddingFactory",
 ]
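This hunk only re-sorts the exported names alphabetically. When reading it, note that Python's star-import machinery consults the lowercase `__all__`; an attribute spelled `__ALL__` is an ordinary module variable and does not restrict `from ... import *`. The sketch below demonstrates the difference on a throwaway in-memory module (its name, `demo_embeddings`, is hypothetical) and checks that the new ordering is sorted.

```python
import sys
import types

# A throwaway in-memory module; the name "demo_embeddings" is hypothetical.
mod = types.ModuleType("demo_embeddings")
mod.CrossEncoderRerankEmbeddings = object()
mod.Embeddings = object()
mod._helper = object()
mod.__ALL__ = ["Embeddings"]  # uppercase: just an ordinary attribute
sys.modules["demo_embeddings"] = mod


def star_import_names() -> list:
    """Run `from demo_embeddings import *` and return the public names it bound."""
    namespace: dict = {}
    exec("from demo_embeddings import *", namespace)
    return sorted(k for k in namespace if not k.startswith("__"))


without_all = star_import_names()
mod.__all__ = ["Embeddings"]  # lowercase: this is what `import *` consults
with_all = star_import_names()
del sys.modules["demo_embeddings"]

print(without_all)  # ['CrossEncoderRerankEmbeddings', 'Embeddings']
print(with_all)  # ['Embeddings']

# The hunk's new ordering is plain ASCII-sorted.
new_order = [
    "CrossEncoderRerankEmbeddings", "DefaultEmbeddingFactory", "EmbeddingFactory",
    "Embeddings", "HuggingFaceBgeEmbeddings", "HuggingFaceEmbeddings",
    "HuggingFaceInferenceAPIEmbeddings", "HuggingFaceInstructEmbeddings",
    "JinaEmbeddings", "OllamaEmbeddings", "OpenAPIEmbeddings",
    "OpenAPIRerankEmbeddings", "QianFanEmbeddings", "TongYiEmbeddings",
    "WrappedEmbeddingFactory",
]
print(new_order == sorted(new_order))  # True
```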
10 changes: 9 additions & 1 deletion dbgpt/rag/index/base.py
@@ -54,6 +54,10 @@ def __init__(self, executor: Optional[Executor] = None):
         """Init index store."""
         self._executor = executor or ThreadPoolExecutor()
 
+    @abstractmethod
+    def get_config(self) -> IndexStoreConfig:
+        """Get the index store config."""
+
     @abstractmethod
     def load_document(self, chunks: List[Chunk]) -> List[str]:
         """Load document in index database.
@@ -104,6 +108,10 @@ def delete_by_ids(self, ids: str) -> List[str]:
             ids(str): The vector ids to delete, separated by comma.
         """
 
+    @abstractmethod
+    def truncate(self) -> List[str]:
+        """Truncate data by name."""
+
     @abstractmethod
     def delete_vector_name(self, index_name: str):
         """Delete index by name.
@@ -188,7 +196,7 @@ def similar_search(
         Return:
             List[Chunk]: The similar documents.
         """
-        return self.similar_search_with_scores(text, topk, 1.0, filters)
+        return self.similar_search_with_scores(text, topk, 0.0, filters)
 
     async def asimilar_search_with_scores(
         self,
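Two things change for index stores here: `get_config` and `truncate` become abstract, so concrete stores must implement them, and `similar_search` now passes a score threshold of 0.0 instead of 1.0. Assuming scores lie in [0, 1] with higher meaning more similar, and that chunks scoring below the threshold are dropped, the old default would discard nearly every hit. The sketch uses a hypothetical, trimmed-down `MiniIndexStore` (no `filters`, no executor, pre-scored chunks) to show both effects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    content: str
    score: float  # precomputed similarity in [0, 1], higher = closer (assumed)


class MiniIndexStore(ABC):
    """Hypothetical, trimmed-down index store."""

    @abstractmethod
    def similar_search_with_scores(
        self, text: str, topk: int, score_threshold: float
    ) -> List[Chunk]:
        """Return up to topk chunks whose score is >= score_threshold."""

    def similar_search(self, text: str, topk: int) -> List[Chunk]:
        # Matches the new default: a 0.0 threshold keeps every top-k hit.
        return self.similar_search_with_scores(text, topk, 0.0)


class ListStore(MiniIndexStore):
    """Ranks a fixed list of pre-scored chunks; ignores the query text."""

    def __init__(self, chunks: List[Chunk]):
        self._chunks = chunks

    def similar_search_with_scores(
        self, text: str, topk: int, score_threshold: float
    ) -> List[Chunk]:
        ranked = sorted(self._chunks, key=lambda c: c.score, reverse=True)
        return [c for c in ranked if c.score >= score_threshold][:topk]


class IncompleteStore(MiniIndexStore):
    """Forgets to implement the abstract method."""


store = ListStore([Chunk("a", 0.91), Chunk("b", 0.42), Chunk("c", 0.10)])
old_default = store.similar_search_with_scores("q", 2, 1.0)  # previous default
new_default = store.similar_search("q", 2)

try:
    IncompleteStore()
    abstract_enforced = False
except TypeError:
    abstract_enforced = True

print([c.content for c in old_default], [c.content for c in new_default])
print(abstract_enforced)
```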
16 changes: 16 additions & 0 deletions dbgpt/rag/transformer/base.py
@@ -9,11 +9,27 @@
 class TransformerBase:
     """Transformer base class."""
 
+    @abstractmethod
+    def truncate(self):
+        """Truncate operation."""
+
+    @abstractmethod
+    def drop(self):
+        """Clean operation."""
+
 
 class EmbedderBase(TransformerBase, ABC):
     """Embedder base class."""
 
 
+class SummarizerBase(TransformerBase, ABC):
+    """Summarizer base class."""
+
+    @abstractmethod
+    async def summarize(self, **args) -> str:
+        """Summarize result."""
+
+
 class ExtractorBase(TransformerBase, ABC):
     """Extractor base class."""
 
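`TransformerBase` gains abstract `truncate` and `drop` hooks, and the new `SummarizerBase` adds an async `summarize(**args)`. A toy implementation of that interface is sketched below; `FirstSentenceSummarizer` and the `community_id`/`text` keyword names are hypothetical. In this sketch the base class derives from `ABC`, which is what makes Python refuse to instantiate a subclass that leaves an abstract method unimplemented.

```python
import asyncio
from abc import ABC, abstractmethod


class TransformerBase(ABC):
    """Trimmed copy of the hunk's base class, derived from ABC in this sketch."""

    @abstractmethod
    def truncate(self):
        """Truncate operation."""

    @abstractmethod
    def drop(self):
        """Clean operation."""


class SummarizerBase(TransformerBase, ABC):
    """Summarizer base class."""

    @abstractmethod
    async def summarize(self, **args) -> str:
        """Summarize result."""


class FirstSentenceSummarizer(SummarizerBase):
    """Hypothetical summarizer: keeps the first sentence of a community's text."""

    def __init__(self):
        self._cache: dict = {}

    async def summarize(self, **args) -> str:
        # Keyword names are illustrative only.
        community_id = args["community_id"]
        text = args["text"]
        summary = text.split(". ")[0].rstrip(".") + "."
        self._cache[community_id] = summary
        return summary

    def truncate(self):
        # Drop cached summaries but keep the object usable.
        self._cache.clear()

    def drop(self):
        # Release everything; for this toy class it matches truncate.
        self._cache.clear()


async def main():
    summarizer = FirstSentenceSummarizer()
    summary = await summarizer.summarize(
        community_id="c0", text="Alice knows Bob. Bob works at Acme."
    )
    cached_before = len(summarizer._cache)
    summarizer.truncate()
    return summary, cached_before, len(summarizer._cache)


print(asyncio.run(main()))
```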