Feat/llm fact-checking in ranking based selector #555

Open

smilni wants to merge 38 commits into dev from feat/llm_in_ranking_based_selector

Commits (38)
0f5e5e8
first commit
smilni Aug 10, 2023
dd2a93f
fix format of output
smilni Aug 10, 2023
02c711e
fixes
smilni Aug 10, 2023
16d6f20
fact-checking annotator, working version
smilni Aug 14, 2023
82f98a2
hyp considered incorrect after first contradiction
smilni Aug 14, 2023
c5f0274
better logging & not checking some skills
smilni Aug 14, 2023
98f0f20
modifying response selector to include fact-checking
smilni Aug 14, 2023
1fe10da
added default ENABLE_FACT_CHECKING=0 to Dockerfile
smilni Aug 15, 2023
0ca6999
added default FACTUAL_CONFORMITY_SERVICE_TIMEOUT=0 to Dockerfile
smilni Aug 15, 2023
769dba1
improved logging + minor fixes
smilni Aug 15, 2023
c852ac3
upd timeout
smilni Aug 15, 2023
58d426d
update README for ranking_based_response_selector by adding fact-chec…
smilni Aug 15, 2023
0b5b157
removed or modified obsolete files
smilni Aug 15, 2023
f4c3b6b
test file for fact checking
smilni Aug 15, 2023
eab1017
moved SKILLS_NOT_TO_FACT_CHECK, updated README for response selector
smilni Aug 15, 2023
11094d1
add README.md for annotator
smilni Aug 15, 2023
add0a27
code style & moved SKILLS_NOT_TO_FACT_CHECK
smilni Aug 15, 2023
0524a72
moving EXTERNAL_SKILLS from args to common
smilni Aug 15, 2023
722d187
moving EXTERNAL_SKILLS from args to common again
smilni Aug 15, 2023
0e7e441
improve logging and make SERVICE_PORT an arg
smilni Aug 15, 2023
fd5943a
component cards for annotator
smilni Aug 15, 2023
a9772b5
component cards for fact-checking annotator and ranking-based-respons…
smilni Aug 15, 2023
16bec65
codestyle upd
smilni Aug 15, 2023
4b02fe1
fix typo
smilni Aug 15, 2023
64e9119
revert accidental change in ranking_based_response_selector/test.py
smilni Aug 15, 2023
ecf897c
remove debugging in dp_formatters
smilni Aug 15, 2023
165989a
Merge branch 'dev' into feat/llm_in_ranking_based_selector
smilni Aug 15, 2023
4bab168
added to components.tsv
smilni Aug 15, 2023
04466ef
ranking-based-response-selector-fact-checking correct name everywhere
smilni Aug 24, 2023
be6113d
added fact-checking to runtests
smilni Aug 24, 2023
f1fcfe4
fix ranking based response selector path
smilni Aug 24, 2023
0d552ad
Merge branch 'dev' into feat/llm_in_ranking_based_selector
smilni Sep 5, 2023
968f979
ranking-based-response-selector-fact-checking in tests
smilni Sep 5, 2023
88e2660
typo fix
smilni Sep 26, 2023
76b20da
Merge branch 'dev' into feat/llm_in_ranking_based_selector
smilni Sep 26, 2023
695dcbe
minor fixes, mainly container name
smilni Sep 26, 2023
7a4f978
add SERVICE_PORT and SERVICE_NAME everywhere
smilni Sep 27, 2023
8ccd3e0
small change in test for more visible effect
smilni Sep 27, 2023
21 changes: 21 additions & 0 deletions annotators/fact_checking/Dockerfile
@@ -0,0 +1,21 @@
FROM python:3.9.16
WORKDIR /src

COPY annotators/fact_checking/requirements.txt /src/requirements.txt
RUN pip install -r /src/requirements.txt

ARG SERVICE_PORT
ENV SERVICE_PORT ${SERVICE_PORT}
ARG GENERATIVE_SERVICE_URL
ENV GENERATIVE_SERVICE_URL ${GENERATIVE_SERVICE_URL}
ARG GENERATIVE_TIMEOUT=5
ENV GENERATIVE_TIMEOUT ${GENERATIVE_TIMEOUT}
ARG GENERATIVE_SERVICE_CONFIG
ENV GENERATIVE_SERVICE_CONFIG ${GENERATIVE_SERVICE_CONFIG}
ARG ENVVARS_TO_SEND
ENV ENVVARS_TO_SEND ${ENVVARS_TO_SEND}

COPY annotators/fact_checking /src
COPY common /src/common

CMD gunicorn --workers=1 server:app -b 0.0.0.0:8182 --timeout=1200
16 changes: 16 additions & 0 deletions annotators/fact_checking/README.md
@@ -0,0 +1,16 @@
# Fact Checking

## Description

Fact Checking conducts basic fact-checking of response candidates. As of now, it considers all hypotheses derived from external sources correct. Internally generated hypotheses are fact-checked by ensuring that they do not contradict any of the external hypotheses. For example, if `dff_google_api_skill`, which relies on Google as a source of external knowledge, responds _"Person X is 25 years old"_ and some solely LLM-based skill provides a hallucinated response _"Person X is 23 years old"_, the second hypothesis is considered incorrect as it contradicts the first, external one.

NB: Scripted responses from `dummy_skill` and `dff_intent_responder_skill` are not fact-checked for the sake of efficiency and are always deemed correct.

## Parameters

```
ENVVARS_TO_SEND: comma-separated list of API keys to pass as environment variables
GENERATIVE_SERVICE_URL: URL of the LLM service to use for fact-checking
GENERATIVE_TIMEOUT: timeout (in seconds) for requests to the LLM service
GENERATIVE_SERVICE_CONFIG: configuration file with generative parameters
```
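
## Example

A minimal request sketch, assuming the annotator is running locally on its default port 8182 (the payload mirrors `test.py`; empty `human_uttr_attributes` fall back to the service-level LLM settings):

```
import requests

response = requests.post(
    "http://0.0.0.0:8182/respond_batch",
    json={
        "hypotheses": [
            # external hypotheses are trusted as-is
            {"skill_name": "dff_google_api_skill", "text": "Jack is 5 years old."},
            # internal hypotheses are checked against the external ones
            {"skill_name": "dff_dream_persona_chatgpt_prompted_skill", "text": "Jack is 999 years old."},
        ],
        "human_uttr_attributes": [{}, {}],
    },
)
print(response.json()[0]["batch"])  # expected: ["Correct", "Incorrect"]
```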
9 changes: 9 additions & 0 deletions annotators/fact_checking/requirements.txt
@@ -0,0 +1,9 @@
flask==1.1.1
itsdangerous==2.0.1
gunicorn==19.9.0
requests==2.22.0
sentry-sdk[flask]==0.14.1
healthcheck==1.3.3
jinja2<=3.0.3
Werkzeug<=2.0.3
deeppavlov==1.1.1
103 changes: 103 additions & 0 deletions annotators/fact_checking/server.py
@@ -0,0 +1,103 @@
import logging
from os import getenv
import sentry_sdk
import json
from flask import Flask, jsonify, request
from sentry_sdk.integrations.flask import FlaskIntegration
from common.prompts import send_request_to_prompted_generative_service, compose_sending_variables
from common.response_selection import EXTERNAL_SKILLS

# logging is configured here because it conflicts with tf

logging.basicConfig(format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO)
logger = logging.getLogger(__name__)
sentry_sdk.init(dsn=getenv("SENTRY_DSN"), integrations=[FlaskIntegration()])
app = Flask(__name__)

ENVVARS_TO_SEND = getenv("ENVVARS_TO_SEND", None)
ENVVARS_TO_SEND = [] if ENVVARS_TO_SEND is None else ENVVARS_TO_SEND.split(",")
GENERATIVE_SERVICE_URL = getenv("GENERATIVE_SERVICE_URL")
GENERATIVE_TIMEOUT = int(getenv("GENERATIVE_TIMEOUT"))
GENERATIVE_SERVICE_CONFIG = getenv("GENERATIVE_SERVICE_CONFIG")
if GENERATIVE_SERVICE_CONFIG:
    with open(f"common/generative_configs/{GENERATIVE_SERVICE_CONFIG}", "r") as f:
        GENERATIVE_SERVICE_CONFIG = json.load(f)
SKILLS_NOT_TO_FACT_CHECK = ["dummy_skill", "dff_intent_responder_skill"]


def check_hyp_with_llm(curr_prompt, human_uttr_attr):
    lm_service_kwargs = human_uttr_attr.pop("lm_service_kwargs", None)
    lm_service_kwargs = {} if lm_service_kwargs is None else lm_service_kwargs
    envvars_to_send = ENVVARS_TO_SEND if len(ENVVARS_TO_SEND) else human_uttr_attr.get("envvars_to_send", [])
    sending_variables = compose_sending_variables(
        lm_service_kwargs,
        envvars_to_send,
        **human_uttr_attr,
    )
    response = send_request_to_prompted_generative_service(
        "",
        curr_prompt,
        GENERATIVE_SERVICE_URL,
        GENERATIVE_SERVICE_CONFIG,
        GENERATIVE_TIMEOUT,
        sending_variables,
    )
    result = response[0]
    # the prompt asks whether the hypothesis CONTRADICTS the fact,
    # so "yes" means the hypothesis is incorrect
    if "yes" in result.lower():
        _is_hyp_correct = False
    else:
        _is_hyp_correct = True
    return _is_hyp_correct


@app.route("/respond_batch", methods=["POST"])
def respond_batch():
    hypotheses = request.json["hypotheses"]
    human_uttr_attributes = request.json["human_uttr_attributes"]
    # hypotheses from EXTERNAL_SKILLS are trusted as-is; the rest are checked against them
    ie_types = ["external" if hyp["skill_name"] in EXTERNAL_SKILLS else "internal" for hyp in hypotheses]
    external_service_hyps = [
        (hyp["text"], hyp["skill_name"]) for hyp in hypotheses if hyp["skill_name"] in EXTERNAL_SKILLS
    ]
    results = []
    for hyp, human_uttr_attr, ie_type in zip(hypotheses, human_uttr_attributes, ie_types):
        hyp_text = hyp["text"]
        try:
            if ie_type == "external":
                logger.info(f"Hypothesis `{hyp_text}` is considered correct as it is external.")
                results += ["Correct"]
            elif hyp["skill_name"] in SKILLS_NOT_TO_FACT_CHECK:
                logger.info(f"Hypothesis `{hyp_text}` is not checked as it was produced by {hyp['skill_name']}.")
                results += ["Correct"]
            else:
                if len(external_service_hyps) == 0:
                    logger.info(
                        f"Internal hypothesis `{hyp_text}` is considered correct as there are no external hypotheses \
to check it upon."
                    )
                    results += ["Correct"]
                else:
                    # a hypothesis is deemed incorrect after the first contradiction found
                    _is_hyp_correct = True
                    for external_service_hyp, external_service_name in external_service_hyps:
                        curr_prompt = f"""Hypothesis: "{hyp_text}"
Does Hypothesis contradict Fact that {external_service_hyp}? Always answer only Yes or No without explanation."""
                        logger.info(f"Checking internal hypothesis `{hyp_text}` with LLM. Prompt: {curr_prompt}")
                        _is_hyp_correct_one_step = check_hyp_with_llm(curr_prompt, human_uttr_attr)
                        if not _is_hyp_correct_one_step:
                            _is_hyp_correct = False
                            logger.info(
                                f"""Internal hypothesis `{hyp_text}` is incorrect according to external service \
{external_service_name}."""
                            )
                            results += ["Incorrect"]
                            break
                    if _is_hyp_correct:
                        logger.info(f"Internal hypothesis `{hyp_text}` is correct according to all external services.")
                        results += ["Correct"]
        except Exception as e:
            logger.error(e)
            # fail open: annotation errors must not block response selection
            results += ["Correct"]
    return jsonify([{"batch": results}])


if __name__ == "__main__":
    app.run(debug=False, host="0.0.0.0", port=3000)
7 changes: 7 additions & 0 deletions annotators/fact_checking/service_configs/fact-checking/environment.yml
@@ -0,0 +1,7 @@
SERVICE_PORT: 8182
SERVICE_NAME: fact_checking
GENERATIVE_SERVICE_URL: http://openai-api-chatgpt:8145/respond
GENERATIVE_TIMEOUT: 5
GENERATIVE_SERVICE_CONFIG: openai-chatgpt.json
ENVVARS_TO_SEND: OPENAI_API_KEY,OPENAI_ORGANIZATION
FLASK_APP: server
31 changes: 31 additions & 0 deletions annotators/fact_checking/service_configs/fact-checking/service.yml
@@ -0,0 +1,31 @@
name: fact-checking
endpoints:
- respond_batch
compose:
  env_file:
  - .env, .env_secret
  build:
    args:
      SERVICE_PORT: 8182
      SERVICE_NAME: fact_checking
      GENERATIVE_SERVICE_URL: http://openai-api-chatgpt:8145/respond
      GENERATIVE_TIMEOUT: 5
      GENERATIVE_SERVICE_CONFIG: openai-chatgpt.json
      ENVVARS_TO_SEND: OPENAI_API_KEY,OPENAI_ORGANIZATION
    context: .
    dockerfile: annotators/fact_checking/Dockerfile
  command: flask run -h 0.0.0.0 -p 8182
  environment:
    - FLASK_APP=server
  deploy:
    resources:
      limits:
        memory: 128M
      reservations:
        memory: 128M
  volumes:
    - "./annotators/fact_checking:/src"
    - "./common:/src/common"
  ports:
    - 8182:8182
proxy: null
33 changes: 33 additions & 0 deletions annotators/fact_checking/test.py
@@ -0,0 +1,33 @@
import requests
from os import getenv

SERVICE_PORT = getenv("SERVICE_PORT")


def main():
    url = f"http://0.0.0.0:{SERVICE_PORT}/respond_batch"
    result = requests.post(
        url=url,
        json={
            "hypotheses": [
                {"skill_name": "dff_google_api_skill", "text": "Jack is 5 years old."},
                {
                    "skill_name": "dff_dream_persona_chatgpt_prompted_skill",
                    "text": "Jack is 999 years old.",
                },
                {
                    "skill_name": "dummy_skill",
                    "text": "Sorry, I cannot answer your question.",
                },
            ],
            "human_uttr_attributes": [{}, {}, {}],
        },
    )
    result = result.json()[0]["batch"]
    result_gold = ["Correct", "Incorrect", "Correct"]
    assert result == result_gold, f"Got\n{result}\n, something is wrong"
    print("Success!")


if __name__ == "__main__":
    main()
3 changes: 3 additions & 0 deletions annotators/fact_checking/test.sh
@@ -0,0 +1,3 @@
#!/bin/bash

python test.py
9 changes: 7 additions & 2 deletions assistant_dists/dream/dev.yml
@@ -5,13 +5,12 @@ services:
       - ".:/dp-agent"
     ports:
       - 4242:4242
-
   sentseg:
     volumes:
       - "./annotators/SentSeg:/src"
     ports:
       - 8011:8011
-  ranking-based-response-selector:
+  ranking-based-response-selector-fact-checking:
     volumes:
       - "./response_selectors/ranking_based_response_selector:/src"
       - "./common:/src/common"
@@ -157,4 +156,10 @@ services:
       - "./common:/src/common"
     ports:
       - 8167:8167
+  fact-checking:
+    volumes:
+      - "./annotators/fact_checking:/src"
+      - "./common:/src/common"
+    ports:
+      - 8182:8182
 version: "3.7"
33 changes: 30 additions & 3 deletions assistant_dists/dream/docker-compose.override.yml
@@ -2,21 +2,22 @@ services:
   agent:
     command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.pipeline_config=assistant_dists/dream/pipeline_conf.json'
     environment:
-      WAIT_HOSTS: "sentseg:8011, ranking-based-response-selector:8002,
+      WAIT_HOSTS: "sentseg:8011, ranking-based-response-selector-fact-checking:8002,
         dff-intent-responder-skill:8012, intent-catcher:8014, ner:8021,
         factoid-qa:8071, kbqa:8072, entity-linking:8075, wiki-parser:8077, text-qa:8078,
         combined-classification:8087, fact-retrieval:8100, entity-detection:8103,
         sentence-ranker:8128, property-extraction:8136, prompt-selector:8135, openai-api-chatgpt:8145,
         dff-dream-persona-chatgpt-prompted-skill:8137, dff-dream-faq-prompted-skill:8170,
-        openai-api-chatgpt-16k:8167, summarization-annotator:8058, dialog-summarizer:8059"
+        openai-api-chatgpt-16k:8167, summarization-annotator:8058, dialog-summarizer:8059,
+        fact-checking:8182"
       WAIT_HOSTS_TIMEOUT: ${WAIT_TIMEOUT:-1000}
       HIGH_PRIORITY_INTENTS: 1
       RESTRICTION_FOR_SENSITIVE_CASE: 1
       ALWAYS_TURN_ON_ALL_SKILLS: 0
       LANGUAGE: EN
       FALLBACK_FILE: fallbacks_dream_en.json
 
-  ranking-based-response-selector:
+  ranking-based-response-selector-fact-checking:
     env_file: [ .env ]
     build:
       args:
@@ -26,6 +27,10 @@ services:
       SENTENCE_RANKER_ANNOTATION_NAME: sentence_ranker
       SENTENCE_RANKER_SERVICE_URL: http://sentence-ranker:8128/respond
       SENTENCE_RANKER_TIMEOUT: 3
+      ENABLE_FACT_CHECKING: 1
+      FACTUAL_CONFORMITY_SERVICE_URL: http://fact-checking:8182/respond
+      FACTUAL_CONFORMITY_SERVICE_TIMEOUT: 5
+      FACTUAL_CONFORMITY_ANNOTATION_NAME: fact_checking
       N_UTTERANCES_CONTEXT: 5
       FILTER_TOXIC_OR_BADLISTED: 1
       FALLBACK_FILE: fallbacks_dream_en.json
@@ -484,4 +489,26 @@ services:
         reservations:
           memory: 4G
 
+  fact-checking:
+    env_file: [ .env, .env_secret ]
+    build:
+      args:
+        SERVICE_PORT: 8182
+        SERVICE_NAME: fact_checking
+        GENERATIVE_SERVICE_URL: http://openai-api-chatgpt:8145/respond
+        GENERATIVE_TIMEOUT: 5
+        GENERATIVE_SERVICE_CONFIG: openai-chatgpt.json
+        ENVVARS_TO_SEND: OPENAI_API_KEY,OPENAI_ORGANIZATION
+      context: .
+      dockerfile: annotators/fact_checking/Dockerfile
+    command: flask run -h 0.0.0.0 -p 8182
+    environment:
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 128M
+        reservations:
+          memory: 128M
+
 version: '3.7'
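
For reference, a minimal sketch of how a ranking-based selector could act on the `ENABLE_FACT_CHECKING` and `FACTUAL_CONFORMITY_ANNOTATION_NAME` settings above. The helper below is hypothetical (the actual `ranking_based_response_selector` code is not part of this diff); it only assumes that each hypothesis carries the annotator's `"Correct"`/`"Incorrect"` verdict under the `fact_checking` annotation name:

```
import os

# Hypothetical illustration, not the actual ranking_based_response_selector code:
# drop hypotheses that the fact-checking annotator labeled "Incorrect" before ranking.
ENABLE_FACT_CHECKING = int(os.getenv("ENABLE_FACT_CHECKING", 0))
FACTUAL_CONFORMITY_ANNOTATION_NAME = os.getenv("FACTUAL_CONFORMITY_ANNOTATION_NAME", "fact_checking")


def filter_incorrect_hypotheses(hypotheses):
    if not ENABLE_FACT_CHECKING:
        return hypotheses
    return [
        hyp
        for hyp in hypotheses
        if hyp.get("annotations", {}).get(FACTUAL_CONFORMITY_ANNOTATION_NAME, "Correct") != "Incorrect"
    ]
```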
2 changes: 1 addition & 1 deletion assistant_dists/dream/gpu1.yml
@@ -31,7 +31,7 @@ services:
 #      - ./venv/data/db_data:/root/data/db
   sentseg:
     restart: unless-stopped
-  ranking-based-response-selector:
+  ranking-based-response-selector-fact-checking:
     restart: unless-stopped
   dff-intent-responder-skill:
     restart: unless-stopped
26 changes: 22 additions & 4 deletions assistant_dists/dream/pipeline_conf.json
@@ -395,6 +395,24 @@
           "component": "components/XGwmAHtAOu0NDqqG3QCJw.yml",
           "service": "services/sentence_ranker/service_configs/sentence-ranker"
         }
-      }
+      },
+      "fact_checking": {
+        "connector": {
+          "protocol": "http",
+          "timeout": 5.0,
+          "url": "http://fact-checking:8182/respond_batch"
+        },
+        "dialog_formatter": "state_formatters.dp_formatters:hypotheses_and_attributes",
+        "response_formatter": "state_formatters.dp_formatters:simple_formatter_service",
+        "previous_services": [
+          "skills"
+        ],
+        "state_manager_method": "add_hypothesis_annotation_batch",
+        "is_enabled": true,
+        "source": {
+          "component": "components/oijrtOIfj94jnvkf30n.yml",
+          "service": "annotators/fact_checking/service_configs/fact-checking"
+        }
+      }
     },
     "skill_selectors": {
@@ -540,8 +558,8 @@
     "response_selector": {
       "connector": {
         "protocol": "http",
-        "timeout": 1.0,
-        "url": "http://ranking-based-response-selector:8002/respond"
+        "timeout": 5.0,
+        "url": "http://ranking-based-response-selector-fact-checking:8002/respond"
       },
       "dialog_formatter": "state_formatters.dp_formatters:cropped_dialog",
       "response_formatter": "state_formatters.dp_formatters:base_response_selector_formatter_service",
@@ -551,8 +569,8 @@
       "state_manager_method": "add_bot_utterance",
       "is_enabled": true,
       "source": {
-        "component": "components/YJzc7NwGrLmKp6gfZJh7X1.yml",
-        "service": "response_selectors/ranking_based_response_selector/service_configs/ranking-based-response-selector"
+        "component": "components/dvpefjoPOHC90efoi.yml",
+        "service": "response_selectors/ranking_based_response_selector/service_configs/ranking-based-response-selector-fact-checking"
       }
     }
   }
2 changes: 2 additions & 0 deletions common/response_selection.py
@@ -75,3 +75,5 @@
     "Oh, before I forget,",
     "I wanted to mention that,",
 ]
+
+EXTERNAL_SKILLS = ["factoid_qa", "dff_google_api_skill"]