-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Tracing and rework cli so that the controller command does not start a FastAPI app #669
base: main
Are you sure you want to change the base?
Conversation
keithralphs
commented
Oct 14, 2024
- Add tracing intialisation and decorators
- In cli.py move some imports within functions to prevent unnecessary FastAPI apps being created
- Add capability to specify a config file in debug launchers
- Add Environment vars to initialise traceability options and set Jager export to off by default
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #669 +/- ##
==========================================
- Coverage 92.62% 92.53% -0.09%
==========================================
Files 35 35
Lines 1654 1795 +141
==========================================
+ Hits 1532 1661 +129
- Misses 122 134 +12 ☔ View full report in Codecov by Sentry. |
"serve" | ||
] | ||
"env": { | ||
"OTLP_EXPORT_ENABLED": "false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does exporting default to true
? Would I also have to export OTLP_EXPORT_ENABLED=false
if I wanted to run blueapi from the command line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, (assuming I understand the question), the default is false in the helm chart, dockerfile and here so you can just run BlueAPI and it wilkl not try to connect to Jaeger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the env var is not set (like if running blueapi outside of a dev container), then it's treated like True?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, the code that interprest the Env Var (in core/init.py) is
OTLP_EXPORT_ENABLED = environ.get("OTLP_EXPORT_ENABLED") == "true"
and so will only enable export to Jaeger if the Env Var both exists and has been set to the string 'true'. This should however mean that there is no need to set this Env Var to 'false' as anything but 'true', including non-existance will result in the export being off, which is the desired default state. So I will remove the lines that explicitly do this.
"env": { | ||
"OTLP_EXPORT_ENABLED": "false" | ||
}, | ||
"args": "-c ${input:config_path} controller ${input:args}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should: Reconcile with #668 by @Relm-Arrowny
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @callumforrester meant #663
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add Ray's arg name change into mine so that they are consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@callumforrester do you think we should have an extra launcher for when the config file is specified or just make it the default that one is requested - you would have to press return to specify non but I guess that's fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite happy with the extra launcher, I don't need to specify a config file 80% of the time, so having to press enter for no reason 80% of the time is actually mildly annoying. I realise it's only mildly but the "just one little extra step isn't too bad" mentality is a slippery slope that eventually leads to GDA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated so there is a custom config option or both the serve and controller commands
ENV OTLP_EXPORT_ENABLED=false | ||
# enable opentelemetry support | ||
ENV OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf | ||
# Change this to point to Jaeger server before merging e.g. https://daq-services-jaeger | ||
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318 | ||
# Ensure that all Http headers are captured | ||
ENV OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*" | ||
ENV OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should: As discussed, we'd prefer not to set config in the Dockerfile and instead use the helm chart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that work for the dev container though as it uses the dockerfile to initialise? The config is set in the helm chart too but that doesn't help in VSCode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the devcontainer you should be editing .devcontainer/devcontainer.json
, which configures the base image into a devcontainer, I think you need to add something that looks a bit like this:
"containerEnv": {
"OTLP_EXPORT_ENABLED": false,
etc
etc
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, not setting the variable at all will result in Export being off, which is the desired default state, however the other vars are needed to configure tracing correctly, so that if export is enabled it can function. I guess therefore these should be moved into the devcontainer.json as you say, but where can I set them for a plain valnilla blueapi being run from the command line with no Dev Container or hekm chart involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at that point they should be passed in at container startup time
docker run -d --rm -e OLTP_FOO=true -e OLTP_BAR=false blueapi:latest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should, in general, update the docs to reflect this PR. I don't mind if you do that here or make a separate issue
@@ -32,7 +32,7 @@ def _mock_with_name(name: str) -> MagicMock: | |||
return mock | |||
|
|||
|
|||
def wrong_return_type() -> int: | |||
def wrong_return_type(*args, **kwargs) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume this is so tracing can inject a parameter for the function to ignore? I'm a little concerned that you have to do this, add *args, **kwargs
for non-obvious reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit I can't fully remember why this was needed as it was some weeks ago. I do remember though that the tests would not succeed (when they should do) without this change. It was probably to do with the examination of the funciton signature by the start_as_current_span decorator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding unused, unnecessary parameters to satisfy some library somewhere is not good practice (or in more succinct terms: #669 (comment)), we should try to work out what is complaining and see if we can satisfy it another way. Try removing it and seeing why the tests fail, I'm happy to help out if it's a weird error message.
): | ||
task_started.set() | ||
|
||
LOGGER.info(f"Submitting: {trackable_task}") | ||
try: | ||
sub = self.worker_events.subscribe(mark_task_as_started) | ||
self._context_register[trackable_task.task_id] = get_trace_context() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must: Context register does not ever appear to be cleared, so if blueapi is left running for weeks or months (quite likely) it will get quite big and cumbersome. This class is already complicated so I'm not immediately sure if introducing extra bookkeeping is the correct solution. Maybe the context information could be embedded in the documents themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or clear the context for a task when it finishes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearing on finish sounds like a reasonabe idea
# Change this to point to Jaeger server before merging e.g. https://daq-services-jaeger | ||
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have said Jaeger server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it won't be available till next year
- name: OTLP_EXPORT_ENABLED | ||
value: {{ .Values.tracing.otlp.export_enabled] }} | ||
- name: OTEL_EXPORTER_OTLP_TRACES_PROTOCOL | ||
value: {{ .Values.tracing.otlp.protocol }} | ||
- name: OTEL_EXPORTER_OTLP_ENDPOINT | ||
value: "{{ .Values.tracing.otlp.host }}:{{ .Values.tracing.otlp.port }}" | ||
- name: OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST | ||
value: {{ .Values.tracing.http.request.headers }} | ||
- name: OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE | ||
value: {{ .Values.tracing.http.response.headers }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add defaults in case these aren't set
@@ -118,6 +125,7 @@ def delete_environment(self) -> EnvironmentResponse: | |||
"/environment", EnvironmentResponse, method="DELETE" | |||
) | |||
|
|||
@start_as_current_span(TRACER, "method", "data", "suffix") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How clear is it in the Jaeger UI when using a whole dict (data) to mark a span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I get a screenshot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty clear actually, I will try and add a screenshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -142,6 +158,7 @@ def get_device_by_name(name: str, runner: WorkerDispatcher = Depends(_runner)): | |||
response_model=TaskResponse, | |||
status_code=status.HTTP_201_CREATED, | |||
) | |||
@start_as_current_span(TRACER, "request", "task.name", "task.params") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How verbose is the request object in the jaeger UI and does it include e.g. headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is enabled by one of the environment variables
@@ -137,7 +165,7 @@ def _rpc( | |||
module_name: str, | |||
function_name: str, | |||
expected_type: type[T] | None, | |||
*args: Any, | |||
args: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we find it was impossible to pass *args? It looks like
(get_context_propagator(), *args),
-> *(get_context_propagator(), *args),
above would allow this to be unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied how it was done in your original example, will discuss when in
self._ctx.run_engine.abort(reason) | ||
add_span_attributes({"Task aborted": reason}) | ||
else: | ||
self._ctx.run_engine.stop() | ||
add_span_attributes({"Task stopped": reason}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these span attributes be added prior to calling abort/stop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say no because then you're reporting something that hasn't yet happened. If they sai stopping rather than stopped then yes, but it seems more useful to know that the requested operation completed that might complete, to me anyway.
): | ||
task_started.set() | ||
|
||
LOGGER.info(f"Submitting: {trackable_task}") | ||
try: | ||
sub = self.worker_events.subscribe(mark_task_as_started) | ||
self._context_register[trackable_task.task_id] = get_trace_context() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or clear the context for a task when it finishes
@@ -186,42 +186,42 @@ class GenericModel(BaseModel, Generic[T]): | |||
b: str | |||
|
|||
|
|||
def return_int() -> int: | |||
def return_int(*args, **kwargs) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yuck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not my stupid programming language :)