Add Tracing and rework cli so that the controller command does not start a FastAPI app #669

keithralphs · 2024-10-14T14:11:46Z

Add tracing initialisation and decorators
In cli.py, move some imports inside functions to prevent unnecessary FastAPI apps being created
Add the capability to specify a config file in the debug launchers
Add environment variables to initialise tracing options, and set Jaeger export to off by default

codecov · 2024-10-14T14:15:05Z

Codecov Report

Attention: Patch coverage is 93.86503% with 10 lines in your changes missing coverage. Please review.

Project coverage is 92.53%. Comparing base (8b35390) to head (b9cb48c).
Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
src/blueapi/cli/cli.py	45.45%	6 Missing ⚠️
src/blueapi/service/main.py	93.10%	2 Missing ⚠️
src/blueapi/worker/task_worker.py	96.72%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #669      +/-   ##
==========================================
- Coverage   92.62%   92.53%   -0.09%     
==========================================
  Files          35       35              
  Lines        1654     1795     +141     
==========================================
+ Hits         1532     1661     +129     
- Misses        122      134      +12


.devcontainer/devcontainer.json

callumforrester · 2024-10-16T12:01:15Z

.vscode/launch.json

-                "serve"
-            ]
+            "env": {
+                "OTLP_EXPORT_ENABLED": "false"


Does exporting default to true? Would I also have to export OTLP_EXPORT_ENABLED=false if I wanted to run blueapi from the command line?

No (assuming I understand the question): the default is false in the helm chart, the Dockerfile and here, so you can just run BlueAPI and it will not try to connect to Jaeger.

But if the env var is not set (like if running blueapi outside of a dev container), then it's treated like True?

As discussed offline, the code that interprets the env var (in core/__init__.py) is

OTLP_EXPORT_ENABLED = environ.get("OTLP_EXPORT_ENABLED") == "true"

so export to Jaeger is only enabled if the env var both exists and is set to the string 'true'. This also means there is no need to set the env var to 'false': anything other than 'true', including the variable not existing at all, turns export off, which is the desired default state. So I will remove the lines that set it explicitly.
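To make the behaviour concrete, here is the same comparison wrapped in a small function (the name `otlp_export_enabled` is made up for illustration; in the PR it is a module-level constant, evaluated once at import). Values such as `"True"` or `"1"` also leave export off, because only the exact lowercase string `"true"` matches.

```python
import os


def otlp_export_enabled() -> bool:
    """Same comparison as the line quoted above: only the exact string "true" enables export."""
    return os.environ.get("OTLP_EXPORT_ENABLED") == "true"


if __name__ == "__main__":
    for value in (None, "", "false", "True", "1", "true"):
        if value is None:
            os.environ.pop("OTLP_EXPORT_ENABLED", None)  # variable not set at all
        else:
            os.environ["OTLP_EXPORT_ENABLED"] = value
        print(repr(value), "->", otlp_export_enabled())
```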

callumforrester · 2024-10-16T12:01:59Z

.vscode/launch.json

+            "env": {
+                "OTLP_EXPORT_ENABLED": "false"
+            },
+            "args": "-c ${input:config_path} controller ${input:args}"


Should: Reconcile with #668 by @Relm-Arrowny

I think @callumforrester meant #663

Will add Ray's arg name change into mine so that they are consistent.

@callumforrester do you think we should have an extra launcher for when a config file is specified, or just make requesting one the default? You would have to press return to specify none, but I guess that's fine?

I'm quite happy with the extra launcher, I don't need to specify a config file 80% of the time, so having to press enter for no reason 80% of the time is actually mildly annoying. I realise it's only mildly but the "just one little extra step isn't too bad" mentality is a slippery slope that eventually leads to GDA.

Updated so there is a custom config option for both the serve and controller commands.

callumforrester · 2024-10-16T12:02:33Z

Dockerfile

+ENV OTLP_EXPORT_ENABLED=false
+# enable opentelemetry support
+ENV OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
+# Change this to point to Jaeger server before merging e.g. https://daq-services-jaeger
+ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
+# Ensure that all Http headers are captured
+ENV OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*"
+ENV OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*"


Should: As discussed, we'd prefer not to set config in the Dockerfile and instead use the helm chart

How does that work for the dev container though as it uses the dockerfile to initialise? The config is set in the helm chart too but that doesn't help in VSCode?

For the devcontainer you should be editing .devcontainer/devcontainer.json, which configures the base image into a devcontainer. I think you need to add something that looks a bit like this:

"containerEnv": { "OTLP_EXPORT_ENABLED": "false", etc etc },

As noted above, not setting the variable at all turns export off, which is the desired default state. However, the other variables are needed to configure tracing correctly, so that export works if it is enabled. I guess these should therefore be moved into devcontainer.json as you say, but where can I set them for a plain vanilla blueapi run from the command line, with no dev container or helm chart involved?

I think at that point they should be passed in at container startup time

docker run -d --rm -e OTLP_FOO=true -e OTLP_BAR=false blueapi:latest
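For the no-container case, one possible option (not what this PR does) is to fill in any unset tracing variables with the same values the Dockerfile diff uses, while leaving explicitly set ones, e.g. from `docker run -e` or the helm chart, untouched. The helper `apply_tracing_defaults` is hypothetical, and it would presumably need to run before the OpenTelemetry SDK and instrumentation read their configuration. `OTLP_EXPORT_ENABLED` is deliberately left out, since an unset value already means export is off.

```python
from __future__ import annotations

import os
from collections.abc import MutableMapping

# Fallback values copied from the Dockerfile diff in this PR.
TRACING_DEFAULTS = {
    "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://127.0.0.1:4318",
    "OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST": ".*",
    "OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE": ".*",
}


def apply_tracing_defaults(env: MutableMapping[str, str] | None = None) -> dict[str, str]:
    """Hypothetical helper: set each unset tracing variable to its default.

    Values that are already present win. Returns the effective settings.
    """
    target = os.environ if env is None else env
    for name, value in TRACING_DEFAULTS.items():
        target.setdefault(name, value)  # only writes if the name is missing
    return {name: target[name] for name in TRACING_DEFAULTS}
```

Called with no argument it updates `os.environ` in place; passing a plain dict makes it easy to check without touching the real environment.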

We should, in general, update the docs to reflect this PR. I don't mind if you do that here or make a separate issue

helm/blueapi/values.yaml

src/blueapi/worker/task_worker.py

callumforrester · 2024-10-16T12:17:10Z

tests/unit_tests/core/fake_device_module.py

@@ -32,7 +32,7 @@ def _mock_with_name(name: str) -> MagicMock:
    return mock


-def wrong_return_type() -> int:
+def wrong_return_type(*args, **kwargs) -> int:


I presume this is so tracing can inject a parameter for the function to ignore? I'm a little concerned that you have to add *args, **kwargs for non-obvious reasons.

I must admit I can't fully remember why this was needed, as it was some weeks ago. I do remember, though, that the tests would not pass (when they should have) without this change. It was probably to do with the start_as_current_span decorator examining the function signature.

I think adding unused, unnecessary parameters to satisfy some library somewhere is not good practice (or, in more succinct terms: #669 (comment)). We should try to work out what is complaining and see if we can satisfy it another way. Try removing it and seeing why the tests fail; I'm happy to help out if it's a weird error message.
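This thread doesn't show what `start_as_current_span` actually does with the wrapped function's signature, so the following is only a hypothetical sketch: `record_args` and `RECORDED` are invented names, and `RECORDED` stands in for span attributes. It shows that a decorator can read named arguments by binding each call to the wrapped function's own signature with `inspect.signature(...).bind`, so the decorated function does not need `*args, **kwargs`. If some caller is injecting an extra argument, `bind` raises a `TypeError` at that point, which may help locate what is complaining.

```python
import functools
import inspect
from collections.abc import Callable
from typing import Any, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical stand-in for span attributes: one dict per decorated call.
RECORDED: list[dict[str, Any]] = []


def record_args(*arg_names: str) -> Callable[[F], F]:
    """Hypothetical decorator that records the named arguments of each call."""

    def decorator(func: F) -> F:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # The wrapper accepts anything, but bind() rejects arguments
            # that the wrapped function itself cannot take.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            RECORDED.append(
                {n: bound.arguments[n] for n in arg_names if n in bound.arguments}
            )
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator


@record_args()
def return_int() -> int:
    return 12345


@record_args("x")
def double(x: int, scale: int = 2) -> int:
    return x * scale


if __name__ == "__main__":
    print(return_int(), double(4), RECORDED)
```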

src/blueapi/worker/task_worker.py

callumforrester · 2024-10-16T13:07:14Z

src/blueapi/worker/task_worker.py

            ):
                task_started.set()

        LOGGER.info(f"Submitting: {trackable_task}")
        try:
            sub = self.worker_events.subscribe(mark_task_as_started)
+            self._context_register[trackable_task.task_id] = get_trace_context()


Must: Context register does not ever appear to be cleared, so if blueapi is left running for weeks or months (quite likely) it will get quite big and cumbersome. This class is already complicated so I'm not immediately sure if introducing extra bookkeeping is the correct solution. Maybe the context information could be embedded in the documents themselves?

or clear the context for a task when it finishes

Clearing on finish sounds like a reasonable idea.
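A minimal sketch of the "clear on finish" approach, assuming the worker has somewhere to call a hook when a task completes, fails or is aborted. `ContextRegister` and its methods are hypothetical; in the PR the register is a plain dict on the worker, populated with `get_trace_context()`. Removing the entry on a `finally`-style path keeps the register bounded by the number of in-flight tasks instead of growing for the lifetime of the process.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContextRegister:
    """Hypothetical wrapper around a task_id -> trace-context dict."""

    _contexts: dict[str, Any] = field(default_factory=dict)

    def store(self, task_id: str, context: Any) -> None:
        """Remember the submitting request's trace context for this task."""
        self._contexts[task_id] = context

    def get(self, task_id: str) -> Any:
        """Return the stored context, or None if the task is unknown or finished."""
        return self._contexts.get(task_id)

    def on_task_finished(self, task_id: str) -> None:
        """Forget the context once the task has completed, failed or been aborted."""
        self._contexts.pop(task_id, None)

    def __len__(self) -> int:
        return len(self._contexts)


register = ContextRegister()
register.store("task-1", {"trace": "example"})  # placeholder for get_trace_context()
try:
    pass  # run the task here, looking up register.get("task-1") as needed
finally:
    register.on_task_finished("task-1")  # runs even if the task raises
```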

src/blueapi/cli/cli.py

DiamondJoseph · 2024-10-17T11:33:37Z

Dockerfile

+# Change this to point to Jaeger server before merging e.g. https://daq-services-jaeger
+ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318


Do we have said Jaeger server?

No it won't be available till next year

DiamondJoseph · 2024-10-17T11:34:11Z

helm/blueapi/values.yaml

+  - name: OTLP_EXPORT_ENABLED
+    value: {{ .Values.tracing.otlp.export_enabled }}
+  - name: OTEL_EXPORTER_OTLP_TRACES_PROTOCOL
+    value: {{ .Values.tracing.otlp.protocol }}
+  - name: OTEL_EXPORTER_OTLP_ENDPOINT
+    value: "{{ .Values.tracing.otlp.host }}:{{ .Values.tracing.otlp.port }}"
+  - name: OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST
+    value: {{ .Values.tracing.http.request.headers }}
+  - name: OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE
+    value: {{ .Values.tracing.http.response.headers }}


Add defaults in case these aren't set

pyproject.toml

DiamondJoseph · 2024-10-17T11:37:10Z

src/blueapi/client/rest.py

@@ -118,6 +125,7 @@ def delete_environment(self) -> EnvironmentResponse:
            "/environment", EnvironmentResponse, method="DELETE"
        )

+    @start_as_current_span(TRACER, "method", "data", "suffix")


How clear is it in the Jaeger UI when using a whole dict (data) to mark a span?

Can I get a screenshot?

It's pretty clear actually, I will try and add a screenshot.
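For context on the question above: OpenTelemetry span attribute values must be primitives (str, bool, int, float) or sequences of them, so a dict such as `data` has to be converted somehow before it can be attached to a span. How `start_as_current_span` does this isn't shown in this thread. One common approach, sketched below with a hypothetical `flatten_attributes` helper, is to flatten nested keys into dotted attribute names so that each field shows up as a separate attribute.

```python
from collections.abc import Mapping
from typing import Any


def flatten_attributes(prefix: str, value: Any) -> dict[str, Any]:
    """Hypothetical helper: turn a (possibly nested) value into span-safe attributes.

    Mappings become dotted keys; str/bool/int/float are kept as-is; anything
    else (lists, None, objects) is stored as its repr() string.
    """
    if isinstance(value, Mapping):
        flat: dict[str, Any] = {}
        for key, inner in value.items():
            flat.update(flatten_attributes(f"{prefix}.{key}", inner))
        return flat
    if isinstance(value, (str, bool, int, float)):
        return {prefix: value}
    return {prefix: repr(value)}


if __name__ == "__main__":
    print(flatten_attributes("data", {"name": "count", "params": {"num": 3}}))
```

Stringifying lists is a simplification here; homogeneous lists of primitives could also be passed through unchanged, since OpenTelemetry accepts those.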

DiamondJoseph · 2024-10-17T11:38:32Z

src/blueapi/service/main.py

@@ -142,6 +158,7 @@ def get_device_by_name(name: str, runner: WorkerDispatcher = Depends(_runner)):
    response_model=TaskResponse,
    status_code=status.HTTP_201_CREATED,
 )
+@start_as_current_span(TRACER, "request", "task.name", "task.params")


How verbose is the request object in the jaeger UI and does it include e.g. headers?

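Yes, header capture is enabled by the OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST and OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE environment variables.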

src/blueapi/service/main.py

DiamondJoseph · 2024-10-17T11:42:53Z

src/blueapi/service/runner.py

@@ -137,7 +165,7 @@ def _rpc(
    module_name: str,
    function_name: str,
    expected_type: type[T] | None,
-    *args: Any,
+    args: Any,


Did we find it was impossible to pass *args? It looks like

(get_context_propagator(), *args), -> *(get_context_propagator(), *args),

above would allow this to be unchanged.
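A sketch in the spirit of that suggestion, under stated assumptions: the full `_rpc` in `service/runner.py` isn't shown here, so `_rpc` below is a hypothetical stand-in that takes the propagated context as its first parameter, `run_remote` is an invented caller, and a `ThreadPool` replaces the real worker process only to keep the example self-contained. The point is that unpacking the caller's arguments into the tuple handed to `apply`, rather than nesting them as one element, lets `_rpc` keep its variadic `*args` signature.

```python
from __future__ import annotations

import importlib
from multiprocessing.pool import ThreadPool
from typing import Any


def _rpc(
    context: dict[str, str],
    module_name: str,
    function_name: str,
    expected_type: type | None,
    *args: Any,
) -> Any:
    """Hypothetical stand-in: propagated context first, then the original *args."""
    # A real implementation would attach `context` to the current trace here.
    func = getattr(importlib.import_module(module_name), function_name)
    value = func(*args)
    if expected_type is not None and not isinstance(value, expected_type):
        raise TypeError(
            f"{function_name} returned {type(value).__name__}, "
            f"expected {expected_type.__name__}"
        )
    return value


def run_remote(
    pool: ThreadPool,
    module_name: str,
    function_name: str,
    expected_type: type | None,
    *args: Any,
) -> Any:
    context = {"traceparent": "placeholder"}  # stands in for get_context_propagator()
    # Splice the caller's args into the tuple, i.e. (context, ..., *args),
    # instead of passing them as a single nested tuple.
    return pool.apply(_rpc, (context, module_name, function_name, expected_type, *args))


if __name__ == "__main__":
    with ThreadPool(1) as pool:
        print(run_remote(pool, "math", "hypot", float, 3.0, 4.0))
```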

I just copied how it was done in your original example, will discuss when in

DiamondJoseph · 2024-10-17T11:44:06Z

src/blueapi/worker/task_worker.py

            self._ctx.run_engine.abort(reason)
+            add_span_attributes({"Task aborted": reason})
        else:
            self._ctx.run_engine.stop()
+            add_span_attributes({"Task stopped": reason})


Should these span attributes be added prior to calling abort/stop?

I would say no, because then you're reporting something that hasn't happened yet. If they said "stopping" rather than "stopped" then yes, but to me it seems more useful to know that the requested operation has completed rather than that it might complete.

src/blueapi/worker/task_worker.py

DiamondJoseph · 2024-10-17T12:01:56Z

src/blueapi/worker/task_worker.py

            ):
                task_started.set()

        LOGGER.info(f"Submitting: {trackable_task}")
        try:
            sub = self.worker_events.subscribe(mark_task_as_started)
+            self._context_register[trackable_task.task_id] = get_trace_context()


or clear the context for a task when it finishes

DiamondJoseph · 2024-10-17T12:04:01Z

tests/unit_tests/service/test_runner.py

@@ -186,42 +186,42 @@ class GenericModel(BaseModel, Generic[T]):
    b: str


-def return_int() -> int:
+def return_int(*args, **kwargs) -> int:


It's not my stupid programming language :)

keithralphs added 3 commits September 24, 2024 13:44

basic impl

e42c5c7

change sig of _rpc and add dependencies

3c7f0a6

Add tracing adjusting startup to prevent multiple FastAPI apps

b9cb48c

keithralphs requested review from callumforrester and DiamondJoseph October 15, 2024 09:04

callumforrester requested changes Oct 16, 2024

DiamondJoseph reviewed Oct 17, 2024

DiamondJoseph requested changes Oct 17, 2024
