Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tracing and rework cli so that the controller command does not start a FastAPI app #669

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

keithralphs
Copy link
Contributor

  • Add tracing intialisation and decorators
  • In cli.py move some imports within functions to prevent unnecessary FastAPI apps being created
  • Add capability to specify a config file in debug launchers
  • Add Environment vars to initialise traceability options and set Jager export to off by default

Copy link

codecov bot commented Oct 14, 2024

Codecov Report

Attention: Patch coverage is 93.86503% with 10 lines in your changes missing coverage. Please review.

Project coverage is 92.53%. Comparing base (8b35390) to head (b9cb48c).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/blueapi/cli/cli.py 45.45% 6 Missing ⚠️
src/blueapi/service/main.py 93.10% 2 Missing ⚠️
src/blueapi/worker/task_worker.py 96.72% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #669      +/-   ##
==========================================
- Coverage   92.62%   92.53%   -0.09%     
==========================================
  Files          35       35              
  Lines        1654     1795     +141     
==========================================
+ Hits         1532     1661     +129     
- Misses        122      134      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

.devcontainer/devcontainer.json Show resolved Hide resolved
"serve"
]
"env": {
"OTLP_EXPORT_ENABLED": "false"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does exporting default to true? Would I also have to export OTLP_EXPORT_ENABLED=false if I wanted to run blueapi from the command line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, (assuming I understand the question), the default is false in the helm chart, dockerfile and here so you can just run BlueAPI and it wilkl not try to connect to Jaeger

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if the env var is not set (like if running blueapi outside of a dev container), then it's treated like True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it?

Copy link
Contributor Author

@keithralphs keithralphs Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, the code that interprest the Env Var (in core/init.py) is

OTLP_EXPORT_ENABLED = environ.get("OTLP_EXPORT_ENABLED") == "true"

and so will only enable export to Jaeger if the Env Var both exists and has been set to the string 'true'. This should however mean that there is no need to set this Env Var to 'false' as anything but 'true', including non-existance will result in the export being off, which is the desired default state. So I will remove the lines that explicitly do this.

"env": {
"OTLP_EXPORT_ENABLED": "false"
},
"args": "-c ${input:config_path} controller ${input:args}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should: Reconcile with #668 by @Relm-Arrowny

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @callumforrester meant #663

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add Ray's arg name change into mine so that they are consistent.

Copy link
Contributor Author

@keithralphs keithralphs Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@callumforrester do you think we should have an extra launcher for when the config file is specified or just make it the default that one is requested - you would have to press return to specify non but I guess that's fine?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite happy with the extra launcher, I don't need to specify a config file 80% of the time, so having to press enter for no reason 80% of the time is actually mildly annoying. I realise it's only mildly but the "just one little extra step isn't too bad" mentality is a slippery slope that eventually leads to GDA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated so there is a custom config option or both the serve and controller commands

Comment on lines +15 to +22
ENV OTLP_EXPORT_ENABLED=false
# enable opentelemetry support
ENV OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
# Change this to point to Jaeger server before merging e.g. https://daq-services-jaeger
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
# Ensure that all Http headers are captured
ENV OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST=".*"
ENV OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE=".*"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should: As discussed, we'd prefer not to set config in the Dockerfile and instead use the helm chart

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that work for the dev container though as it uses the dockerfile to initialise? The config is set in the helm chart too but that doesn't help in VSCode?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the devcontainer you should be editing .devcontainer/devcontainer.json, which configures the base image into a devcontainer, I think you need to add something that looks a bit like this:

    "containerEnv": {
        "OTLP_EXPORT_ENABLED": false,
        etc
        etc
    },

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above, not setting the variable at all will result in Export being off, which is the desired default state, however the other vars are needed to configure tracing correctly, so that if export is enabled it can function. I guess therefore these should be moved into the devcontainer.json as you say, but where can I set them for a plain valnilla blueapi being run from the command line with no Dev Container or hekm chart involved?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at that point they should be passed in at container startup time

docker run -d --rm -e OLTP_FOO=true -e OLTP_BAR=false blueapi:latest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should, in general, update the docs to reflect this PR. I don't mind if you do that here or make a separate issue

helm/blueapi/values.yaml Show resolved Hide resolved
src/blueapi/worker/task_worker.py Show resolved Hide resolved
@@ -32,7 +32,7 @@ def _mock_with_name(name: str) -> MagicMock:
return mock


def wrong_return_type() -> int:
def wrong_return_type(*args, **kwargs) -> int:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this is so tracing can inject a parameter for the function to ignore? I'm a little concerned that you have to do this, add *args, **kwargs for non-obvious reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I can't fully remember why this was needed as it was some weeks ago. I do remember though that the tests would not succeed (when they should do) without this change. It was probably to do with the examination of the funciton signature by the start_as_current_span decorator

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding unused, unnecessary parameters to satisfy some library somewhere is not good practice (or in more succinct terms: #669 (comment)), we should try to work out what is complaining and see if we can satisfy it another way. Try removing it and seeing why the tests fail, I'm happy to help out if it's a weird error message.

src/blueapi/worker/task_worker.py Show resolved Hide resolved
):
task_started.set()

LOGGER.info(f"Submitting: {trackable_task}")
try:
sub = self.worker_events.subscribe(mark_task_as_started)
self._context_register[trackable_task.task_id] = get_trace_context()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must: Context register does not ever appear to be cleared, so if blueapi is left running for weeks or months (quite likely) it will get quite big and cumbersome. This class is already complicated so I'm not immediately sure if introducing extra bookkeeping is the correct solution. Maybe the context information could be embedded in the documents themselves?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or clear the context for a task when it finishes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearing on finish sounds like a reasonabe idea

src/blueapi/cli/cli.py Show resolved Hide resolved
Comment on lines +18 to +19
# Change this to point to Jaeger server before merging e.g. https://daq-services-jaeger
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have said Jaeger server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it won't be available till next year

Comment on lines +80 to +89
- name: OTLP_EXPORT_ENABLED
value: {{ .Values.tracing.otlp.export_enabled] }}
- name: OTEL_EXPORTER_OTLP_TRACES_PROTOCOL
value: {{ .Values.tracing.otlp.protocol }}
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "{{ .Values.tracing.otlp.host }}:{{ .Values.tracing.otlp.port }}"
- name: OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST
value: {{ .Values.tracing.http.request.headers }}
- name: OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE
value: {{ .Values.tracing.http.response.headers }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add defaults in case these aren't set

pyproject.toml Show resolved Hide resolved
@@ -118,6 +125,7 @@ def delete_environment(self) -> EnvironmentResponse:
"/environment", EnvironmentResponse, method="DELETE"
)

@start_as_current_span(TRACER, "method", "data", "suffix")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How clear is it in the Jaeger UI when using a whole dict (data) to mark a span?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I get a screenshot?

Copy link
Contributor Author

@keithralphs keithralphs Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty clear actually, I will try and add a screenshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1
Screenshot from 2024-10-21 14-08-07

@@ -142,6 +158,7 @@ def get_device_by_name(name: str, runner: WorkerDispatcher = Depends(_runner)):
response_model=TaskResponse,
status_code=status.HTTP_201_CREATED,
)
@start_as_current_span(TRACER, "request", "task.name", "task.params")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How verbose is the request object in the jaeger UI and does it include e.g. headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is enabled by one of the environment variables

src/blueapi/service/main.py Show resolved Hide resolved
@@ -137,7 +165,7 @@ def _rpc(
module_name: str,
function_name: str,
expected_type: type[T] | None,
*args: Any,
args: Any,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we find it was impossible to pass *args? It looks like

(get_context_propagator(), *args), -> *(get_context_propagator(), *args),

above would allow this to be unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied how it was done in your original example, will discuss when in

Comment on lines 149 to +153
self._ctx.run_engine.abort(reason)
add_span_attributes({"Task aborted": reason})
else:
self._ctx.run_engine.stop()
add_span_attributes({"Task stopped": reason})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these span attributes be added prior to calling abort/stop?

Copy link
Contributor Author

@keithralphs keithralphs Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say no because then you're reporting something that hasn't yet happened. If they sai stopping rather than stopped then yes, but it seems more useful to know that the requested operation completed that might complete, to me anyway.

src/blueapi/worker/task_worker.py Show resolved Hide resolved
):
task_started.set()

LOGGER.info(f"Submitting: {trackable_task}")
try:
sub = self.worker_events.subscribe(mark_task_as_started)
self._context_register[trackable_task.task_id] = get_trace_context()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or clear the context for a task when it finishes

@@ -186,42 +186,42 @@ class GenericModel(BaseModel, Generic[T]):
b: str


def return_int() -> int:
def return_int(*args, **kwargs) -> int:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yuck

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not my stupid programming language :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants