High CPU usage for cdrs_tokio::transport::AsyncTransport::start_processing #161
Comments
I will redeploy my app to run with debug symbols and at the debug log level, and I will update the logs here once I have them.
You don't have to run a debug build - just debug logs will be fine. The problem might be in any part of the application, maybe even not in cdrs, so it's important to get the logs from the exact moment it happens.
I will run a release build itself, but with debug symbols.
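For reference, the usual way to keep release optimizations while still emitting debug symbols is a Cargo profile setting. This is a general sketch, not the poster's actual configuration:

```toml
# Cargo.toml: keep release optimizations but also emit debug symbols,
# so profilers and flamegraph tools can resolve function names.
[profile.release]
debug = true
```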
I ran again and captured the flamegraph, with the same results as above. I finally decided to use another library named
No, it can't, since it's simply a read/write loop which waits for data. That's why you're seeing it on the flamegraph - it runs for the entire duration of the session, but doesn't actually do anything until data comes in or out. That's also why you can see
I ran with
Are you using the latest version? Do you have a tracing subscriber which outputs the logs, e.g. https://docs.rs/tracing-subscriber/latest/tracing_subscriber/fmt/index.html?
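For context, a minimal sketch of the kind of subscriber setup being asked about, assuming the tracing-subscriber crate with its env-filter feature enabled; the filter string is only an example:

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Install a global fmt subscriber so debug-level logs from cdrs_tokio
    // (and anything else matching the filter) are written to stdout.
    // Requires tracing-subscriber built with the "env-filter" feature.
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::new("info,cdrs_tokio=debug"))
        .init();

    // ... the rest of the application, including the cdrs-tokio session setup ...
}
```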
Yes, (I usually perform
Yes, I use the above tracing_subscriber itself. I am able to get logs from hyper, h2 and other crates. I have been running the service for a long time and this has been happening all the time. I am able to get some logs from
For now, I have moved to
"Connection reset by peer" means the cluster shut the connection down. Can you verify two things:
Just had a thought - if you are using 8.0 and the nodes drop connections, the I/O spikes might be related to connection pools being re-established. Lowering the pool size might be the solution.
Now that I check, I was using
It might help out of the box, but also remember you can fine-tune many settings. If you still get those spikes, try lowering the heartbeat interval and/or pool size.
I'm facing a similar issue where CPU usage is ~50%. I tried both lowering the connection pool size (local: 1, remote: 0) and upgrading to 8.1.0; neither has helped so far.
Do you have a flamegraph and/or debug logs? |
@krojew unfortunately I'm still working on the flamegraph report. However, I can assure you that once I take out cdrs-tokio, I don't see the CPU usage rise after a while. Below is what happens when cdrs-tokio is present and the application is idle:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@krojew I can confirm we are still facing this issue; requesting that you investigate or acknowledge it's a WIP.
Can you provide logs and a flame graph, if possible?
I have set up a Cassandra cluster of 6 pods using the Helm charts and I am connecting to that cluster with one known node. The application works fine, but occasionally the CPU usage of my application goes to 100% and the application becomes unresponsive. I sampled the CPU usage for 1 second during one of these peaks and found that 99% of the CPU is being used in cdrs_tokio::transport::AsyncTransport::start_processing; I have attached the complete flamegraph SVG for your reference here. I also observe that after a few hours of full CPU usage, the usage does come down to 50% (but is still high). The flamegraph was captured without debug symbols since it only happens occasionally. Below is the code that demonstrates how I make the connection; here CASSANDRA_HOST is the name of the Kubernetes service for the application.