
Test flake: //test/extensions/filters/http/ext_proc:ext_proc_integration_test #36041

ravenblackx opened this issue Sep 9, 2024 · 7 comments

ravenblackx commented Sep 9, 2024

https://dev.azure.com/cncf/envoy/_build/results?buildId=179508&view=logs&j=4930ecaf-18f4-5b3c-dea3-309729c3b3ae&t=573d8780-d7b9-52e3-b4e0-a89886b0b9ff&l=3840

[ RUN      ] IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndCloseStreamWithTracing/IPv4_GoogleGrpc_WithDeferredProcessing
test/extensions/filters/http/ext_proc/tracer_test_filter.cc:52: Failure
Expected equality of these values:
  want
    Which is: "0"
  got
    Which is: ""
grpc.status_code: 0 not found in tags:
component: proxy
status: canceled
upstream_address: 127.0.0.1:37667
upstream_cluster: ext_proc_server_0
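
For context, the assertion at tracer_test_filter.cc:52 is checking that the finished span carries a grpc.status_code tag with value "0". A minimal sketch of that style of check, assuming a hypothetical tags map collected by the test filter (not the actual code in that file):

// Minimal sketch only; `tags` is assumed to be a map<string, string> of the
// span's recorded tags, which is not necessarily how tracer_test_filter.cc stores them.
const auto it = tags.find("grpc.status_code");
ASSERT_NE(it, tags.end()) << "grpc.status_code not found in tags";
EXPECT_EQ("0", it->second);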

tyxia commented Sep 9, 2024

@cainelli Do you mind taking a look at this? Thank you!

I think the tracing-related feature was added by your change.

cainelli commented Sep 9, 2024

Oh, sorry about that. I will look into it this week.

cainelli commented:

@tyxia the failure is a timeout, and the tracing failure is a side effect. I don't see how such a simple request would take more than 5s to run. (A simplified sketch of the failing flow follows the stack trace below.)

test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc:281: Failure
Value of: fake_upstreams_[0]->waitForHttpConnection(*dispatcher_, fake_upstream_connection_)
  Actual: false (Timed out waiting for new connection.)
Expected: true

....

test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc:277: Failure
Expected equality of these values:
  std::to_string(status_code)
    Which is: "200"
  response.headers().getStatusValue()
    Which is: "504"
Stack trace:
  0x1578018: (unknown)
  0x13cb8bc: (unknown)
  0x7fc368411a4d: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x7fc3683f822e: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x7fc3683dfb1d: testing::Test::Run()
  0x7fc3683e060e: testing::TestInfo::Run()
... Google Test internal frames ...
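
For reference, a simplified sketch of the sequence these two assertions come from; everything except the expressions quoted in the log above is an assumption, not the actual test body:

// Simplified sketch of the failing flow, not the exact code in
// ext_proc_integration_test.cc. The downstream-request step is elided.
// Log line 281: this wait is what times out in the flake.
ASSERT_TRUE(fake_upstreams_[0]->waitForHttpConnection(*dispatcher_, fake_upstream_connection_));
// ... in the passing case the fake upstream then serves a 200 response ...
// Log line 277: with no upstream connection, Envoy answers 504 locally, so the
// expected "200" never matches.
EXPECT_EQ(std::to_string(status_code), response.headers().getStatusValue());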

cainelli commented Sep 16, 2024

Actually, is the timeout 5ms? Should we increase it a bit?

result = fake_upstreams_[upstream_indices[upstream_index]]->waitForHttpConnection(
    *dispatcher_, fake_upstream_connection, std::chrono::milliseconds(5));

Update: never mind, the timeout here is 5s.

void handleUpstreamRequest(bool add_content_length = false) {
  ASSERT_TRUE(fake_upstreams_[0]->waitForHttpConnection(*dispatcher_, fake_upstream_connection_));
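
If a longer wait were worth experimenting with locally, the overload quoted above already accepts an explicit duration, so a sketch would be (the 15000ms value is arbitrary, and the follow-up comment below argues 5s should already be sufficient):

// Hypothetical local experiment only: pass an explicit timeout instead of the
// default, using the same overload as the snippet above.
ASSERT_TRUE(fake_upstreams_[0]->waitForHttpConnection(*dispatcher_, fake_upstream_connection_,
                                                      std::chrono::milliseconds(15000)));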

alyssawilk commented:

Sorry about the dup, but can you please have a look at the test? Flaky tests are problematic for everyone else trying to submit to Envoy!

tyxia commented Oct 9, 2024

@cainelli Thanks for putting in the effort to reduce the flakiness.

In the past, we have had tests (as you linked in Slack) with a larger-than-5s timeout, but that was because of the guidance "Please don't waitForHttpConnection with a 5s timeout if failure is expected." In your case, failure is not expected.

I am not sure whether the flakiness is because tracing takes a bit more time (though 5s should be sufficient here), or because the ext_proc integration test has grown very large. Do you happen to know what the flakiness rate is?
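
One way to estimate the flakiness rate locally is to rerun the target many times with Bazel's --runs_per_test (the run count here is arbitrary):

bazel test --runs_per_test=200 --runs_per_test_detects_flakes //test/extensions/filters/http/ext_proc:ext_proc_integration_test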

alyssawilk commented:

If you can't repro the flake (per Slack), one thing you can do is add a LogLevelSetter in that test so that CI logs more information when it flakes. Then the next time we see a failure you'll have more to go on.
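
A minimal sketch of that suggestion, assuming the LogLevelSetter RAII helper from test/test_common/logging.h (the exact header path and constructor should be double-checked against the tree):

#include "test/test_common/logging.h"

// Hypothetical placement at the top of the flaky test body: raise all loggers to
// trace for the duration of the test so a CI failure captures more detail.
// The previous levels are restored when save_levels goes out of scope.
LogLevelSetter save_levels(spdlog::level::trace);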
