Flaky test: port is already allocated #1917
-
Hi,
In telegraf, we make heavy use of test-containers to do integration testing of telegraf + various services. In the last couple of months we have found we have a very flaky test that results in the same error message:
Error: Received unexpected error:
container failed to start: Error response from daemon: driver failed programming external connectivity on endpoint peaceful_germain (8fe4467b369aab776e5f70add82cb20981fd7bea81a6cafc748f09406a9d1194): Bind for 0.0.0.0:9092 failed: port is already allocated: failed to start container
Test: TestConnectAndWriteIntegration
Messages: failed to start container
Here is the specific test that is flaky, and here is how we start up containers. It only seems to be reproducible in CircleCI, which makes me think it is load related. However, this test case uses its own network, so I am confused as to why it would say the port is already allocated. Is there something obvious I am missing? Or a way to further debug this situation? Thanks!
Replies: 3 comments 3 replies
-
Hi @powersj, thanks for reaching out to us! The first thing I see is that the container request is using fixed ports, which means that even though the container runs in its own network, port 9092 is published to the host. Please see https://github.com/influxdata/telegraf/blob/d0aaabb4cafca1459dfc0f0dd37095fad56352ea/plugins/outputs/kafka/kafka_test.go#L50
- ExposedPorts: []string{"9092:9092", "9093:9093"},
+ ExposedPorts: []string{"9092", "9093"},
So I find it plausible that at some point the port is already allocated on the CI worker. I wonder if you are using any of those fixed ports to connect from the host. If so, you can leverage the APIs to get the MappedPort or the Endpoint of the container.
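To illustrate why a fixed host port can collide while an OS-assigned one does not, here is a minimal sketch that uses only the Go standard library. It is not Testcontainers code: plain TCP listeners stand in for published container ports, and the function names `secondBindFails` and `randomPorts` are hypothetical, chosen for this example.

```go
package main

import (
	"fmt"
	"net"
)

// secondBindFails binds a port, then tries to bind the very same address
// again while the first listener is still open. This mimics two tests that
// both publish a fixed host port such as 9092 at the same time.
func secondBindFails() (bool, error) {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer first.Close()

	// Reuse the exact address of the first listener: the "fixed port" case.
	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return false, nil
	}
	return true, nil
}

// randomPorts asks the OS for two free ports (port 0) and returns them.
// This mimics letting the runtime pick the host port for each container.
func randomPorts() (int, int, error) {
	a, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer a.Close()

	b, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer b.Close()

	return a.Addr().(*net.TCPAddr).Port, b.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	conflict, err := secondBindFails()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	fmt.Println("second bind on the same fixed port failed:", conflict)

	p1, p2, err := randomPorts()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	fmt.Println("OS-assigned ports differ:", p1 != p2)
}
```

The same reasoning applies to published container ports: two containers that both request host port 9092 compete for one slot, while containers that leave the host port unspecified each get their own.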
-
Hi @mdelapenya, Thank you for the response!
I now remember why we did this. The kafka client will reach out to the kafka broker to get metadata about the cluster. Our client uses the metadata to determine what the broker leader address and port are. While the first step of getting metadata works fine using the random mapped port, the metadata will turn around and say to use port 9092 for the leader. Kafka does not know about the mapped ports, so our client would fail to connect if we did not use the fixed port. I think what is happening is that we have a second kafka test, for our kafka input, that also maps port 9092 to the host for this same reason. The result is that we can sometimes have a conflict, where one test might still be running when the other starts. Is there a better way to set up networking such that our test and container are all in the same network and I could specify specific ports, without mapping? Thank you again!
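One general way around the metadata problem described above is to have the broker advertise the address that the host can actually reach, that is, the mapped port rather than 9092. The sketch below only builds that value. It assumes a Confluent-style image that reads the `KAFKA_ADVERTISED_LISTENERS` environment variable (the broker setting `advertised.listeners`); the helper name `advertisedListener` and the port 49153 are hypothetical, and nothing here is taken from the telegraf tests.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// advertisedListener formats the address a broker should hand out in its
// metadata responses, e.g. "PLAINTEXT://localhost:49153".
// Hypothetical helper: host and mappedPort are whatever the test learned
// about the published port after the container started.
func advertisedListener(protocol, host string, mappedPort int) string {
	return protocol + "://" + net.JoinHostPort(host, strconv.Itoa(mappedPort))
}

func main() {
	// Assumption: 49153 is the random host port mapped to the broker's 9092.
	env := map[string]string{
		"KAFKA_ADVERTISED_LISTENERS": advertisedListener("PLAINTEXT", "localhost", 49153),
	}
	fmt.Println(env["KAFKA_ADVERTISED_LISTENERS"]) // prints PLAINTEXT://localhost:49153
}
```

The catch is that a random mapped port is only known once the container has started, so a value like this has to be applied after startup rather than in the initial container request.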
-
@powersj I'm resolving this discussion, as we found a way to identify the flakiness, which came from the fixed ports and the kafka+zookeeper setup. Thanks for your contributions to Testcontainers Go!
I'm back-channeling the kafka issue to our kafka gurus (not me 😢 ). In the meantime, have you taken a look at the Kafka Kraft module? https://golang.testcontainers.org/modules/kafka/