[Flaky Test] Integration/e2e tests installing the agent randomly fail #3154

Closed
AndersonQ opened this issue Jul 31, 2023 · 3 comments · Fixed by #4321
Assignees
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team

Comments

@AndersonQ
Member

AndersonQ commented Jul 31, 2023

Flaky Test

Notes

  • Tests that install the agent sometimes get stuck: the agent never finishes fetching its first config from fleet-server.
  • It also happens in tests that use the mock fleet-server, which suggests the issue is in the agent, in its installation process, or somehow on the host.
  • Something either prevents the agent from finishing its enroll process or makes the agent start in standalone mode during the install/enroll process.
  • It does not seem to be related to using testify/suite with SetUp/TearDown: the ProxyURL tests do not use it, while other failing tests do.

I once hit this issue while running the tests on a Vagrant box and managed to gather the following:

The agent reported that it was not enrolled in Fleet; however, it did call /enroll and the request succeeded.

In the test logs I can see the call to Fleet through the proxy:

    fixture.go:344: >> running agent with: [/tmp/TestProxyURLTestEnrollProxyAndNoProxyInThePolicy3423221791/001/elastic-agent-8.10.0-SNAPSHOT-linux-arm64/elastic-agent install --force --insecure --non-interactive --proxy-url=http://localhost:36529 --url http://fleet.elastic.co --enrollment-token anythingWillDO]
    proxytest.go:73: [proxy-proxy-1] [935eef36-5695-4922-8031-4860b131bca9] STARTING - POST http://fleet.elastic.co/api/fleet/agents/enroll? HTTP/1.1 127.0.0.1:36150
    proxytest.go:73: [proxy-proxy-1] original URL: http://fleet.elastic.co/api/fleet/agents/enroll?, new URL: http://localhost:42557/api/fleet/agents/enroll?
    fleetserver.go:93: [fleet-server] [d990a9f5-2893-4842-bda8-9571a3f3c3b3] STARTING - POST /api/fleet/agents/enroll? HTTP/1.1 127.0.0.1:54496
    fleetserver.go:93: [fleet-server] [d990a9f5-2893-4842-bda8-9571a3f3c3b3] DONE 200 - POST /api/fleet/agents/enroll? HTTP/1.1 127.0.0.1:54496
    proxytest.go:73: [proxy-proxy-1] [935eef36-5695-4922-8031-4860b131bca9] DONE 200 - POST http://localhost:42557/api/fleet/agents/enroll? HTTP/1.1 127.0.0.1:36150

The agent's state.yaml:

components: []
fleet_message: Not enrolled into Fleet
fleet_state: 6
log_level: info
message: Running
state: 2 

computed-config.yaml:
fleet:
  enabled: true

local-config.yaml:
fleet:
  access_api_key: <REDACTED>
  agent:
    id: ""
  enabled: false
  host: localhost:5601
  protocol: http
  timeout: 10m0s

There is just one log file, from the agent startup rather than from the install process, and it says the agent is managed locally:

{"log.level":"info","@timestamp":"2023-07-27T05:19:29.925Z","log.origin":{"file.name":"application/application.go","file.line":127},"message":"Parsed configuration and determined agent is managed locally","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

In this specific case, the agent started without being enrolled in Fleet, which should not happen at all. The same might be happening in every instance where the test fails.
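To check other failing runs for the same symptom, the agent logs can be scanned for the "managed locally" message quoted above. The following is a minimal sketch, not code from the agent repository: `startedStandalone` is a hypothetical helper, and it assumes the logs are newline-delimited JSON with a `message` field, as in the quoted line.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry keeps only the field of an agent JSON log line that this check needs.
type logEntry struct {
	Message string `json:"message"`
}

// standaloneMarker is taken from the log line quoted in this issue.
const standaloneMarker = "determined agent is managed locally"

// startedStandalone (hypothetical helper) reports whether any JSON log line
// carries the "managed locally" message. Lines that are not valid JSON are skipped.
func startedStandalone(logs string) bool {
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue // not a JSON log line, ignore it
		}
		if strings.Contains(e.Message, standaloneMarker) {
			return true
		}
	}
	return false
}

func main() {
	// The sample is the log line captured from the failing run.
	sample := `{"log.level":"info","@timestamp":"2023-07-27T05:19:29.925Z","log.origin":{"file.name":"application/application.go","file.line":127},"message":"Parsed configuration and determined agent is managed locally","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}`
	fmt.Println(startedStandalone(sample))
}
```

A positive match after a successful /enroll call would point at the same situation as the Vagrant run: enrollment succeeded but the agent came up in standalone mode.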

Stack Trace

    fixture.go:344: >> running agent with: [/tmp/TestEndpointSecurityNonDefaultBasePath3113814637/001/elastic-agent-8.10.0-SNAPSHOT-linux-x86_64/elastic-agent install --base-path /opt/not_default --force --non-interactive --url https://cacc5ec862434e62b0d7da00ce8bf43a.fleet.us-central1.gcp.qa.cld.elstc.co:443 --enrollment-token YWp4V25Za0JqSUxRTmU3REdObjI6eU1UZTd6dXVSSGVqRTZ1ZVZ3OEEtUQ==]
    endpoint_security_test.go:304: >>> Ran Enroll. Output: Installing in non-interactive mode.{"log.level":"info","@timestamp":"2023-07-28T16:30:47.578Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":478},"message":"Starting enrollment to URL: https://cacc5ec862434e62b0d7da00ce8bf43a.fleet.us-central1.gcp.qa.cld.elstc.co:443/","ecs.version":"1.6.0"}
        {"log.level":"info","@timestamp":"2023-07-28T16:30:48.941Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":274},"message":"Elastic Agent might not be running; unable to trigger restart","ecs.version":"1.6.0"}
        Successfully enrolled the Elastic Agent.
        Elastic Agent has been successfully installed.
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    tools.go:33: Agent status: updating
    endpoint_security_test.go:304: 
        	Error Trace:	/home/ubuntu/agent/pkg/testing/tools/tools.go:94
        	            				/home/ubuntu/agent/testing/integration/endpoint_security_test.go:304
        	Error:      	Condition never satisfied
        	Test:       	TestEndpointSecurityNonDefaultBasePath
        	Messages:   	Elastic Agent status is not online
    fixture.go:344: >> running agent with: [/opt/not_default/Elastic/Agent/elastic-agent uninstall --force]
    fixture.go:200: Extracting artifact elastic-agent-8.10.0-SNAPSHOT-linux-x86_64.tar.gz to /tmp/TestProxyURL_NoEnrollProxyAndProxyInThePolicy1306062574/001
    fixture.go:213: Completed extraction of artifact elastic-agent-8.10.0-SNAPSHOT-linux-x86_64.tar.gz to /tmp/TestProxyURL_NoEnrollProxyAndProxyInThePolicy1306062574/001
    fixture.go:511: Components were not modified from the fetched artifact
    proxy_url_test.go:262: fleet: http://localhost:37521, proxy1: http://localhost:36023, proxy2: http://localhost:38873
    fixture.go:344: >> running agent with: [/tmp/TestProxyURL_NoEnrollProxyAndProxyInThePolicy1306062574/001/elastic-agent-8.10.0-SNAPSHOT-linux-x86_64/elastic-agent install --force --insecure --non-interactive --url http://localhost:37521 --enrollment-token anythingWillDO]
    fleetserver.go:123: [fleet-server] [f2def644-6ad8-4ffb-a4b0-cf384fb1cc6a] STARTING - POST /api/fleet/agents/enroll? HTTP/1.1 127.0.0.1:56468
    fleetserver.go:123: [fleet-server] [f2def644-6ad8-4ffb-a4b0-cf384fb1cc6a] DONE 200 - POST /api/fleet/agents/enroll? HTTP/1.1 127.0.0.1:56468
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent status --output json]
    proxy_url_test.go:281: 
        	Error Trace:	/home/ubuntu/agent/testing/integration/proxy_url_test.go:395
        	            				/home/ubuntu/agent/testing/integration/proxy_url_test.go:281
        	Error:      	Condition never satisfied
        	Test:       	TestProxyURL_NoEnrollProxyAndProxyInThePolicy
        	Messages:   	want fleet state HEALTHY, got STARTING. agent status: {{    false} 0  [] 0 }
    proxy_url_test.go:281: [assertConnectedFleet] last error from agent status command: could not unmarshal agent status output: exit status 1
        invalid character 'E' looking for beginning of value
    proxy_url_test.go:284: 
        	Error Trace:	/home/ubuntu/agent/testing/integration/proxy_url_test.go:284
        	Error:      	Condition never satisfied
        	Test:       	TestProxyURL_NoEnrollProxyAndProxyInThePolicy
    proxy_url_test.go:295: did not find requests to the proxy defined in the policy
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent uninstall --force]
    fixture.go:344: >> running agent with: [/opt/Elastic/Agent/elastic-agent uninstall --force]
--- FAIL: TestProxyURL_NoEnrollProxyAndProxyInThePolicy (634.56s)
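A note on the `invalid character 'E' looking for beginning of value` message in the log above: this is what Go's `encoding/json` reports when the input starts with the letter `E` instead of a JSON value, which is consistent with the status command exiting with status 1 and printing plain text rather than JSON. The sketch below reproduces the error and keeps the raw output in the wrapped error so the actual text is visible in the test log. `parseStatus` and `isSyntaxError` are hypothetical helpers, and the sample output is a placeholder, since the real text printed by the command is not in the logs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// parseStatus (hypothetical helper) decodes the stdout of
// `elastic-agent status --output json` into a generic map. On failure it
// keeps the raw output in the error so the non-JSON text is not lost.
func parseStatus(out []byte) (map[string]any, error) {
	var st map[string]any
	if err := json.Unmarshal(out, &st); err != nil {
		return nil, fmt.Errorf("could not unmarshal agent status output: %w (raw output: %q)", err, out)
	}
	return st, nil
}

// isSyntaxError reports whether err wraps a JSON syntax error, meaning the
// command printed something that is not JSON at all.
func isSyntaxError(err error) bool {
	var se *json.SyntaxError
	return errors.As(err, &se)
}

func main() {
	// Placeholder output: an assumption, the real command output is unknown.
	_, err := parseStatus([]byte("Error: placeholder text"))
	fmt.Println(isSyntaxError(err))
	fmt.Println(err)
}
```

Distinguishing a syntax error from a valid but unhealthy status would let the polling loop report "the agent is not answering with JSON" separately from "the agent is still STARTING".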
AndersonQ added the flaky-test and Team:Elastic-Agent labels Jul 31, 2023
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz
Member

cmacknz commented Aug 1, 2023

Going to skip these tests while we investigate the root cause: #3164

cmacknz added a commit that referenced this issue Aug 1, 2023
All of the proxy tests are currently flaky and fail regularly, see
#3154

Disabling them while we investigate a fix to unblock CI.
mergify bot pushed a commit that referenced this issue Aug 8, 2023
All of the proxy tests are currently flaky and fail regularly, see
#3154

Disabling them while we investigate a fix to unblock CI.

(cherry picked from commit f419a37)

# Conflicts:
#	testing/integration/proxy_url_test.go
AndersonQ pushed a commit that referenced this issue Aug 16, 2023
All of the proxy tests are currently flaky and fail regularly, see
#3154

Disabling them while we investigate a fix to unblock CI.
AndersonQ added a commit that referenced this issue Aug 16, 2023
…roxy-url integration test and disable tests (#3147)

* enhance mock fleet-server and add --proxy-url integration test  (#2834)

* enhance test fleet-server
Now an almost fully functional mock fleet-server can be instantiated with a single call to fleetservertest.NewServerWithHandlers. The only missing handlers are the upload handlers.
testing/integration/proxy_url_test.go works as an example of how to use the new features of the test fleet-server

* add proxytest.Proxy:
A naive proxy to be used in tests, which allows configuring URL rewrites and checking all the calls made to it.
Check the tests in testing/integration/proxy_url_test.go to see how to use proxytest.Proxy.

* add integration tests for the fleet-server proxy
The tests cover defining a proxy through --proxy-url and in the policy, and check that the correct priority is respected

* fix the Go version used on the Github workflows
now it reads the version from the .go-version and not from the go.mod anymore.

* increase the memory allocated to the elastic-agent vagrant box

* allow -SNAPSHOT versions to be passed to define.NewFixture

* add more helper methods to testing.Fixture and Install accepts more flags

* add version.Agent, an exported constant with the agent version

(cherry picked from commit 732d7c0)

* Disable proxy tests while flakiness is addressed. (#3164)

All of the proxy tests are currently flaky and fail regularly, see
#3154

Disabling them while we investigate a fix to unblock CI.

(cherry picked from commit 08d24b9)
---------

Co-authored-by: Anderson Queiroz
Co-authored-by: Craig MacKenzie
@AndersonQ
Member Author

closed by #3240
