TestLogIngestionFleetManaged/Monitoring_logs_are_shipped is flaky #3741
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
I believe this new flakiness was introduced in #3465.
Reverting the change while we investigate the failure: #3743
Another failure: https://buildkite.com/elastic/elastic-agent/builds/7991#018e803c-05c8-4470-9091-0013cd21917f This time the failure is in the new code introduced by the fix #3765.
Looking at the logs it looks like the
{
"log.level": "info",
"@timestamp": "2024-03-27T14:50:08.758Z",
"message": "Non-zero metrics in the last 30s",
"component": {
"binary": "filebeat",
"dataset": "elastic_agent.filebeat",
"id": "filestream-monitoring",
"type": "filestream"
},
"log": {
"source": "filestream-monitoring"
},
"log.logger": "monitoring",
"log.origin": {
"file.line": 187,
"file.name": "log/log.go",
"function": "github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"
},
"service.name": "filebeat",
"monitoring": {
"ecs.version": "1.6.0",
"metrics": {
"beat": {
"cpu": {
"system": {
"ticks": 250,
"time": {
"ms": 47
}
},
"total": {
"ticks": 609,
"time": {
"ms": 125
},
"value": 609
},
"user": {
"ticks": 359,
"time": {
"ms": 78
}
}
},
"info": {
"ephemeral_id": "d50a0fbb-c0b8-4697-8a6f-bd24a1146e0d",
"uptime": {
"ms": 210130
},
"version": "8.14.0"
},
"memstats": {
"gc_next": 41174472,
"memory_alloc": 23165520,
"memory_total": 77048088,
"rss": 82186240
},
"runtime": {
"goroutines": 61
}
},
"filebeat": {
"events": {
"active": 50,
"added": 53,
"done": 3
},
"harvester": {
"open_files": 4,
"running": 4
}
},
"libbeat": {
"config": {
"module": {
"running": 1
}
},
"output": {
"events": {
"active": 0
},
"write": {
"latency": {
"histogram": {
"count": 1,
"max": 1483,
"mean": 1483,
"median": 1483,
"min": 1483,
"p75": 1483,
"p95": 1483,
"p99": 1483,
"p999": 1483,
"stddev": 0
}
}
}
},
"pipeline": {
"clients": 4,
"events": {
"active": 44,
"filtered": 9,
"published": 44,
"total": 53
}
}
},
"registrar": {
"states": {
"current": 0
}
},
"system": {
"handles": {
"open": -2
}
}
}
},
"ecs.version": "1.6.0"
}
Never mind, there was an earlier point where the writes happened:
{
"log.level": "info",
"@timestamp": "2024-03-27T14:47:08.765Z",
"message": "Non-zero metrics in the last 30s",
"component": {
"binary": "filebeat",
"dataset": "elastic_agent.filebeat",
"id": "filestream-monitoring",
"type": "filestream"
},
"log": {
"source": "filestream-monitoring"
},
"service.name": "filebeat",
"monitoring": {
"ecs.version": "1.6.0",
"metrics": {
"beat": {
"cpu": {
"system": {
"ticks": 187,
"time": {
"ms": 187
}
},
"total": {
"ticks": 421,
"time": {
"ms": 421
},
"value": 421
},
"user": {
"ticks": 234,
"time": {
"ms": 234
}
}
},
"info": {
"ephemeral_id": "d50a0fbb-c0b8-4697-8a6f-bd24a1146e0d",
"name": "filebeat",
"uptime": {
"ms": 30135
},
"version": "8.14.0"
},
"memstats": {
"gc_next": 41157728,
"memory_alloc": 20457360,
"memory_sys": 49289496,
"memory_total": 70755320,
"rss": 92114944
},
"runtime": {
"goroutines": 61
}
},
"filebeat": {
"events": {
"active": 0,
"added": 216,
"done": 216
},
"harvester": {
"open_files": 4,
"running": 4,
"started": 4
}
},
"libbeat": {
"config": {
"module": {
"running": 1,
"starts": 1
}
},
"output": {
"events": {
"acked": 57,
"active": 0,
"batches": 1,
"total": 57
},
"read": {
"bytes": 4344,
"errors": 1
},
"type": "elasticsearch",
"write": {
"bytes": 12790,
"latency": {
"histogram": {
"count": 1,
"max": 1483,
"mean": 1483,
"median": 1483,
"min": 1483,
"p75": 1483,
"p95": 1483,
"p99": 1483,
"p999": 1483,
"stddev": 0
}
}
}
},
"pipeline": {
"clients": 4,
"events": {
"active": 0,
"filtered": 159,
"published": 57,
"retry": 57,
"total": 216
},
"queue": {
"acked": 57,
"max_events": 3200
}
}
},
"registrar": {
"states": {
"current": 0
}
},
"system": {
"cpu": {
"cores": 4
},
"handles": {
"open": 330
}
}
}
},
"log.logger": "monitoring",
"log.origin": {
"file.line": 187,
"file.name": "log/log.go",
"function": "github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"
},
"ecs.version": "1.6.0"
}
Another failure on
That's pretty odd: in both cases the logs stop before the Beats finish starting up, and there are no shutdown or failure logs; it looks like the Elastic-Agent process is brutally killed. I'll try to reproduce it.
Lots of failures in https://buildkite.com/elastic/elastic-agent/builds/8549#018f2728-f289-4c37-9d0b-c922580c3e80
|
This seems to be a new error from the
I created a PR so that it is no longer treated as an error: #4632
@belimawr the team that maintains the processor suggested that we add this to our CI scripts:
so that the processor does not get confused. It would most likely require another change in the test, though. They also told me the message
is safe to ignore.
This failure is still happening, e.g. https://buildkite.com/elastic/elastic-agent-extended-testing/builds/2544#0191c7da-22fd-4113-aa2f-e64740ff7ebb (from #5450). Reopening...
The most recent failure is
We should never be trying to connect to localhost.
We can add that error to the ignore list as a quick way to mitigate this.
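As a rough illustration of that kind of mitigation, here is a minimal sketch in Go of what such an ignore list could look like. The names `ignoredErrorPatterns` and `isIgnorableError`, as well as the example pattern, are hypothetical and not taken from the actual integration test.

```go
package integration

import "strings"

// ignoredErrorPatterns is a hypothetical allow-list of substrings; any
// error-level log whose message contains one of them is not treated as a
// test failure.
var ignoredErrorPatterns = []string{
	// Example entry for the kind of connection error discussed above.
	"connect: connection refused",
}

// isIgnorableError reports whether an error-level log message matches one of
// the ignored patterns and can therefore be skipped by the test.
func isIgnorableError(message string) bool {
	for _, pattern := range ignoredErrorPatterns {
		if strings.Contains(message, pattern) {
			return true
		}
	}
	return false
}
```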
Looking at #5450, it does not contain the fix commit 6695324, so I believe we can close this issue.
It happens on the
Flaky Test
elastic-agent/testing/integration/logs_ingestion_test.go (line 39 at commit f7dcbd7)
Stack Trace