We are seeing an issue in production (an AWS container) where the offsets are not being updated between runs. After startup we slowly run out of memory (after about 15 minutes with 2 GB, 60 minutes with 8 GB) until the app crashes and a new instance restarts with the same begin_offset. I cannot reproduce the problem in development.
Something that is possibly related: a rebalance occurs every 15 seconds or so (always some multiple of 15 seconds):
... 226 lines omitted ...
{"environment":"release","level":"notice","message":"Group member (portfolio-monitor-consumer-group,coor=#PID<0.2720.0>,cb=#PID<0.2715.0>,generation=104232):\nre-joining group, reason::rebalance_in_progress","payload":{},"timestamp":"2021-01-31T14:33:52.857406Z"}
{"application":"kaffe","environment":"release","level":"info","message":"event#assignments_revoked=Elixir.Kaffe.GroupMember.portfolio-monitor-consumer-group.release_tradelines","payload":{},"timestamp":"2021-01-31T14:33:52.857583Z"}
... 3 lines omitted ...
{"environment":"release","level":"notice","message":"Group member (portfolio-monitor-consumer-group,coor=#PID<0.2701.0>,cb=#PID<0.2416.0>,generation=104232):\nre-joining group, reason::rebalance_in_progress","payload":{},"timestamp":"2021-01-31T14:33:52.860172Z"}
... 13 lines omitted ...
{"environment":"release","level":"notice","message":"Group member (portfolio-monitor-consumer-group,coor=#PID<0.2720.0>,cb=#PID<0.2715.0>,generation=104233):\nassignments received:\n release_tradelines:\n partition=0 begin_offset=1036850","payload":{},"timestamp":"2021-01-31T14:33:52.865159Z"}
{"application":"kaffe","environment":"release","level":"info","message":"event#assignments_received=Elixir.Kaffe.GroupMember.portfolio-monitor-consumer-group.release_tradelines generation_id=104233","payload":{},"timestamp":"2021-01-31T14:33:52.865255Z"}
... 56 lines omitted ...
{"environment":"release","level":"notice","message":"Group member (portfolio-monitor-consumer-group,coor=#PID<0.2720.0>,cb=#PID<0.2715.0>,generation=104230):\nre-joining group, reason::rebalance_in_progress","payload":{},"timestamp":"2021-01-31T14:33:22.851670Z"}
... 4 lines omitted ...
{"environment":"release","level":"notice","message":"Group member (portfolio-monitor-consumer-group,coor=#PID<0.2701.0>,cb=#PID<0.2416.0>,generation=104230):\nre-joining group, reason::rebalance_in_progress","payload":{},"timestamp":"2021-01-31T14:33:22.854326Z"}
{"application":"kaffe","environment":"release","level":"info","message":"event#assignments_revoked=Elixir.Kaffe.GroupMember.portfolio-monitor-consumer-group.release_inquiries","payload":{},"timestamp":"2021-01-31T14:33:22.854427Z"}
... 12 lines omitted ...
{"application":"kaffe","environment":"release","level":"info","message":"event#assignments_received=Elixir.Kaffe.GroupMember.portfolio-monitor-consumer-group.release_inquiries generation_id=104231","payload":{},"timestamp":"2021-01-31T14:33:22.858850Z"}
{"application":"kaffe","environment":"release","level":"info","message":"event#assignments_received=Elixir.Kaffe.GroupMember.portfolio-monitor-consumer-group.release_tradelines generation_id=104231","payload":{},"timestamp":"2021-01-31T14:33:22.858933Z"}
Not much is going on in my handle_messages function:
def handle_messages(messages) do
  Logger.info("kafka.handle_messages begin")

  for message <- messages do
    Logger.info("Incoming kafka message", message)
    # ... handling the message here; not much going on besides a select that
    # returns in a couple of ms and doesn't return any rows
  end

  Logger.info("kafka.handle_messages end")
  :ok
end
The "Incoming kafka message" log line occurs around 300 times per second, with the offset incrementing by 1 each time, until the "kafka.handle_messages end" message appears roughly every 45 seconds. The messages being consumed are backed up from 10 days ago (per the processed_at timestamp).
begin_offset is always the same at startup. The first message only occurs on startup, while the next two occur both on startup and after every rebalance. The begin_offset is the same value even when a new container starts up.
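One low-effort way to check whether the committed offset is actually moving is to log each batch's first and last offsets instead of every message. The sketch below is only illustrative: it assumes each message is a map with :topic, :partition, and :offset keys (which is what kaffe passes to handlers), and process_message/1 is a hypothetical stand-in for the existing select-and-handle logic.
def handle_messages([] = _messages), do: :ok

def handle_messages([first | _] = messages) do
  last = List.last(messages)

  Logger.info(
    "kafka.handle_messages batch topic=#{first.topic} partition=#{first.partition} " <>
      "offsets=#{first.offset}..#{last.offset} count=#{length(messages)}"
  )

  Enum.each(messages, &process_message/1)

  # kaffe acknowledges/commits the batch's offset only after :ok is returned here
  :ok
end

# hypothetical placeholder for the existing select-and-handle logic
defp process_message(_message), do: :ok
If the offsets range logged here keeps restarting at the same value after each rebalance, the acknowledgements are never reaching Kafka before the group rebalances again.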
Running kaffe 1.18.0 and brod 3.14.0. I tried to upgrade to the latest and ran into this (#106). Here is the config; there are 3 brokers and 2 topics:
config :kaffe,
  consumer: [
    endpoints: brokers,
    topics: topics,
    ssl: config_env() == :prod,
    # the consumer group for tracking offsets in Kafka
    consumer_group: "portfolio-monitor-consumer-group",
    # the module that will process messages
    message_handler: EventHandler.KafkaConsumer,
    offset_reset_policy: :reset_to_latest,
    worker_allocation_strategy: :worker_per_topic_partition
  ]
Sorry if this is TMI but I didn't want to leave out anything that might be relevant. Any help resolving this issue would be greatly appreciated!
There were some issues with continual rebalancing in a "recent" version of Kafka ... but I don't recall which version. The rebalancing could definitely cause messages to be continually reprocessed so that the consumers never commit and make progress.
I think there's an old issue here that might contain that Kafka version ...
I dropped max_bytes from 1M to 10K. We now commit several times per second instead of once every 45 seconds. The rebalance is still occurring every 15 seconds or so, but the begin_offset is increasing each time, whereas it was holding steady previously. Memory usage is also holding steady.
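For anyone else hitting this, the change amounts to a max_bytes entry in the consumer config. This is only a sketch against the config posted above; 10_000 stands in for the "10K" value and isn't a tuned recommendation:
config :kaffe,
  consumer: [
    endpoints: brokers,
    topics: topics,
    ssl: config_env() == :prod,
    consumer_group: "portfolio-monitor-consumer-group",
    message_handler: EventHandler.KafkaConsumer,
    offset_reset_policy: :reset_to_latest,
    worker_allocation_strategy: :worker_per_topic_partition,
    # fetch at most ~10 KB per batch (down from the previous 1M) so each
    # batch finishes and its offset gets committed before the next rebalance
    max_bytes: 10_000
  ]
Smaller batches mean handle_messages returns (and the offset is committed) far more often, which is why commits went from once every ~45 seconds to several per second.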
I think the rebalancing is the issue to solve first. That should not be happening with such a high and consistent frequency. I wonder if you'd experience the same thing with a different cluster (like setting one up locally in Docker or something).