path | title |
---|---|
/learnings/ops_kafka_microservices | Learnings: Ops: Kafka Microservices |
If you turn on compression on your topics it will / can use Snappy compression. Snappy is a native (C++) library. It is not downloaded: the snappy-java jar bundles the native library and extracts it to a temp directory on startup.
Because it's a native library it requires two things:
- An extraction location that the current user can write to
- SELinux must be set to be permissive enough to allow the native library to load, or for the jar to write files to the disk. I forget which, but there was a fun issue here I think.
Set the extraction location:
-Dorg.xerial.snappy.tempdir=/some/writable/dir
This location does not have to be unique if you are running multiple Kafka Streams apps in a single VM / container.
By default it uses the Java tmpdir, so you can set that with a -D flag instead:
-Djava.io.tmpdir=/someplace/this/can/write/to/and/is/unique/for/app/tmp
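A quick preflight check can catch the "directory not writable" half of the problem before the app starts. This is a minimal sketch, assuming the lookup order described above (the snappy-specific property first, then `java.io.tmpdir`). The class and method names are made up for illustration, and it does not detect SELinux or `noexec` mount problems.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Kafka or snappy-java.
class SnappyTempDirCheck {

    // Resolve the directory the same way the notes describe:
    // -Dorg.xerial.snappy.tempdir wins, otherwise fall back to java.io.tmpdir.
    static Path resolveTempDir() {
        String dir = System.getProperty(
                "org.xerial.snappy.tempdir",
                System.getProperty("java.io.tmpdir"));
        return Paths.get(dir);
    }

    // True only if the path exists, is a directory, and the current user can write to it.
    static boolean isUsable(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        Path dir = resolveTempDir();
        System.out.println(dir + " usable=" + isUsable(dir));
    }
}
```

Run it with the same `-D` flags as the real app so it sees the same settings.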
Use the properties API to set state.dir to the directory you want.
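As a rough sketch of what that looks like: the config is a plain `java.util.Properties` object, and `state.dir` is just another key in it. The application id, broker address, and path below are placeholder values, and the class name is hypothetical. In a real app the resulting object would be handed to the `KafkaStreams` constructor along with the topology.

```java
import java.util.Properties;

// Hypothetical helper that builds a Kafka Streams config with an explicit state.dir.
class StreamsStateDirConfig {

    static Properties build(String appId, String bootstrapServers, String stateDir) {
        Properties props = new Properties();
        props.setProperty("application.id", appId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Where the local state stores (RocksDB) are kept on disk.
        props.setProperty("state.dir", stateDir);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, chosen only for illustration.
        Properties props = build("demo-app", "localhost:9092", "/var/lib/demo-app/state");
        System.out.println(props.getProperty("state.dir"));
    }
}
```

With the Kafka Streams library on the classpath, the string keys can be swapped for the `StreamsConfig` constants (for example `StreamsConfig.STATE_DIR_CONFIG`).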
Kafka Streams allows for stateful stream processing, i.e. operators that have an internal state. This internal state is managed in so-called state stores. A state store can be ephemeral (lost on failure) or fault-tolerant (restored after the failure). The default implementation used by Kafka Streams DSL is a fault-tolerant state store using 1. an internally created and compacted changelog topic (for fault-tolerance) and 2. one (or multiple) RocksDB instances (for cached key-value lookups). ... Kafka Streams commit the current processing progress in regular intervals (parameter commit.interval.ms). ...
TL;DR: KStream data is stored on disk, yes, but also in a Kafka topic. Thus, in default mode, the local data store can go away (instance reboot) and nothing bad happens, as the Kafka topic is used to rebuild the local data store.
The AWS pricing model allocates a baseline read and write IOPS allowance to a volume, based on the volume size. Each volume also has an IOPS burst balance, to act as a buffer if the base limit is exceeded. Burst balance replenishes over time but as it gets used up the reading and writing to disks starts getting throttled to the baseline, leaving the application with an EBS that is very unresponsive.
From: Confluent: Running Kafka Streams on AWS
Answer: bigger disks - even though you don't need the storage - means more IOPS.
May have issues because of RocksDB??
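To make the "bigger disks means more IOPS" point concrete, here is a small arithmetic sketch. It assumes gp2 volumes, which the quote above does not name explicitly: gp2 gives a baseline of 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000. Other volume types (gp3, io1, ...) are priced differently, so treat this as illustrative only.

```java
// Hypothetical helper: baseline IOPS for an EBS gp2 volume of a given size.
class Gp2Iops {

    static final long IOPS_PER_GIB = 3;     // gp2 baseline rate
    static final long MIN_IOPS = 100;       // floor for small volumes
    static final long MAX_IOPS = 16_000;    // ceiling for large volumes

    static long baselineIops(long sizeGiB) {
        return Math.min(MAX_IOPS, Math.max(MIN_IOPS, IOPS_PER_GIB * sizeGiB));
    }

    public static void main(String[] args) {
        for (long size : new long[] {20, 100, 500, 1000}) {
            System.out.println(size + " GiB -> " + baselineIops(size) + " baseline IOPS");
        }
    }
}
```

So under these assumptions a 100 GiB volume gets 300 baseline IOPS, while a 1000 GiB volume gets 3000, even if the state stores only need a fraction of that space.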
Consumer group offsets are written to an internal topic (__consumer_offsets). Each consumer in the consumer group commits its current offset to this topic.
To reset the offset, i.e. to rewind the consumer group to the beginning of the topic, requires no active members in the consumer group.
Q: How does Kafka know that there's an active consumer? A: The group coordinator expects heartbeats from each consumer (every 3 seconds by default, heartbeat.interval.ms). If no heartbeat arrives within session.timeout.ms, the consumer is considered dead and is removed from the group.
$ kafka-consumer-groups --bootstrap-server $KB --list
$ kafka-consumer-groups --bootstrap-server $KB --group demo-group --describe
$ kafka-consumer-groups --bootstrap-server $KB --group demo-group --reset-offsets --topic demo:3 --shift-by 1 --execute
$ kafka-consumer-groups --bootstrap-server $KB --group demo-group --reset-offsets --topic demo:6 --topic demo2:4 --shift-by -5 --execute
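The `--shift-by` arithmetic in the last two commands can be modelled roughly as below. This is a sketch of my understanding of the behaviour, not the tool's actual code: the committed offset moves by the given amount (positive is forward, negative is backward), and I am assuming the result is clamped to the partition's earliest and latest offsets. The class and method names are hypothetical.

```java
// Hypothetical model of what --shift-by does to one partition's committed offset.
class OffsetShift {

    // current: the group's committed offset for the partition
    // shiftBy: value passed to --shift-by (may be negative)
    // earliest / latest: the partition's log start and log end offsets
    static long shift(long current, long shiftBy, long earliest, long latest) {
        long target = current + shiftBy;
        // Assumption: out-of-range targets are pulled back into [earliest, latest].
        return Math.max(earliest, Math.min(latest, target));
    }

    public static void main(String[] args) {
        System.out.println(shift(10, 1, 0, 100));   // skip one record forward
        System.out.println(shift(10, -5, 0, 100));  // replay the last five records
    }
}
```

A `--shift-by -5` on a partition whose committed offset is 10 would therefore land on 5, meaning the next poll re-reads offsets 5 through 9.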