fixed the alarm event handler performance issue #3136
Merged
Hi Kay @kasemir and Kunal @shroffk
While intensively testing the alarm performance, we noticed that the current alarm server cannot handle simultaneous alarm events (50 alarms with the exact same record processing time, using the reference PV's `TESL`, at 50 PVs per second for 10 minutes).

Our first suspicion was the `Flowable<VType>` buffer size, but increasing the size is not a real solution. While debugging the performance problem, we noticed that the alarm server calls `producer.send(record).get()` intensively, without using the returned metadata, in `sendStateUpdate()` in `ServerModel.java`.

We investigated this `get` call in more detail and concluded that it is the main cause of the alarm performance problem. `get` is synchronous and blocking, whereas `send` itself is asynchronous and non-blocking.

So here we propose the code without the `get` call. We have verified that the alarm server on this branch can handle simultaneous alarm events (now 100 alarms with the exact same record processing time, using the reference PV's `TESL`, at 50 PVs per second for 10 minutes).

We found no other code that uses the return value of `get`, so we have removed these calls everywhere. This may not be the right solution if you want to receive some metadata after `.send.get`. For that case, where you want to handle the metadata, we also found a solution, which is as follows:
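As a minimal sketch of the difference between blocking on `get` and handling the result in a callback, the following uses only the Java standard library. The `send` helper is a hypothetical stand-in for `producer.send(record)`, not the Kafka API, and `ACK_DELAY_MS` is an assumed acknowledgement delay chosen for illustration. With the real client, the equivalent is the two-argument `KafkaProducer.send(record, callback)`, whose callback receives the `RecordMetadata` and an exception.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SendVersusGet {
    /** Assumed acknowledgement delay per message; purely illustrative. */
    static final long ACK_DELAY_MS = 20;

    /** Hypothetical stand-in for producer.send(record): returns at once, completes later. */
    static CompletableFuture<Long> send(ScheduledExecutorService io, long offset) {
        CompletableFuture<Long> ack = new CompletableFuture<>();
        io.schedule(() -> { ack.complete(offset); }, ACK_DELAY_MS, TimeUnit.MILLISECONDS);
        return ack;
    }

    /** Pattern being removed: the caller waits for every acknowledgement in turn. */
    static long blockingMillis(ScheduledExecutorService io, int n) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            send(io, i).get(); // blocks until this message is acknowledged
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Callback pattern: the caller never blocks per message; results arrive later. */
    static long callbackMillis(ScheduledExecutorService io, int n, AtomicInteger acked)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(n);
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            send(io, i).whenComplete((offset, error) -> {
                if (error == null) {
                    acked.incrementAndGet(); // metadata would be handled here
                }
                done.countDown();
            });
        }
        done.await(); // only so that the demo can measure the total time
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService io = Executors.newSingleThreadScheduledExecutor();
        try {
            AtomicInteger acked = new AtomicInteger();
            System.out.println("blocking get(): " + blockingMillis(io, 20) + " ms");
            System.out.println("callback:       " + callbackMillis(io, 20, acked)
                    + " ms, acked=" + acked.get());
        } finally {
            io.shutdown();
        }
    }
}
```

In the blocking variant the total time grows with the number of messages, because each send waits for the previous acknowledgement. In the callback variant the acknowledgements overlap. With the real Kafka client the callback runs on the producer's I/O thread, so it should stay short.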
I have not checked which Kafka version is in use, so please consult the documentation for the matching version.
Detailed information can be found at
https://kafka.apache.org/35/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#send(org.apache.kafka.clients.producer.ProducerRecord,org.apache.kafka.clients.producer.Callback)
We are now extending our tests to 500, 1000, and 5000 alarm events to see where the limit is. If all goes well, we will finish our tests at a maximum of 5000 alarms with the exact same record processing time, using the reference PV's `TESL`, at 50 PVs per second for 10 minutes. We want to be very clear about the limits of the alarm services we plan to use.
If the system ever produces more than 5000 alarm events per second, something is already badly wrong with the control system anyway. 🥲
@jeonghanlee, Soo Ryu, and @Sangil-Lee at ALS-U Controls, LBNL.