This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

Improve indexing performance for Cassandra event log #361

Open
DavidTegtmeyer opened this issue Nov 11, 2016 · 12 comments
@DavidTegtmeyer commented Nov 11, 2016

When large numbers of events are persisted at a high rate, the indexing performance of the Cassandra event log appears to be insufficient. In our case, 5,000,000 events were persisted in batches of 1024. While all 5,000,000 events were actually persisted to the corresponding Cassandra table, the event log took well over 10 minutes to recover its state, causing LoadSnapshot requests to time out.
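For illustration, a minimal sketch of a load generator producing this kind of write pattern, assuming Eventuate's `EventsourcedActor` API (the `LoadGenerator` name and event payloads are made up; the batch size of 1024 would come from the log's write-batch configuration, not from this actor):

```scala
import akka.actor.ActorRef
import com.rbmhtechnology.eventuate.EventsourcedActor

import scala.util.{Failure, Success}

// Hypothetical load generator: persist `count` events as fast as possible;
// the Cassandra event log batches these writes internally.
class LoadGenerator(val id: String, val eventLog: ActorRef) extends EventsourcedActor {
  override def onCommand: Receive = {
    case count: Int =>
      (1 to count).foreach { i =>
        persist(s"event-$i") {
          case Success(_) => // event durably written
          case Failure(e) => // a real generator would log and back off here
        }
      }
  }
  override def onEvent: Receive = {
    case _ => // no in-memory state needed for this load test
  }
}
```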

@kongo2002 (Contributor)

Hi guys,

I'd like to add some more detail to this issue.

According to my observations, the index updating mechanism can indeed fall behind the actual event persistence. This problem becomes more evident as soon as you increase the write-batch size under a high write load (as mentioned by @DavidTegtmeyer already). In that case the index update, which effectively consists of a read-write loop on the event log itself, can't catch up with the events written in the meantime.
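A simplified model of that read-write loop, just to make the failure mode concrete (illustrative names and shapes, not the actual `CassandraIndex` internals):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Simplified event and index progress marker.
case class Event(sequenceNr: Long, payload: Any)
case class IndexProgress(sequenceNr: Long)

// The updater reads a batch from the main log, writes derived index entries,
// and recurses. Under sustained write load, new events arrive faster than a
// read-write round completes, so the loop keeps falling behind.
def updateIndex(progress: IndexProgress,
                readBatch: Long => Future[Seq[Event]],
                writeIndex: Seq[Event] => Future[Unit])
               (implicit ec: ExecutionContext): Future[IndexProgress] =
  readBatch(progress.sequenceNr + 1).flatMap { events =>
    if (events.isEmpty) Future.successful(progress) // caught up (for now)
    else writeIndex(events).flatMap { _ =>
      updateIndex(IndexProgress(events.last.sequenceNr), readBatch, writeIndex)
    }
  }
```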

This could be acceptable in some cases, but it gets more problematic when the index updating can't catch up at all. As soon as a single read or write of the index update fails, the whole index updater restarts from the beginning (i.e., from the last point at which the index updater finished completely).

I reduced the impact of this problem by propagating the UpdateIndexProgress of the CassandraIndexUpdater to the CassandraIndex in order to update its internal clock progress.
This doesn't fix the root cause of the problem, but it at least allows such an event log to 'survive' under these conditions. Otherwise, event-sourced views using that event log wouldn't be able to recover in time at all, due to an outdated event log clock snapshot.
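A rough sketch of that mitigation (the actor shape and field names are illustrative, not the actual Eventuate code):

```scala
import akka.actor.Actor

// Intermediate progress report sent by the updater to the index.
case class UpdateIndexProgress(sequenceNr: Long)

class IndexLike extends Actor {
  private var clockSequenceNr = 0L

  def receive: Receive = {
    case UpdateIndexProgress(sequenceNr) =>
      // Advance the internal clock on every intermediate report instead of
      // only after a full indexing pass, so the clock snapshot stays fresh
      // even if the updater never fully catches up.
      clockSequenceNr = math.max(clockSequenceNr, sequenceNr)
  }
}
```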

@krasserm I am happy to make a PR out of this 'fix' if it makes any sense :-)

Cheers,
Gregor

@krasserm krasserm added ready and removed ready labels Nov 28, 2016
@krasserm (Contributor)

One reason for the rather slow index writes is that events are (redundantly) written to the aggregateId index. This will be optimized by writing only the main log sequence numbers to the index. During an aggregateId-scoped batch replay, the sequence numbers for the current batch are read from the index in a first round. In a second round, the events themselves are read from their primary storage location (from multiple partitions concurrently, if applicable). Along with this optimization, the frequency of index progress writes should also be made configurable, to avoid unnecessary re-indexing from rather old clock snapshots.
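A sketch of the proposed two-round replay, with assumed function shapes (this is not the actual Eventuate code):

```scala
import scala.concurrent.{ExecutionContext, Future}

case class Event(sequenceNr: Long, payload: Any) // simplified, as above

// Round 1: read only the sequence numbers of the current batch from the
// aggregateId index. Round 2: read the events themselves from their primary
// storage location, potentially from multiple partitions concurrently.
def replayBatch(aggregateId: String, fromSequenceNr: Long, max: Int)
               (readIndexedSequenceNrs: (String, Long, Int) => Future[Seq[Long]],
                readEvents: Seq[Long] => Future[Seq[Event]])
               (implicit ec: ExecutionContext): Future[Seq[Event]] =
  for {
    sequenceNrs <- readIndexedSequenceNrs(aggregateId, fromSequenceNr, max)
    events      <- readEvents(sequenceNrs)
  } yield events
```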

@krasserm krasserm modified the milestone: 0.9 Nov 28, 2016
@krasserm krasserm added the ready label Nov 28, 2016
@krasserm krasserm removed the ready label Jan 10, 2017
@kongo2002 (Contributor)

Hi @krasserm ,

I would like to add some more information to this issue. The workaround we introduced with #368 helps only as long as the event log doesn't use aggregateIds. However, we also have event logs with aggregates, where this approach doesn't apply. I did some more testing for that reason, and I think the main problem is not the write performance of the index but rather the read "loop" used by the CassandraIndex.

I have modified the CassandraEventLog to remove the CassandraIndex altogether and instead write the event's aggregates together with the "original" event itself. The initial tests are very promising so far.

Any thoughts and/or interested in a PR? :-)

Cheers,
Gregor

@krasserm (Contributor)

> I think the main problem is not the write performance of the index but rather the read "loop" that is used by the CassandraIndex

Can you please elaborate?

> ... but instead write the event's aggregates together with the "original" event itself.

Can you please explain further? I'm not sure I follow ...

> The initial tests are very promising so far.

Awesome 😃

@kongo2002 (Contributor)

  • I think the read operations that are part of the indexing have the main impact on performance, rather than the write operation(s). The indexer consumes the event log for a specified interval (fromSequenceNr, toSequenceNr), collects the aggregates for those events, calculates the event log clock and finally writes those aggregate events into Cassandra (CassandraIndex.updateAsync). This process is repeated for every UpdateIndex command the index receives from the event log. Under high event load, this read-write "loop" cannot catch up with the writes of the "original" events themselves (CassandraEventLog.writeBatch), which leads to the indexer constantly querying the event log table.

  • My approach is to write those aggregate events, which are currently read and processed (asynchronously) by the CassandraIndex, immediately in writeBatch instead (see the sketch after this list). That way I get rid of the indexer's additional reads, and I can "reuse" the event payload when writing into the aggregate table as well. Of course this may increase the runtime of writeBatch a bit, but according to my tests (and my understanding of Cassandra write performance) the overhead is barely measurable so far.
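A minimal sketch of this idea (names and shapes are illustrative, not the actual plugin code; `DurableEvent` is heavily simplified here):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Simplified event carrying its aggregate routing destinations.
case class DurableEvent(sequenceNr: Long, payload: Array[Byte], aggregateIds: Set[String])

// writeBatch writes every event to the main log table and, for each routed
// aggregateId, to the aggregate table as well, reusing the already-serialized
// payload and making the indexer's read-write loop unnecessary.
def writeBatch(events: Seq[DurableEvent],
               writeLogRow: DurableEvent => Future[Unit],
               writeAggregateRow: (String, DurableEvent) => Future[Unit])
              (implicit ec: ExecutionContext): Future[Unit] = {
  val writes = events.flatMap { e =>
    writeLogRow(e) +: e.aggregateIds.toSeq.map(writeAggregateRow(_, e))
  }
  Future.sequence(writes).map(_ => ())
}
```

Note that in this sketch all writes start concurrently, which is exactly what the isolation discussion below is about.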

@krasserm (Contributor)

Your approach will introduce cross-partition batch writes, which are no longer isolated; see #73 and krasserm/akka-persistence-cassandra#48. Isolation is needed for sequential read consistency.
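For context, a sketch of why even batching wouldn't restore isolation here (Datastax Java driver 3.x style; the keyspace, table and column names are made up and the actual Eventuate schema differs): a LOGGED batch spanning two tables, and therefore two partitions, is atomic but not isolated, so a concurrent reader may see the aggregate row before the corresponding main log row.

```scala
import com.datastax.driver.core.{BatchStatement, Cluster}

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("eventuate")
val payload = java.nio.ByteBuffer.wrap("event-payload".getBytes("UTF-8"))

// Atomic (eventually both rows exist) but NOT isolated across partitions.
val batch = new BatchStatement(BatchStatement.Type.LOGGED)
batch.add(session.prepare(
  "INSERT INTO log (partition_nr, sequence_nr, event) VALUES (?, ?, ?)")
  .bind(Long.box(0L), Long.box(42L), payload))
batch.add(session.prepare(
  "INSERT INTO log_agg (aggregate_id, sequence_nr, event) VALUES (?, ?, ?)")
  .bind("agg-1", Long.box(42L), payload))
session.execute(batch)
```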

@kongo2002 (Contributor)

Not sure if I understand correctly, but I do write the aggregate events in CassandraEventLog.writeBatch, just not as part of the same Cassandra batch.

@krasserm (Contributor)

How do you recover from write failures?

@krasserm (Contributor)

... more precisely, how do you recover from partial write failures?

@kongo2002 (Contributor)

Write failures are handled just as they are handled right now, via writeRetry: if the aggregate write (or any part of it) fails, writeBatch is retried.
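A generic sketch of such a retry wrapper (the actual writeRetry implementation may differ; this assumes the writes are idempotent upserts keyed by sequence number, so re-writing already-written rows is harmless):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Retry the whole operation on failure, up to `attempts` times in total.
def retry[A](attempts: Int)(op: () => Future[A])(implicit ec: ExecutionContext): Future[A] =
  op().recoverWith {
    case _ if attempts > 1 => retry(attempts - 1)(op)
  }

// e.g. retry(3)(() => writeBatch(events, writeLogRow, writeAggregateRow))
```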

@krasserm (Contributor)

Ok, but we might still end up in a situation where events in the aggregate index become visible before they are visible in the main log. This breaks existing guarantees of Eventuate. It can only be avoided by writing the aggregate events after the main log has been successfully written, which significantly reduces throughput.
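The ordering described here, as a sketch (illustrative function shapes):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Starting the aggregate write only after the main log write succeeded
// guarantees that an event never becomes visible in the aggregate index
// first, at the cost of roughly doubling the write latency per batch.
def writeOrdered(writeMainLog: () => Future[Unit],
                 writeAggregates: () => Future[Unit])
                (implicit ec: ExecutionContext): Future[Unit] =
  writeMainLog().flatMap(_ => writeAggregates())
```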

@krasserm (Contributor)

OTOH, turning off the indexer also decreases the load on Cassandra (no more read load), and your changes would also significantly decrease the complexity of the Cassandra plugin. Please create a PR, looking forward to reviewing it. Thanks!
