This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

Improve indexing performance for Cassandra event log #361

Open
DavidTegtmeyer opened this issue Nov 11, 2016 · 12 comments
@DavidTegtmeyer commented Nov 11, 2016

When large numbers of events are persisted at a high rate, the indexing performance of the Cassandra event log appears to be insufficient. In our case, 5,000,000 events were persisted in batches of 1024. While all 5,000,000 events were actually persisted to the corresponding Cassandra table, the event log took well over 10 minutes to recover its state, causing LoadSnapshot requests to time out.
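For illustration, a minimal sketch of a load generator producing this kind of write pattern, assuming Eventuate's `EventsourcedActor` API (the `LoadGenerator` name and event payloads are made up; the batch size of 1024 would come from the log's write-batch configuration, not from this actor):

```scala
import akka.actor.ActorRef
import com.rbmhtechnology.eventuate.EventsourcedActor

import scala.util.{Failure, Success}

// Hypothetical load generator: persist `count` events as fast as possible;
// the Cassandra event log batches these writes internally.
class LoadGenerator(val id: String, val eventLog: ActorRef) extends EventsourcedActor {
  override def onCommand: Receive = {
    case count: Int =>
      (1 to count).foreach { i =>
        persist(s"event-$i") {
          case Success(_) => // event durably written
          case Failure(e) => // a real generator would log and back off here
        }
      }
  }
  override def onEvent: Receive = {
    case _ => // no in-memory state needed for this load test
  }
}
```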

@kongo2002 (Contributor)

Hi guys,

I'd like to add some more detail to this issue.

According to my observations, the index updating mechanism can indeed fall behind the actual event persistence. This problem becomes more evident as soon as you increase the write-batch size under a high write load (as mentioned by @DavidTegtmeyer already). In that case the index update, which effectively consists of a read-write loop on the event log itself, can't catch up with the events written in the meantime.
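A simplified model of that read-write loop, just to make the failure mode concrete (illustrative names and shapes, not the actual `CassandraIndex` internals):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Simplified event and index progress marker.
case class Event(sequenceNr: Long, payload: Any)
case class IndexProgress(sequenceNr: Long)

// The updater reads a batch from the main log, writes derived index entries,
// and recurses. Under sustained write load, new events arrive faster than a
// read-write round completes, so the loop keeps falling behind.
def updateIndex(progress: IndexProgress,
                readBatch: Long => Future[Seq[Event]],
                writeIndex: Seq[Event] => Future[Unit])
               (implicit ec: ExecutionContext): Future[IndexProgress] =
  readBatch(progress.sequenceNr + 1).flatMap { events =>
    if (events.isEmpty) Future.successful(progress) // caught up (for now)
    else writeIndex(events).flatMap { _ =>
      updateIndex(IndexProgress(events.last.sequenceNr), readBatch, writeIndex)
    }
  }
```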

This could be acceptable in some cases, but it gets more problematic when the index updating can't catch up at all. As soon as a single read or write of the index update fails, the whole index updater restarts from the beginning (i.e., from the last point at which the index updater finished completely).

I reduced the impact of this problem by propagating the UpdateIndexProgress of the CassandraIndexUpdater to the CassandraIndex in order to update its internal clock progress.
This doesn't fix the root cause of the problem, but it at least allows such an event log to 'survive' under these conditions. Otherwise, event-sourced views using that event log wouldn't be able to recover in time at all, due to an outdated event log clock snapshot.
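A rough sketch of that mitigation (the actor shape and field names are illustrative, not the actual Eventuate code):

```scala
import akka.actor.Actor

// Intermediate progress report sent by the updater to the index.
case class UpdateIndexProgress(sequenceNr: Long)

class IndexLike extends Actor {
  private var clockSequenceNr = 0L

  def receive: Receive = {
    case UpdateIndexProgress(sequenceNr) =>
      // Advance the internal clock on every intermediate report instead of
      // only after a full indexing pass, so the clock snapshot stays fresh
      // even if the updater never fully catches up.
      clockSequenceNr = math.max(clockSequenceNr, sequenceNr)
  }
}
```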

@krasserm I am happy to make a PR out of this 'fix' if it makes any sense :-)

Cheers,
Gregor

@krasserm krasserm added ready and removed ready labels Nov 28, 2016
@krasserm (Contributor)

One reason for the rather slow index writes is that events are (redundantly) written to the aggregateId index. This will be optimized by writing only the main log sequence numbers to the index. During an aggregateId-scoped batch replay, the sequence numbers for the current batch are read from the index in a first round. In a second round, the events themselves are read from their primary storage location (from multiple partitions concurrently, if applicable). Along with this optimization, the frequency of index progress writes should also be made configurable, to avoid unnecessary re-indexing from rather old clock snapshots.
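A sketch of the proposed two-round replay, with assumed function shapes (this is not the actual Eventuate code):

```scala
import scala.concurrent.{ExecutionContext, Future}

case class Event(sequenceNr: Long, payload: Any) // simplified, as above

// Round 1: read only the sequence numbers of the current batch from the
// aggregateId index. Round 2: read the events themselves from their primary
// storage location, potentially from multiple partitions concurrently.
def replayBatch(aggregateId: String, fromSequenceNr: Long, max: Int)
               (readIndexedSequenceNrs: (String, Long, Int) => Future[Seq[Long]],
                readEvents: Seq[Long] => Future[Seq[Event]])
               (implicit ec: ExecutionContext): Future[Seq[Event]] =
  for {
    sequenceNrs <- readIndexedSequenceNrs(aggregateId, fromSequenceNr, max)
    events      <- readEvents(sequenceNrs)
  } yield events
```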

@krasserm krasserm modified the milestone: 0.9 Nov 28, 2016
@krasserm krasserm added the ready label Nov 28, 2016
@krasserm krasserm removed the ready label Jan 10, 2017
@kongo2002 (Contributor)

Hi @krasserm ,

I would like to add some more information to this issue. The workaround we introduced with #368 helps only as long as the event log doesn't use aggregateIds. However, we also have event logs with aggregates, where this approach doesn't apply. I did some more testing for that reason, and I think the main problem is not the write performance of the index but rather the read "loop" used by the CassandraIndex.

I have modified the CassandraEventLog to remove the CassandraIndex altogether and instead write the event's aggregates together with the "original" event itself. The initial tests are very promising so far.

Any thoughts and/or interested in a PR? :-)

Cheers,
Gregor

@krasserm (Contributor)

> I think the main problem is not the write performance of the index but rather the read "loop" that is used by the CassandraIndex

Can you please elaborate?

> ... but instead write the event's aggregates together with the "original" event itself.

Can you please explain further? I'm not sure I follow ...

> The initial tests are very promising so far.

Awesome 😃

@kongo2002 (Contributor)

  • I think the read operations that are part of the indexing have the main impact on performance, rather than the write operation(s). The indexer consumes the event log for a specified interval (fromSequenceNr, toSequenceNr), collects the aggregates for those events, calculates the event log clock and finally writes those aggregate events into Cassandra (CassandraIndex.updateAsync). This process is repeated for every UpdateIndex command the index receives from the event log. Under high event load, this read-write "loop" cannot catch up with the writes of the "original" events themselves (CassandraEventLog.writeBatch), which leads to the indexer constantly querying the event log table.

  • My approach is to write those aggregate events, which are currently read and processed (asynchronously) by the CassandraIndex, immediately in writeBatch instead (see the sketch after this list). That way I get rid of the indexer's additional reads, and I can "reuse" the event payload when writing into the aggregate table as well. Of course this may increase the runtime of writeBatch a bit, but according to my tests (and my understanding of Cassandra write performance) the overhead is barely measurable so far.
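A minimal sketch of this idea (names and shapes are illustrative, not the actual plugin code; `DurableEvent` is heavily simplified here):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Simplified event carrying its aggregate routing destinations.
case class DurableEvent(sequenceNr: Long, payload: Array[Byte], aggregateIds: Set[String])

// writeBatch writes every event to the main log table and, for each routed
// aggregateId, to the aggregate table as well, reusing the already-serialized
// payload and making the indexer's read-write loop unnecessary.
def writeBatch(events: Seq[DurableEvent],
               writeLogRow: DurableEvent => Future[Unit],
               writeAggregateRow: (String, DurableEvent) => Future[Unit])
              (implicit ec: ExecutionContext): Future[Unit] = {
  val writes = events.flatMap { e =>
    writeLogRow(e) +: e.aggregateIds.toSeq.map(writeAggregateRow(_, e))
  }
  Future.sequence(writes).map(_ => ())
}
```

Note that in this sketch all writes start concurrently, which is exactly what the isolation discussion below is about.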

@krasserm (Contributor)

Your approach will introduce cross-partition batch writes, which are no longer isolated; see #73 and krasserm/akka-persistence-cassandra#48. Isolation is needed for sequential read consistency.
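For context, a sketch of why even batching wouldn't restore isolation here (Datastax Java driver 3.x style; the keyspace, table and column names are made up and the actual Eventuate schema differs): a LOGGED batch spanning two tables, and therefore two partitions, is atomic but not isolated, so a concurrent reader may see the aggregate row before the corresponding main log row.

```scala
import com.datastax.driver.core.{BatchStatement, Cluster}

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("eventuate")
val payload = java.nio.ByteBuffer.wrap("event-payload".getBytes("UTF-8"))

// Atomic (eventually both rows exist) but NOT isolated across partitions.
val batch = new BatchStatement(BatchStatement.Type.LOGGED)
batch.add(session.prepare(
  "INSERT INTO log (partition_nr, sequence_nr, event) VALUES (?, ?, ?)")
  .bind(Long.box(0L), Long.box(42L), payload))
batch.add(session.prepare(
  "INSERT INTO log_agg (aggregate_id, sequence_nr, event) VALUES (?, ?, ?)")
  .bind("agg-1", Long.box(42L), payload))
session.execute(batch)
```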

@kongo2002 (Contributor)

Not sure if I understand correctly, but I do write the aggregate events in CassandraEventLog.writeBatch, just not as part of the same Cassandra batch.

@krasserm (Contributor)

How do you recover from write failures?

@krasserm (Contributor)

... more precisely, how do you recover from partial write failures?

@kongo2002 (Contributor)

Write failures are handled just as they are handled right now, via writeRetry: if the aggregate write (or any part of it) fails, writeBatch is retried.
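A generic sketch of such a retry wrapper (the actual writeRetry implementation may differ; this assumes the writes are idempotent upserts keyed by sequence number, so re-writing already-written rows is harmless):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Retry the whole operation on failure, up to `attempts` times in total.
def retry[A](attempts: Int)(op: () => Future[A])(implicit ec: ExecutionContext): Future[A] =
  op().recoverWith {
    case _ if attempts > 1 => retry(attempts - 1)(op)
  }

// e.g. retry(3)(() => writeBatch(events, writeLogRow, writeAggregateRow))
```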

@krasserm (Contributor)

Ok, but we might still end up in a situation where events in the aggregate index become visible before they are visible in the main log. This breaks existing guarantees of Eventuate. It can only be avoided by writing the aggregate events after the main log has been successfully written, which significantly reduces throughput.
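The ordering described here, as a sketch (illustrative function shapes):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Starting the aggregate write only after the main log write succeeded
// guarantees that an event never becomes visible in the aggregate index
// first, at the cost of roughly doubling the write latency per batch.
def writeOrdered(writeMainLog: () => Future[Unit],
                 writeAggregates: () => Future[Unit])
                (implicit ec: ExecutionContext): Future[Unit] =
  writeMainLog().flatMap(_ => writeAggregates())
```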

@krasserm (Contributor)

OTOH, turning off the indexer also decreases the load on Cassandra (no more read load), and your changes would also significantly decrease the complexity of the Cassandra plugin. Please create a PR, looking forward to reviewing it. Thanks!
