-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccmlib/node: cleanup unsealed system sstables for offline tools #504
Conversation
see also 3616251 otherwise, `sstablelevelreset` would try to load the unsealed sstables and print out error messages in stderr if not all the components are available or is corrupted. this fails the test which checks for unexpected errors. Fixes scylladb/scylladb#15210 Signed-off-by: Kefu Chai <[email protected]>
tested using $ JAVA_HOME=/usr/lib/jvm/jre-11 pytest -c $PWD/pytest.ini --cassandra-dir $HOME/dev/scylladb/build/cmake offline_tools_test.py::TestOfflineTools::test_sstablelevelreset --count 20 previously 3 out 20 runs failed. after this change all run succeeded. |
system_keyspace_names = ['system_schema', 'system'] | ||
if keyspace not in system_keyspace_names: | ||
for system_keyspace_name in system_keyspace_names: | ||
self.get_sstables(system_keyspace_name, '', cleanup_unsealed=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand why does ccm mess with unsealed sstables.
Maybe it makes sense for (older versions of?) Cassandra, but it doesn't seem to apply to Scylla.
It's obvious that this function was written with Cassandra in mind.
This other existing hunk doesn't apply to Scylla to either:
if os.path.exists(f.replace('Data.db', 'Compacted')):
files.remove(f)
I think that just ignore_unsealed=True, cleanup_unsealed=False
should do the right job (for a ScyllaNode).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bhalevy it's not directly related to ccm. it's just how sstablelevelreset
works. it tries to open sstables in system keyspaces. if we need to stick to sstablelevelreset
, we have to clean the unsealed sstables before inviting it to look at the data directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's obvious that this function was written with Cassandra in mind. This other existing hunk doesn't apply to Scylla to either:
if os.path.exists(f.replace('Data.db', 'Compacted')): files.remove(f)
we can drop this bit in another change.
I think that just
ignore_unsealed=True, cleanup_unsealed=False
should do the right job (for a ScyllaNode).
i don't think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why does the function even mess with system_schema if it was called for a user keyspace? I guess it is configured with LCS in Cassandra. I'm not sure if it applies to scylla.
- Touching unsealed sstables means that sstable components might be incomplete, how can we touch them? The caller should stop the node before doing this or ensure there will be no unsealed sstables in the table, e.g. by disabling compaction and running flush.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can drop this bit in another change.
CCM should support both Cassandra and Scylla.
Since Scylla has no Compacted
component this logic won't apply to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the failure are in places where a flush call was done, but node is being stopped with cluster.stop(gently=False)
, so it can be kill during compaction.
should we disable compaction completely for this test ? and that's it ? i.e. for this tool it's should be part of the instructions (if it ever had one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these tools (be they C* or Scylla) work well when the node can compact while they are reading the sstables. Any test which wants to use these, should either make sure to disable compaction or stop the node when the tool is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So scylla is stopped, but not gently so it may have unsealed sstables that were in the processing of being written while the node was taken down.
It should actually be OK to remove all of them, as we do it anyhow on restart.
By mistake, I thought we're resetting the level for those unsealed sstables and sstables in the system keyspaces, but I see we only call self.get_sstables
to perform this cleanup of unsealed sstables, which is OK.
Not elegant at all, but should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, it is even confusing. i will create a follow-up change to explain the rationale in a comment in this function. but since it's failing in multiple places frequently. will leave it for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these tools (be they C* or Scylla) work well when the node can compact while they are reading the sstables. Any test which wants to use these, should either make sure to disable compaction or stop the node when the tool is running.
scylla is, well, stopped, in a harsh way.
@fruch hi Israel, are you fine with this change? probably we can merge it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
see also 3616251
otherwise,
sstablelevelreset
would try to load the unsealed sstables and print out error messages in stderr if not all the components are available or is corrupted. this fails the test which checks for unexpected errors.Fixes scylladb/scylladb#15210