Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scylla_node: no need to _wait_no_pending_flushes #500

Merged
merged 1 commit into from
Aug 27, 2023

Conversation

bhalevy
Copy link
Member

@bhalevy bhalevy commented Aug 27, 2023

The flush api is synchronous and so it waits for
the respective flush to get done and will return
and error if flush failed for any reason and
this will propagate through nodetool flush.

_wait_no_pending_flushes may fail if there
are spontanuous flushes, and it also tends to
timeout in nodetool cfstats in debug mode,
see for example:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/250/testReport/alternator_ttl_tests/TestAlternatorTTL/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split029___test_ttl_with_load_and_decommission/

../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1320: in flush
    self._wait_no_pending_flushes(verbose=kwargs.get('verbose', True))
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1314: in _wait_no_pending_flushes
    if not wait_for(no_pending_flushes, timeout=wait_timeout, step=1.0):
../scylla/.local/lib/python3.11/site-packages/ccmlib/common.py:675: in wait_for
    if func():
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1306: in no_pending_flushes
    stdout = self.nodetool('cfstats', timeout=wait_timeout, verbose=verbose)[0]
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:754: in nodetool
    return super().nodetool(*args, **kwargs)
../scylla/.local/lib/python3.11/site-packages/ccmlib/node.py:794: in nodetool
    stdout, stderr = p.communicate(timeout=timeout)
/usr/lib64/python3.11/subprocess.py:1209: in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
/usr/lib64/python3.11/subprocess.py:2109: in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Popen: returncode: None args: ['/jenkins/workspace/scylla-master/dtest-debu...>
endtime = 2588.165606614, orig_timeout = 60, stdout_seq = [], stderr_seq = []
skip_check_and_raise = False

    def _check_timeout(self, endtime, orig_timeout, stdout_seq, stderr_seq,
                       skip_check_and_raise=False):
        """Convenience for checking if a timeout has expired."""
        if endtime is None:
            return
        if skip_check_and_raise or _time() > endtime:
>           raise TimeoutExpired(
                    self.args, orig_timeout,
                    output=b''.join(stdout_seq) if stdout_seq else None,
                    stderr=b''.join(stderr_seq) if stderr_seq else None)
E           subprocess.TimeoutExpired: Command '['/jenkins/workspace/scylla-master/dtest-debug/scylla/.ccm/scylla-repository/93be4c0cb0f0c53fe0eb7cd4d06ba15a7fc01d29/share/cassandra/bin/nodetool', '-h', '127.0.12.2', '-p', '7199', '-Dcom.sun.jndi.rmiURLParsing=legacy', 'cfstats']' timed out after 60 seconds

The flush api is synchronous and so it waits for
the respective flush to get done and will return
and error if flush failed for any reason and
this will propagate through nodetool flush.

_wait_no_pending_flushes may fail if there
are spontanuous flushes, and it also tends to
timeout in nodetool cfstats in debug mode,
see for example:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/250/testReport/alternator_ttl_tests/TestAlternatorTTL/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split029___test_ttl_with_load_and_decommission/
```
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1320: in flush
    self._wait_no_pending_flushes(verbose=kwargs.get('verbose', True))
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1314: in _wait_no_pending_flushes
    if not wait_for(no_pending_flushes, timeout=wait_timeout, step=1.0):
../scylla/.local/lib/python3.11/site-packages/ccmlib/common.py:675: in wait_for
    if func():
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1306: in no_pending_flushes
    stdout = self.nodetool('cfstats', timeout=wait_timeout, verbose=verbose)[0]
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:754: in nodetool
    return super().nodetool(*args, **kwargs)
../scylla/.local/lib/python3.11/site-packages/ccmlib/node.py:794: in nodetool
    stdout, stderr = p.communicate(timeout=timeout)
/usr/lib64/python3.11/subprocess.py:1209: in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
/usr/lib64/python3.11/subprocess.py:2109: in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Popen: returncode: None args: ['/jenkins/workspace/scylla-master/dtest-debu...>
endtime = 2588.165606614, orig_timeout = 60, stdout_seq = [], stderr_seq = []
skip_check_and_raise = False

    def _check_timeout(self, endtime, orig_timeout, stdout_seq, stderr_seq,
                       skip_check_and_raise=False):
        """Convenience for checking if a timeout has expired."""
        if endtime is None:
            return
        if skip_check_and_raise or _time() > endtime:
>           raise TimeoutExpired(
                    self.args, orig_timeout,
                    output=b''.join(stdout_seq) if stdout_seq else None,
                    stderr=b''.join(stderr_seq) if stderr_seq else None)
E           subprocess.TimeoutExpired: Command '['/jenkins/workspace/scylla-master/dtest-debug/scylla/.ccm/scylla-repository/93be4c0cb0f0c53fe0eb7cd4d06ba15a7fc01d29/share/cassandra/bin/nodetool', '-h', '127.0.12.2', '-p', '7199', '-Dcom.sun.jndi.rmiURLParsing=legacy', 'cfstats']' timed out after 60 seconds
```

Signed-off-by: Benny Halevy <[email protected]>
@bhalevy bhalevy requested review from nyh and fruch August 27, 2023 12:59
Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message has a long introduction to the problem but forgets to say what you actually changed :-) But I understood it reading the trivial code.

It's worth noting that that this function wait_no_pending_flushes that you are now deleting was added seven years ago in commit b03bbb1 because, as the commit message said, "nodetool flush' is asynchronous in scylla". Supposedly this changed over the years - but I can't find the right Scylla commit where this changed. The current code indeed appears to be waiting for the flush to happen: co_await replica::database::flush_tables_on_all_shards(...).

Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fruch fruch merged commit b78e145 into scylladb:next Aug 27, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants