Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: exception with build while rclone sync is running #11544

Open
sentry-io bot opened this issue Aug 15, 2024 · 8 comments
Open

Build: exception with build while rclone sync is running #11544

sentry-io bot opened this issue Aug 15, 2024 · 8 comments
Labels
Accepted Accepted issue on our roadmap Bug A bug

Comments

@sentry-io
Copy link

sentry-io bot commented Aug 15, 2024

A user noticed a build failure which seems to be the repository contents causing some error with rclone:

Sentry Issue: READTHEDOCS-ORG-RYA

CalledProcessError: Command '['rclone', 'sync', '--transfers=8', '--checksum', '--verbose', '--retries=3', '--retries-sleep=1s', '--', '/home/docs/checkouts/readthedocs.org/user_builds/bt-tools/checkouts/latest/_readthedocs/html', ':s3:readthedocs-media-prod/html/bt-tools/latest']' returned non-zero exit status 1.
  File "readthedocs/projects/tasks/builds.py", line 946, in store_build_artifacts
    build_media_storage.rclone_sync_directory(from_path, to_path)
  File "readthedocs/builds/storage.py", line 141, in rclone_sync_directory
    return self._rclone.sync(source, destination)
  File "readthedocs/storage/rclone.py", line 124, in sync
    return self.execute("sync", args=[source, self.get_target(destination)])
  File "readthedocs/storage/rclone.py", line 101, in execute
    result = subprocess.run(

Error copying to storage

I tried this locally, but didn't see much more in the ways of stderr or rclone error output, so we will need to dig a little deeper into the command execution.

Front logo Front conversations

@sentry-io sentry-io bot added the Bug A bug label Aug 15, 2024
@agjohnson
Copy link
Contributor

Here is a repository that reliable triggers this exception:

https://github.com/boschresearch/bt_tools

@agjohnson agjohnson added the Accepted Accepted issue on our roadmap label Aug 15, 2024
@agjohnson
Copy link
Contributor

I'm going to put this on next sprint so we don't forget about it. It's not a common bug, however the failure case isn't great as it is a silent failure to the user.

@humitos
Copy link
Member

humitos commented Aug 29, 2024

This is not a silent failure to the user. We are failing the builds if that happens:

try:
build_media_storage.rclone_sync_directory(from_path, to_path)
except Exception as exc:
# NOTE: the exceptions reported so far are:
# - botocore.exceptions:HTTPClientError
# - botocore.exceptions:ClientError
# - readthedocs.doc_builder.exceptions:BuildCancelled
log.exception(
"Error copying to storage",
media_type=media_type,
from_path=from_path,
to_path=to_path,
)
# Re-raise the exception to fail the build and handle it
# automatically at `on_failure`.
# It will clearly communicate the error to the user.
raise BuildAppError("Error uploading files to the storage.") from exc

They should see "Error uploading files to the storage." error in the build details page.

@agjohnson
Copy link
Contributor

agjohnson commented Aug 29, 2024

Seems like that is just our logging though, not reporting/notification to the user.

From the attached conversation, this is the build the user reported:

https://app.readthedocs.org/projects/bt-tools/builds/25303128/

There's no failure mentioned there besides the generic failure:

image

At least there is a notification, so not silent technically, but there is no indication of what actually failed there.

@humitos
Copy link
Member

humitos commented Aug 29, 2024

Yeah, BuildApp exception always shows the generic one by design. We don't want to expose the user with cryptic rclone related or similar issues. They are internal issues.

@agjohnson
Copy link
Contributor

We should mention something to the user though, not just a generic failure. It indeed does not need to mention rclone, but does need to specifically call out what the user is noticing. I would not mention "Error uploading ..." as uploading is a background technical implementation, without the user knowing what is happening in this step.

Instead, what we should communicate to the user is that their documentation was not updated, or might have only been partially updated, and to resolve the issue they can try rebuilding again.

In this particular case, retrying will not fix the problem though.

@humitos
Copy link
Member

humitos commented Aug 29, 2024

we should communicate to the user is that their documentation was not updated, or might have only been partially updated, and to resolve the issue they can try rebuilding again

Good point 👍

@humitos
Copy link
Member

humitos commented Sep 2, 2024

We will need to create a specific exception with message_id and notification for this. Then, handle it at

# Known errors in our application code (e.g. we couldn't connect to
# Docker API). Report a generic message to the user.
if isinstance(exc, BuildAppError):
message_id = BuildAppError.GENERIC_WITH_BUILD_ID
to check if the exception has a message_id or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap Bug A bug
Projects
Status: Planned
Development

No branches or pull requests

2 participants