NSFS | S3 | Versioning: Internal Error on DeleteObject in version-enabled mode, test_versioning_obj_suspended_copy #8469

Open
hseipp opened this issue Oct 16, 2024 · 4 comments · May be fixed by #8526

hseipp commented Oct 16, 2024

Environment info

Actual behavior

The Ceph s3-tests method test_versioning_obj_suspended_copy() fails with

        # cleaning up
>       client.delete_object(Bucket=bucket_name, Key=key2)

s3tests_boto3/functional/test_s3.py:8600: 
...
self = <botocore.client.S3 object at 0x7fb198e9b430>, operation_name = 'DeleteObject', api_params = {'Bucket': 's3tests-qjr2rzjcxt7mk6qa5vi8d-1', 'Key': 'testobj2'}
...
E           botocore.exceptions.ClientError: An error occurred (InternalError) when calling the DeleteObject operation (reached max retries: 4): We encountered an internal error. Please try again.

.tox/py/lib/python3.8/site-packages/botocore/client.py:1023: ClientError

when issuing a DeleteObject operation on an Object that got created using CopyObject in version-suspended mode.

Expected behavior

The test should pass.

Steps to reproduce

Execute Ceph s3-tests test_versioning_obj_suspended_copy():

def test_versioning_obj_suspended_copy():
    bucket_name = get_new_bucket()
    client = get_client()

    check_configure_versioning_retry(bucket_name, "Enabled", "Enabled")

    key1 = 'testobj1'
    num_versions = 1
    (version_ids, contents) = create_multiple_versions(client, bucket_name, key1, num_versions)

    check_configure_versioning_retry(bucket_name, "Suspended", "Suspended")

    content = 'null content'
    overwrite_suspended_versioning_obj(client, bucket_name, key1, version_ids, contents, content)

    # copy to another object
    key2 = 'testobj2'
    copy_source = {'Bucket': bucket_name, 'Key': key1}
    client.copy_object(Bucket=bucket_name, Key=key2, CopySource=copy_source)

    # delete the source object. keep the 'null' entry in version_ids
    client.delete_object(Bucket=bucket_name, Key=key1)

    # get the target object
    response = client.get_object(Bucket=bucket_name, Key=key2)
    body = _get_body(response)
    assert body == content

    # cleaning up
    client.delete_object(Bucket=bucket_name, Key=key2)
    client.delete_object(Bucket=bucket_name, Key=key2, VersionId='null')

    clean_up_bucket(client, bucket_name, key1, version_ids)

More information - Screenshots / Logs / Other output

Symptom-wise it looks similar to #8382, but that test case passes, while test_versioning_obj_suspended_copy() fails on every attempt.

Noobaa log with "all" log level:
noobaa-20241016_2.log.gz

@nadavMiz
Contributor

@hseipp The test passes in my environment after adding a sleep after the switch to suspended mode. Can you test it on your end? Maybe it's #8391 again.

@hseipp
Author

hseipp commented Nov 11, 2024

I can confirm that with a 65-second delay added after the switch to suspended mode the test passes:

     (version_ids, contents) = create_multiple_versions(client, bucket_name, key1, num_versions)
 
     check_configure_versioning_retry(bucket_name, "Suspended", "Suspended")
+    time.sleep(65)
 
     content = 'null content'
     overwrite_suspended_versioning_obj(client, bucket_name, key1, version_ids, contents, content)

@nadavMiz
Contributor

nadavMiz commented Nov 12, 2024

@hseipp Thanks, I looked at it a bit more. It seems that even though we don't have an issue in suspended mode, we do have an issue in enabled mode. Changing the test to remove all suspended-mode code:

    check_configure_versioning_retry(bucket_name, "Enabled", "Enabled")

    key1 = 'testobj1'
    num_versions = 1
    (version_ids, contents) = create_multiple_versions(client, bucket_name, key1, num_versions)

    # copy to another object
    key2 = 'testobj2'
    copy_source = {'Bucket': bucket_name, 'Key': key1}
    client.copy_object(Bucket=bucket_name, Key=key2, CopySource=copy_source)

    # delete the source object. keep the 'null' entry in version_ids
    client.delete_object(Bucket=bucket_name, Key=key1)

    # get the target object
    response = client.get_object(Bucket=bucket_name, Key=key2)
    body = _get_body(response)
    assert body == content

    # cleaning up
    client.delete_object(Bucket=bucket_name, Key=key2)

With this test we get the same error you mentioned.

@nadavMiz
Contributor

In the logs I see the following error:

Nov 12 00:30:25 tmtscalets-protocol-1 node[3027204]: Nov-12 0:30:25.844 [nsfs/3027204] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/ceph-1smerz7ily00msthmshff7q8-1/testobj2</Resource><RequestId>m3e6zod7-di6s3-3xe</RequestId></Error> DELETE /ceph-1smerz7ily00msthmshff7q8-1/testobj2 {"host":"10.11.87.62:6443","accept-encoding":"identity","user-agent":"Boto3/1.35.22 md/Botocore#1.35.22 ua/2.0 os/linux#5.14.0-427.35.1.el9_4.x86_64 md/arch#x86_64 lang/python#3.9.18 md/pyimpl#CPython cfg/retry-mode#legacy Botocore/1.35.22","x-amz-date":"20241112T083025Z","x-amz-content-sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","authorization":"AWS4-HMAC-SHA256 Credential=WY1TcACn4uZKWJorh0qN/20241112/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=4949eaf7031276d0ee9854c025a9af3150f50040cc1d68e80d69b887c48f02ce","amz-sdk-invocation-id":"42df82e2-4e31-4b50-9483-cc3b57c5ea92","amz-sdk-request":"ttl=20241112T083121Z; attempt=5; max=5","content-length":"0"} Error: FS::SafeLink ERROR link target doesn't match expected inode and mtime - context: SafeLink _link_from.c_str()=/ibm/fs1/teams/ceph-1smerz7ily00msthmshff7q8-1/testobj2 _link_to.c_str()=/ibm/fs1/teams/ceph-1smerz7ily00msthmshff7q8-1/.versions/testobj2_mtime-d5k25jd5bocg-ino-6soc _link_expected_mtime=1731400217477658880 _link_expected_inode=317100
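The context fields at the end of that log line can be pulled apart for a closer look. This is a minimal sketch, assuming `_link_expected_mtime` is nanoseconds since the Unix epoch (the log does not say so; the magnitude suggests it). It also treats the `_mtime-<x>-ino-<y>` suffix of the `.versions` path as base-36 numbers, which is an observation from this one log line and not a documented format. `parse_safelink_context`, `mtime_ns_to_utc`, and `decode_version_suffix` are hypothetical helper names.

```python
import re
from datetime import datetime, timezone

# Tail of the log line quoted above (only the SafeLink error and its context).
LOG_TAIL = (
    "Error: FS::SafeLink ERROR link target doesn't match expected inode and mtime"
    " - context: SafeLink"
    " _link_from.c_str()=/ibm/fs1/teams/ceph-1smerz7ily00msthmshff7q8-1/testobj2"
    " _link_to.c_str()=/ibm/fs1/teams/ceph-1smerz7ily00msthmshff7q8-1"
    "/.versions/testobj2_mtime-d5k25jd5bocg-ino-6soc"
    " _link_expected_mtime=1731400217477658880"
    " _link_expected_inode=317100"
)


def parse_safelink_context(line):
    """Return the key=value pairs that follow 'context: SafeLink' as a dict."""
    _, _, context = line.partition("context: SafeLink")
    return dict(re.findall(r"(\S+?)=(\S+)", context))


def mtime_ns_to_utc(mtime_ns):
    """Split a nanosecond epoch value into (UTC datetime, leftover nanoseconds).

    Assumption: the value is nanoseconds since the Unix epoch.
    """
    seconds, nanos = divmod(int(mtime_ns), 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos


def decode_version_suffix(path):
    """Decode '<key>_mtime-<x>-ino-<y>' as base-36 integers (assumed encoding)."""
    match = re.search(r"_mtime-([0-9a-z]+)-ino-([0-9a-z]+)$", path)
    if match is None:
        return None
    return int(match.group(1), 36), int(match.group(2), 36)


fields = parse_safelink_context(LOG_TAIL)
when, nanos = mtime_ns_to_utc(fields["_link_expected_mtime"])
decoded = decode_version_suffix(fields["_link_to.c_str()"])

print(when.isoformat(), nanos)
print(decoded)
```

For this line the expected mtime comes out as 2024-11-12 08:30:17 UTC, about eight seconds before the `x-amz-date` of the failing request (08:30:25Z, attempt 5 of 5), and the base-36 suffix decodes to the same mtime and inode values the error reports.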

This indicates that the object's modification time didn't match the object's version. It might mean that the file was modified after it was created, or at least that Linux thinks it was modified.
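To illustrate the kind of check the error message describes, here is a minimal Python sketch of a "link only if inode and mtime still match" guard. It is not NooBaa's implementation: `safe_link`, `LinkTargetChanged`, and `demo` are hypothetical names, and the check and the link are two separate calls here, so the sketch is not atomic.

```python
import os
import tempfile


class LinkTargetChanged(Exception):
    """Raised when the source no longer has the expected inode and mtime."""


def safe_link(src, dst, expected_inode, expected_mtime_ns):
    """Hard-link src to dst only if src still matches the expected identity.

    Hypothetical sketch: stat and link are separate syscalls, so a change
    between them would go unnoticed.
    """
    st = os.stat(src)
    if st.st_ino != expected_inode or st.st_mtime_ns != expected_mtime_ns:
        raise LinkTargetChanged(
            f"{src}: inode={st.st_ino} mtime_ns={st.st_mtime_ns}, "
            f"expected inode={expected_inode} mtime_ns={expected_mtime_ns}"
        )
    os.link(src, dst)


def demo():
    """Return (linked, rejected) for an unchanged file and a touched file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "testobj2")
        with open(src, "w") as f:
            f.write("data")
        st = os.stat(src)

        # Unchanged file: the recorded inode and mtime still match.
        first_dst = os.path.join(tmp, "version-1")
        safe_link(src, first_dst, st.st_ino, st.st_mtime_ns)
        linked = os.path.exists(first_dst)

        # Move the mtime forward by one second without touching the content.
        os.utime(src, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        try:
            safe_link(src, os.path.join(tmp, "version-2"),
                      st.st_ino, st.st_mtime_ns)
            rejected = False
        except LinkTargetChanged:
            rejected = True
        return linked, rejected


linked, rejected = demo()
print(linked, rejected)
```

The second call is rejected even though the file content is unchanged, which matches the case of a file that the filesystem merely reports as modified.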

nadavMiz changed the title from "NSFS | S3 | Versioning: Internal Error on DeleteObject in version-suspended mode, test_versioning_obj_suspended_copy" to "NSFS | S3 | Versioning: Internal Error on DeleteObject in version-enabled mode, test_versioning_obj_suspended_copy" on Nov 13, 2024