Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Folder structure, problem #68

Open
C1ph3R-s opened this issue Sep 25, 2023 · 19 comments
Open

Folder structure, problem #68

C1ph3R-s opened this issue Sep 25, 2023 · 19 comments
Assignees

Comments

@C1ph3R-s
Copy link

Hi

I would like to have a little guidance: so I have to upload a folder structure to s3:

my local structure is:

test1/
└── test2
    └── test3
        └── test.txt

on backstage gui structure

├─ test1
│  ├─ test2
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3
│  │  ├─ test3

So my question is, what could be the reson of that or how can I solve this?
I do not really understand this behaviour. ( I used minio s3 client)

Thanks

@ivangonzalezacuna
Copy link
Collaborator

That sounds weird to me. I've just tried the same setup in our demo (available inside the folder /demo in this repo). Replacing the example with a test1/test2/test3/test.txt like you have, and the data was displayed properly (the demo uses ceph):
Screenshot from 2023-09-25 14-31-23

So I would suspect the problem is how minio returns the data. We are using the aws-sdk packages internally, and there might be the case that those 2 clients are not fully understanding each other, but I don't really know, since we haven't tested this setup with minio.

What I would suggest you to try is implementing a custom S3Api using a Minio client. To do it, you would need to implement a class extending the S3Api that can be found here. Most likely, the code for this client would be almost the same as in the S3Client. Then, you have to overwrite the used client before you call the method S3Builder.build() with the function getClient.

If this fixes the issue. I would suggest to extend the list of clients to be able to use minio as well.

@ivangonzalezacuna ivangonzalezacuna self-assigned this Sep 25, 2023
@C1ph3R-s
Copy link
Author

@ivangonzalezacuna Thank for your reply, I just used minio client for the upload to S3, which type of client did you used for that? I would like to try that one first. Maybe it just a minio client bug.

@ivangonzalezacuna
Copy link
Collaborator

Hi @C1ph3R-s , to try it out locally I used a ceph image. You can find more info in this block of the plugin documentation: https://github.com/spreadshirt/backstage-plugin-s3#development

Basically, the contents of the folder ./demo/examples will be synced as soon as you start the container (it takes some time, just wait a bit until the sync is done, which is the last step). Add or remove things as you like, just for testing.

Then, if you start the test environment using yarn start you should see what I shared in the screenshot. If your app-config.yaml looks like the one in here and using ceph works, it might be an issue with that minio client then.

@C1ph3R-s
Copy link
Author

@ivangonzalezacuna I have tested a the setup again, so minio works well becase when i use mc ls command i got this:

user@host:# mc ls s3/test_bucket/test1
0B test2/
user@host:# mc ls s3/test_bucket/test1/test2
0B test3/
user@host:# mc ls s3/test_bucket/test1/test2/test3
5B STANDARD test.txt

but when I check the backstage GUI it got this below:

test1 folder
backstage_test1

test2 folder
backstage_test2

test3 folder
backstage_test3

test.txt file in every test3 folder
backstage_test_txt

Do you have any idea?

Thank you

@C1ph3R-s
Copy link
Author

@ivangonzalezacuna

I have tested with aws cli tool too, but unfortunatelly I got the same result on backstage GUI.

Here is my script for the poc:

#!/bin/bash
export AWS_ACCESS_KEY_ID="your access key"\
export AWS_SECRET_ACCESS_KEY="your secret access key"

BUCKET_NAME="test_bucket"
DIRECTORY="test1"

alias aws='aws --endpoint-url https://s3-redhat-openshift-storage.com'

for FILE in $DIRECTORY/*
do
  aws s3 cp $FILE s3://$BUCKET_NAME/ --recursive
done

Thanks you

@C1ph3R-s
Copy link
Author

@ivangonzalezacuna It seems to me that when I upload a file to the s3 bucket, it works fine. However, when I attempt to add a file to a folder or a folder structure, the issue I described earlier occurs.

@ivangonzalezacuna
Copy link
Collaborator

That still looks super weird to me. Specially because replicating the same directory structure you suggest works in my case.

Also, the method that seems to be failing is the listBucketKets from the default S3Client: the code is here

The internal client should return an object containing several arrays. The result inside the CommonPrefixes are the folders, and the values in Contents are the files. Those 2 results are appended into one slice are returned to the frontend, so they are printed.

The file listing looks fine to me, since you get no duplications at all. So I would suggest checking what the output of CommonPrefixes is. What we could do in this case is extend that check, and make sure all the keys are unique, so only one testX/ will be present.

But for understanding what is going on in the background, could you try out to get the full result from the s3Client.listObjects() method? You could use your backstage instance for it: When you initialize the s3 plugin with S3Builder.createBuilder you can add a setS3Client() call and use your custom client for the tests. This client must be a copy of the default S3Client, but adding some logs to the method mentioned above.

Also, could you do a yarn why @aws-sdk/client-s3? Just to see if the version of the package you're using is messing up with the results.

@C1ph3R-s
Copy link
Author

Hi @ivangonzalezacuna,

I have find the problem with this, do you have any offical way to report it for you?

Thank you,

@ivangonzalezacuna
Copy link
Collaborator

Hi @ivangonzalezacuna,

I have find the problem with this, do you have any offical way to report it for you?

Thank you,

Awesome!

You can simply follow the usual ways to contribute by creating a fork and then a MR to this repo. You can explain in there what was the issue and what fixed it. You can assign it to me, and I will review it as soon as I can.

Or is there something else you wanted to share in private?

@C1ph3R-s
Copy link
Author

Hi @ivangonzalezacuna,
I have find the problem with this, do you have any offical way to report it for you?
Thank you,

Awesome!

You can simply follow the usual ways to contribute by creating a fork and then a MR to this repo. You can explain in there what was the issue and what fixed it. You can assign it to me, and I will review it as soon as I can.

Or is there something else you wanted to share in private?

Unfortunately I cannot handle the fix, but I know what is the bug, private would be better :)

@ivangonzalezacuna
Copy link
Collaborator

Unfortunately I cannot handle the fix, but I know what is the bug, private would be better :)

In such case, you could also send me a message via Discord for example: ivan.gonzalez

@C1ph3R-s
Copy link
Author

@ivangonzalezacuna
Sorry for the late response, so the plugin works perfectly on OpenShift 4.9.10, but the bug manifests itself in a higher version then 4.12.

@ivangonzalezacuna
Copy link
Collaborator

Oh, that's such an interesting behavior. Is there any change in any version in between that might have caused the issue?

Anyways, we will proceed with an upgrade of the plugin to the latest backstage release. I'll check if the aws packages can be updated to a newer version. As soon as the MR is ready I can ping you, and you could try out that branch and see if this fixed the issue or not.

@C1ph3R-s
Copy link
Author

@ivangonzalezacuna
That is a good question, unfortunatelly I do not know :/
Thank you very much for your help!

@ivangonzalezacuna
Copy link
Collaborator

Could you try the code from the MR mentioned above? (#72).

Let's see if this update of the aws-sdk package is fixing the issue you had in newer OpenShift versions

@C1ph3R-s
Copy link
Author

@ivangonzalezacuna how can I test it? I just have to update the plugin version?

@ivangonzalezacuna
Copy link
Collaborator

ivangonzalezacuna commented Oct 19, 2023

You can do it by modifying the used version for these packages. As an example, this is how it should look like:

    "@spreadshirt/backstage-plugin-s3-viewer": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",

What it's after the # is the commitId, but you can also use the branch name for example.

EDIT: The new release has been published. You can also use the latest tags now

@C1ph3R-s
Copy link
Author

C1ph3R-s commented Nov 7, 2023

You can do it by modifying the used version for these packages. As an example, this is how it should look like:

    "@spreadshirt/backstage-plugin-s3-viewer": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",
    "@spreadshirt/backstage-plugin-s3-viewer-common": "https://github.com/spreadshirt/backstage-plugin-s3.git#11ae77825dfab21b0e51dc25acc69d1a4b0fa4e6",

What it's after the # is the commitId, but you can also use the branch name for example.

EDIT: The new release has been published. You can also use the latest tags now

@ivangonzalezacuna I have tested with the new version but unfortunately i got the same result :/ Could u test it pls?

@ivangonzalezacuna
Copy link
Collaborator

Hm, such a pity. Then, the only thing I can think about is that the response when listing keys is broken for the newest versions of OpenShift. We could try to add a fix like I explained here: #68 (comment), by removing duplicated entries (since it's a directory style, I wouldn't expect more than 1 file with the same name, but the API for those versions return that) and see if that fixes the issue. Anyways, might be something that broke in the recent versions and something has been fixed in the meantime

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants