Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

init s3 storage #29

Merged
merged 44 commits into from
Oct 26, 2023
Merged

init s3 storage #29

merged 44 commits into from
Oct 26, 2023

Conversation

Omarabdul3ziz
Copy link
Contributor

@Omarabdul3ziz Omarabdul3ziz commented Sep 25, 2023

@Omarabdul3ziz Omarabdul3ziz marked this pull request as ready for review September 27, 2023 12:24
@Omarabdul3ziz Omarabdul3ziz changed the base branch from master to v2-meta-rewrite September 27, 2023 13:38
@Omarabdul3ziz Omarabdul3ziz marked this pull request as draft September 27, 2023 15:17
src/store/s3.rs Outdated
use tokio::io::AsyncReadExt;

fn get_config() -> Result<(String, String, Credentials)> {
// TODO: get these from .env?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why env? the idea is that all stores can be defined in a url string. this can simply be

s3://[key:secret@]hostname[/bucket

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So start by parsing the url, and extract the pieces that u need.

src/store/s3.rs Outdated

let client = S3Client::new_with(dispatcher, provider, region);

let create_bucket_request = CreateBucketRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to create the bucket? i really think that should be already done by the user before as part of the operation. For example, a zdb store won't create the namespace but will use it. Same applies here

src/store/s3.rs Outdated
Comment on lines 112 to 117
let put_object_request = PutObjectRequest {
bucket: self.bucket.clone(),
key: hex::encode(key),
body: Some(ByteStream::from(blob.to_owned())),
..Default::default()
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did u decide to use rusoto_s3? this one feels inefficient that u need to copy everything to make a PutObjectRequest (i see u copy the bucket, and also the blob itself which is a lot of data to be copied with every set call)

Please find another library that is better. checkout the s3 sdk from amazon or anything else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one looks easy enough https://docs.rs/rust-s3/0.33.0/s3/ and doesn't require u to copy everything all the time. U can create a single bucket object on ::new() and keep it on the store, then use that to do all the set/get calls directly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base automatically changed from v2-meta-rewrite to master October 2, 2023 12:14
src/store/mod.rs Outdated
@@ -52,6 +55,13 @@ pub enum Error {
#[error("encryption error")]
EncryptionError,

#[error("bucket creation error")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like that the generic Error type for store has "implementation specific" errors. Those errors need to be not that specific for s3.

For example, there is Url error, there is also Other error which can hold an anyhow::Error so basically any error that does not fit the other error types.

fn get_config(url: &str) -> Result<(Credentials, Region, String)> {
let url = url::Url::parse(url.as_ref())?;

let (access_key, access_secret, endpoint, bucket_name, region_name) = match url.host() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unreadable, instead this can go like this.

use anyhow::{Context};

// then in function do
let host = url.host().context("no host in url")?;

// continue working with the host value

instead of this

Comment on lines 14 to 19
let access_secret = url
.password()
.ok_or(Error::InvalidConfigs(String::from(
"did not find secret key",
)))?
.to_string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is really the secret is required? I can see already u rewrap it in Some means it's not required by the s3 api, no ? This can simplified to

let access_secret = url.password().context("passowrd is required")?;

BUT since access_secret is already optional as seen below at line 56 this may be can be something like

let access_secret = url.password().map(|s| s.to_owned()); 

then access_secret will become Some(String)

Comment on lines 28 to 34
// rust-s3 implementation force tls unless it found `://` in the endpoint
// check rust-s3/aws-region/src/region.rs #fn scheme
let scheme = match url.query_pairs().find(|(key, _)| key == "tls") {
Some((_, value)) if value == "false" => "http://",
_ => "",
};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is actually going on here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the api requires always the url has https schema why don't rewrite the url immediately to https ?. Why would i pass the tls as a query instead of doing something like
s3+tls:// as a schema or s3:// if we really need to support both https and http

}

impl S3Store {
pub async fn new(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not need to be async since it doesn't do any await in code

#[async_trait::async_trait]
impl Store for S3Store {
async fn get(&self, key: &[u8]) -> super::Result<Vec<u8>> {
let response = self.bucket.get_object(std::str::from_utf8(key)?).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key is not granted to be convertable to utf8 if the get_object REQUIRES the key to be a string, I assume it's better to do a hex of the provided binary. doing utf8(key) will most definitely fail when running

- use the generic Error::Other error
- better support for both http and https
- more readable error handling
- hex encode the key in set/get
- add openssl crate
- handle the `=` sign for params in the first parser
@Omarabdul3ziz Omarabdul3ziz marked this pull request as ready for review October 16, 2023 07:11
Copy link
Member

@muhamadazmy muhamadazmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good and works fine, can u just (beside the comment i left on code) update the readme.md to explain how to use the s3 store.

.query_pairs()
.find(|(key, _)| key == "region")
.map(|(_, value)| value.to_string())
.context("region name not found")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we instead have a default value? or even empty if not set?

@muhamadazmy muhamadazmy merged commit 51faf1f into master Oct 26, 2023
1 check passed
@muhamadazmy muhamadazmy deleted the master_s3 branch October 26, 2023 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants