-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
init s3 storage #29
init s3 storage #29
Conversation
Omarabdul3ziz
commented
Sep 25, 2023
•
edited
Loading
edited
- Implement S3 blob store #26
Also a lot of clean up and refactor
Next: performance improvments
also fix a file size bug
add cmdline to do unpacking
Also update docs
0039dd5
to
5bb0268
Compare
src/store/s3.rs
Outdated
use tokio::io::AsyncReadExt; | ||
|
||
fn get_config() -> Result<(String, String, Credentials)> { | ||
// TODO: get these from .env? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why env? the idea is that all stores can be defined in a url string. this can simply be
s3://[key:secret@]hostname[/bucket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So start by parsing the url, and extract the pieces that u need.
src/store/s3.rs
Outdated
|
||
let client = S3Client::new_with(dispatcher, provider, region); | ||
|
||
let create_bucket_request = CreateBucketRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to create the bucket? i really think that should be already done by the user before as part of the operation. For example, a zdb store won't create the namespace but will use it. Same applies here
src/store/s3.rs
Outdated
let put_object_request = PutObjectRequest { | ||
bucket: self.bucket.clone(), | ||
key: hex::encode(key), | ||
body: Some(ByteStream::from(blob.to_owned())), | ||
..Default::default() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did u decide to use rusoto_s3? this one feels inefficient that u need to copy everything to make a PutObjectRequest (i see u copy the bucket, and also the blob itself which is a lot of data to be copied with every set call)
Please find another library that is better. checkout the s3 sdk from amazon or anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one looks easy enough https://docs.rs/rust-s3/0.33.0/s3/ and doesn't require u to copy everything all the time. U can create a single bucket object on ::new() and keep it on the store, then use that to do all the set/get calls directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/store/mod.rs
Outdated
@@ -52,6 +55,13 @@ pub enum Error { | |||
#[error("encryption error")] | |||
EncryptionError, | |||
|
|||
#[error("bucket creation error")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like that the generic Error type for store has "implementation specific" errors. Those errors need to be not that specific for s3.
For example, there is Url error, there is also Other
error which can hold an anyhow::Error so basically any error that does not fit the other error types.
src/store/s3store.rs
Outdated
fn get_config(url: &str) -> Result<(Credentials, Region, String)> { | ||
let url = url::Url::parse(url.as_ref())?; | ||
|
||
let (access_key, access_secret, endpoint, bucket_name, region_name) = match url.host() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unreadable, instead this can go like this.
use anyhow::{Context};
// then in function do
let host = url.host().context("no host in url")?;
// continue working with the host value
instead of this
src/store/s3store.rs
Outdated
let access_secret = url | ||
.password() | ||
.ok_or(Error::InvalidConfigs(String::from( | ||
"did not find secret key", | ||
)))? | ||
.to_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is really the secret is required? I can see already u rewrap it in Some
means it's not required by the s3 api, no ? This can simplified to
let access_secret = url.password().context("passowrd is required")?;
BUT since access_secret is already optional as seen below at line 56
this may be can be something like
let access_secret = url.password().map(|s| s.to_owned());
then access_secret will become Some(String)
src/store/s3store.rs
Outdated
// rust-s3 implementation force tls unless it found `://` in the endpoint | ||
// check rust-s3/aws-region/src/region.rs #fn scheme | ||
let scheme = match url.query_pairs().find(|(key, _)| key == "tls") { | ||
Some((_, value)) if value == "false" => "http://", | ||
_ => "", | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is actually going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the api requires always the url has https
schema why don't rewrite the url immediately to https
?. Why would i pass the tls as a query instead of doing something like
s3+tls://
as a schema or s3://
if we really need to support both https
and http
src/store/s3store.rs
Outdated
} | ||
|
||
impl S3Store { | ||
pub async fn new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not need to be async since it doesn't do any await in code
src/store/s3store.rs
Outdated
#[async_trait::async_trait] | ||
impl Store for S3Store { | ||
async fn get(&self, key: &[u8]) -> super::Result<Vec<u8>> { | ||
let response = self.bucket.get_object(std::str::from_utf8(key)?).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key is not granted to be convertable to utf8 if the get_object REQUIRES the key to be a string, I assume it's better to do a hex
of the provided binary. doing utf8(key) will most definitely fail when running
- use the generic Error::Other error - better support for both http and https - more readable error handling - hex encode the key in set/get
- add openssl crate - handle the `=` sign for params in the first parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good and works fine, can u just (beside the comment i left on code) update the readme.md to explain how to use the s3 store.
src/store/s3store.rs
Outdated
.query_pairs() | ||
.find(|(key, _)| key == "region") | ||
.map(|(_, value)| value.to_string()) | ||
.context("region name not found")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we instead have a default value? or even empty if not set?