Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to declare upload length #96

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

Deimvis
Copy link

@Deimvis Deimvis commented Aug 3, 2024

Currently, there is no way to declare length to tus server unless it has been declared on upload start (POST request).
So when upload was started somewhere else with "Upload-Defer-Length": "1" header it becomes impossible to continue upload using python client since it will never send "Upload-Length" header to the server.

I attempt to solve it by adding declare_length method to Uploader class that will first HEAD upload_url and if "Upload-Defer-Length": "1" is returned then send empty PATCH request with "Upload-Length" header. Orecautions are needed since the second and subsequent length declarations raise an error.
I also tried to make it efficient as possible and added length_declared keyword argument to BaseUploader that is True by default and can be set to False in order to hint that length should be declared as soon as possible and it will be declared on next PATCH request. I agree, that the code is a bit of a mess, but I haven't found a better way to implement this crucial feature so far.

I also investigated other approaches. For example, I found out that "Upload-Complete" IETF header can be used in order to hint server to finish upload even though length wasn't declared, but I decided that it's a bad idea to mess tus client with IETF headers.

I think that in the future it would be great to support uploading byte stream without seeking to the end to determine its size and declare its size only when last read operation happened and eof occured.

Looking forward to your to your opinion on this issue and its solution @Acconut

@Deimvis Deimvis changed the title Add support to declare length Add support to declare upload length Aug 3, 2024
@Acconut
Copy link
Member

Acconut commented Aug 23, 2024

Apologies for not responding sooner. I hope to look into this in two weeks. I had a brief first look at this and I would prefer a more complete solution. For example, tus-js-client does not have a dedicated declare_length method, but instead allows the user to pass a readable stream. tus-js-client then takes care of setting Upload-Defer-Length: 1 when creating the upload and then later setting Upload-Length once the stream closes and the entire length is known. The end user does not have to get involved with these lower level mechanics. Plus, the risk is removed that the user accidentally declares the wrong length (i.e. more or less than is actually uploaded). What do you think?

@Acconut
Copy link
Member

Acconut commented Aug 23, 2024

I also investigated other approaches. For example, I found out that "Upload-Complete" IETF header can be used in order to hint server to finish upload even though length wasn't declared, but I decided that it's a bad idea to mess tus client with IETF headers.

This IETF work will actually be the basis for a future tus 2.0 (if it's actually needed). But in tus 1.0 we don't have the Upload-Complete header available :)

@Deimvis
Copy link
Author

Deimvis commented Aug 25, 2024

Apologies for not responding sooner. I hope to look into this in two weeks. I had a brief first look at this and I would prefer a more complete solution. For example, tus-js-client does not have a dedicated declare_length method, but instead allows the user to pass a readable stream. tus-js-client then takes care of setting Upload-Defer-Length: 1 when creating the upload and then later setting Upload-Length once the stream closes and the entire length is known. The end user does not have to get involved with these lower level mechanics. Plus, the risk is removed that the user accidentally declares the wrong length (i.e. more or less than is actually uploaded). What do you think?

Sounds nice! I took a look into js client implementation. It's really what I think is the best approach and what I mentioned in the end of my proposal about not seeking to the end of the reader and calculating upload size only when we are done reading from the given stream. I just hesitated to implement this since it requires a little more changes to the source code, but I will happy to do this. Will be back soon with more js client like implementation. Thanks!

@Deimvis
Copy link
Author

Deimvis commented Aug 28, 2024

I imlemented upload_length_deferred algorithm similar to the one used in js client. Though, I decided to keep declare_length method in BaseUploader. The rationale is that user may want to declare the length as soon as possible in order to hint the server beforehand about the total upload size and to enable the server to work more efficiently. For example, in my S3 storage implementation for tusd server I have configured the best size for parts in multipart upload and unless I know the total upload size I use heuristics to calculate the part size and it always works less efficiently than in case the server knows about the total upload size in advance.
Using declare_length method will force upload size calculation and will set upload_length_deferred to False.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants