Upload: Auto adjust maxChunkSize until upload succeeds after HTTP-413 #4826

Open

jkellerer wants to merge 2 commits into master from enh/handle-http-413

Conversation

@jkellerer commented Aug 6, 2022

Currently, when Nextcloud runs behind a server with an upload limit, uploading a large file (e.g. several hundred megabytes) will likely fail unless nextcloud.cfg is manually adjusted on every endpoint to limit maxChunkSize (upload limits are usually not set to 1G but rather to 100M or lower).

This PR implements auto-adjustment of the max upload size when receiving HTTP-413 (request entity too large) or a connection-close error after large uploads.

This is achieved with the following strategy (a code sketch follows the list):

  • The config is enhanced with parameter failsafeMaxChunkSize (default 100M).
  • When HTTP-413 is received:
    • If requestSize > failsafeMaxChunkSize, the failsafe max size applies and the sync is retried immediately.
    • Otherwise maxChunkSize = requestSize / 2 and the sync is retried until it succeeds or minChunkSize is reached (default 1M).
  • A connection-close error is treated like HTTP-413 if requestSize > failsafeMaxChunkSize. HTTP usually requires that requests are fully consumed, but servers may decide to close the HTTP connection when too much data is sent, to avoid reading it all. As connection close is not a specific error, only large uploads are handled by this change.
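
To make the strategy concrete, here is a minimal sketch of the decision logic in C++. This is illustrative only, not the PR's actual code: ChunkOptions and chunkSizeAfterTooLarge are made-up names, while the option names and defaults mirror the description above.

#include <cstdint>
#include <optional>

struct ChunkOptions {
    std::int64_t failsafeMaxChunkSize = 100LL * 1000 * 1000; // default 100M
    std::int64_t minChunkSize = 1LL * 1000 * 1000;           // default 1M
};

// Returns the maxChunkSize to retry with after an HTTP-413 (or after a
// connection close, which callers only treat this way when requestSize
// already exceeds failsafeMaxChunkSize), or std::nullopt once minChunkSize
// is reached and the upload fails permanently.
std::optional<std::int64_t> chunkSizeAfterTooLarge(std::int64_t requestSize,
                                                   const ChunkOptions &opts)
{
    if (requestSize > opts.failsafeMaxChunkSize)
        return opts.failsafeMaxChunkSize; // apply the failsafe size, retry immediately

    const std::int64_t halved = requestSize / 2;
    if (halved >= opts.minChunkSize)
        return halved; // halve and retry

    return std::nullopt; // cannot shrink further
}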

While failsafeMaxChunkSize can be adjusted in nextcloud.cfg or via the env var OWNCLOUD_FAILSAFE_MAX_CHUNK_SIZE, it should have a reasonable default so that adjusting it isn't required. When set properly, it limits the number of retries needed and enables uploads in the cases where the HTTP connection is closed in response to sending too much data.
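
For illustration, an override in nextcloud.cfg might look like this (assuming the option lives in the [General] section next to the existing chunk-size settings; the value is in bytes, here 50M):

[General]
failsafeMaxChunkSize=50000000

The same value could instead be supplied via the environment, e.g. OWNCLOUD_FAILSAFE_MAX_CHUNK_SIZE=50000000.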

For cases where the upload limit in the web server is greater than 1M and a connection-close error is only received for uploads >100M, the default values allow all uploads to succeed without any additional configuration.

This should fix most of the cases reporting errors in this area, such as #4278 and #925.

For two concurrent uploads (~0.5 GB each) with the nginx frontend limited to 100M, this looks as follows (from an earlier implementation without failsafeMaxChunkSize):

2022-08-06 22:18:06:368 [ warning nextcloud.sync.credentials.webflow ./src/gui/creds/webflowcredentials.cpp:229 ]:	"Error transferring https://nextcloud/remote.php/dav/uploads/user/2789530175/0000000000000001 - server replied: Request Entity Too Large"
2022-08-06 22:18:06:369 [ info nextcloud.sync.networkjob.put ./src/libsync/propagateupload.cpp:87 ]:	PUT of "https://nextcloud/remote.php/dav/uploads/user/2789530175/0000000000000001" FINISHED WITH STATUS "UnknownContentError Server replied \"413 Request Entity Too Large\" to \"PUT https://nextcloud/remote.php/dav/uploads/user/2789530175/0000000000000001\"" QVariant(int, 413) QVariant(QString, "Request Entity Too Large")
2022-08-06 22:18:06:369 [ debug nextcloud.sync.propagator.upload ./src/libsync/propagateupload.cpp:662 ]	[ OCC::PropagateUploadFileCommon::commonErrorHandling ]:	"<html>\r\n<head><title>413 Request Entity Too Large</title></head>\r\n<body>\r\n<center><h1>413 Request Entity Too Large</h1></center>\r\n<hr><center>nginx/1.22.0</center>\r\n</body>\r\n</html>\r\n"
2022-08-06 22:18:06:369 [ info nextcloud.sync.database ./src/common/syncjournaldb.cpp:1798 ]:	Setting blacklist entry for "test.img.gz" 1 "Upload of 316 MB exceeds the max upload size allowed by the server. Reducing max upload size to 158 MB" 1659817086 0 1627988766 "" "" 0

2022-08-06 22:18:30:645 [ warning nextcloud.sync.credentials.webflow ./src/gui/creds/webflowcredentials.cpp:229 ]:	"Error transferring https://nextcloud/remote.php/dav/uploads/user/2750000573/0000000000000000 - server replied: Request Entity Too Large"
2022-08-06 22:18:30:645 [ info nextcloud.sync.networkjob.put ./src/libsync/propagateupload.cpp:87 ]:	PUT of "https://nextcloud/remote.php/dav/uploads/user/2750000573/0000000000000000" FINISHED WITH STATUS "UnknownContentError Server replied \"413 Request Entity Too Large\" to \"PUT https://nextcloud/remote.php/dav/uploads/user/2750000573/0000000000000000\"" QVariant(int, 413) QVariant(QString, "Request Entity Too Large")
2022-08-06 22:18:30:645 [ debug nextcloud.sync.propagator.upload ./src/libsync/propagateupload.cpp:662 ]	[ OCC::PropagateUploadFileCommon::commonErrorHandling ]:	"<html>\r\n<head><title>413 Request Entity Too Large</title></head>\r\n<body>\r\n<center><h1>413 Request Entity Too Large</h1></center>\r\n<hr><center>nginx/1.22.0</center>\r\n</body>\r\n</html>\r\n"
2022-08-06 22:18:30:646 [ info nextcloud.sync.database ./src/common/syncjournaldb.cpp:1798 ]:	Setting blacklist entry for "Test.ISO" 1 "Upload of 158 MB exceeds the max upload size allowed by the server. Reducing max upload size to 80 MB" 1659817110 0 1418024480 "" "" 0

2022-08-06 22:18:31:467 [ warning nextcloud.sync.credentials.webflow ./src/gui/creds/webflowcredentials.cpp:229 ]:	"Error transferring https://nextcloud/remote.php/dav/uploads/user/2789530175/0000000000000002 - server replied: Request Entity Too Large"
2022-08-06 22:18:31:467 [ info nextcloud.sync.networkjob.put ./src/libsync/propagateupload.cpp:87 ]:	PUT of "https://nextcloud/remote.php/dav/uploads/user/2789530175/0000000000000002" FINISHED WITH STATUS "UnknownContentError Server replied \"413 Request Entity Too Large\" to \"PUT https://nextcloud/remote.php/dav/uploads/user/2789530175/0000000000000002\"" QVariant(int, 413) QVariant(QString, "Request Entity Too Large")
2022-08-06 22:18:31:467 [ debug nextcloud.sync.propagator.upload ./src/libsync/propagateupload.cpp:662 ]	[ OCC::PropagateUploadFileCommon::commonErrorHandling ]:	"<html>\r\n<head><title>413 Request Entity Too Large</title></head>\r\n<body>\r\n<center><h1>413 Request Entity Too Large</h1></center>\r\n<hr><center>nginx/1.22.0</center>\r\n</body>\r\n</html>\r\n"
2022-08-06 22:18:31:467 [ info nextcloud.sync.database ./src/common/syncjournaldb.cpp:1798 ]:	Setting blacklist entry for "test.img.gz" 1 "Upload of 158 MB exceeds the max upload size allowed by the server. Reducing max upload size to 80 MB" 1659817111 0 1627988766 "" "" 0

After three attempts, the uploads succeed.

Hope this is accepted. Feedback is welcome.

@jkellerer force-pushed the enh/handle-http-413 branch 5 times, most recently from 67be42c to 9ef12fb on August 11, 2022 18:42
@kesselb added the labels "2. to review" and "enhancement" (enhancement of an already implemented feature/code) on Aug 26, 2022

codecov bot commented Aug 26, 2022

Codecov Report

Merging #4826 (9ef12fb) into master (3dc583c) will decrease coverage by 3.61%.
Report is 80 commits behind head on master.
The diff coverage is 95.45%.

❗ Current head 9ef12fb differs from pull request most recent head 1cb6a19. Consider uploading reports for the commit 1cb6a19 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4826      +/-   ##
==========================================
- Coverage   60.79%   57.18%   -3.61%     
==========================================
  Files         145      138       -7     
  Lines       18836    17187    -1649     
==========================================
- Hits        11451     9829    -1622     
+ Misses       7385     7358      -27     
Files Coverage Δ
src/libsync/account.h 52.00% <100.00%> (+9.14%) ⬆️
src/libsync/configfile.cpp 26.25% <100.00%> (-1.78%) ⬇️
src/libsync/owncloudpropagator.cpp 86.32% <100.00%> (+1.79%) ⬆️
src/libsync/propagateuploadng.cpp 83.95% <100.00%> (-1.10%) ⬇️
src/libsync/propagateuploadv1.cpp 69.44% <100.00%> (ø)
src/libsync/syncfileitem.h 38.59% <100.00%> (-4.96%) ⬇️
src/libsync/syncoptions.h 100.00% <100.00%> (ø)
src/libsync/propagateupload.cpp 77.16% <93.33%> (+1.62%) ⬆️
src/libsync/syncoptions.cpp 71.73% <90.90%> (-2.73%) ⬇️

... and 94 files with indirect coverage changes

Resolved (outdated) review threads on: src/libsync/account.h (×2), src/libsync/configfile.cpp, src/libsync/propagateupload.cpp (×3)

@jkellerer (Author)

Hi @allexzander, are you ok with the updates to your review or do you expect additional changes?

@kesselb (Contributor) commented Sep 17, 2022

Thanks for your pull request 👍

It's a good improvement to handle a 413 response properly. I wonder if it's possible to make this feature work without a new configuration option. If I read the existing code correctly, we have some calculation to increase the chunk size step by step. Could we store the last working chunk size and fall back to it on a 413 response? I guess there is nothing wrong with an additional configuration option, but we already have 4 for the chunked uploads 🤔

@jkellerer (Author)

Hi @kesselb, I see your point.

The reason for the config value is primarily to have a starting point for the auto-sensing that doesn't hurt performance too much (since both cases, connection close and HTTP-413, are handled the same for uploads larger than failsafeMaxChunkSize).

What could be done is to have a built-in range for failsafeMaxChunkSize (e.g. from 100mb down to 25mb) that is evaluated on the first failures and then remembered in the KV store. If uploads still fail, auto-sensing would go down to minChunkSize, but keep this only until the next restart and then start again from the remembered failsafeMaxChunkSize.

This ensures that performance cannot get too bad or stay bad when limits are changed again.

The only question is whether this should be reset at some point or stay in the DB forever.

@jkellerer (Author)

Hi @kesselb, @allexzander,

Have updated the implementation as follows (a sketch follows the list):

  • On client restart, _min and _max chunk size always apply until the first failure.
  • On the first failure, the remembered _failsafe chunk size applies and is used as the starting point for auto-sensing.
  • The sensed _failsafe chunk size is remembered down to LowestPossibleFailsafeMaxChunkSize (25mb).
  • Limits lower than this are not persisted and always require auto-sensing after a client restart; this ensures a misconfiguration of the server cannot persist values that would seriously hurt upload performance.
  • In addition to the _failsafe size, the _initial chunk size is also stored, as this eliminates the initial upload with 10m that would otherwise happen with every new sync job.
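
A minimal sketch of the persistence rule above, assuming a helper that decides what gets written to the store (the constant name is taken from this comment; the function itself is hypothetical):

#include <cstdint>
#include <optional>

constexpr std::int64_t LowestPossibleFailsafeMaxChunkSize = 25LL * 1000 * 1000; // 25mb

// Decide which sensed _failsafe chunk size is remembered across client
// restarts. Values below the floor apply to the current run only, so a
// server misconfiguration cannot permanently degrade upload performance.
std::optional<std::int64_t> failsafeSizeToRemember(std::int64_t sensedSize)
{
    if (sensedSize >= LowestPossibleFailsafeMaxChunkSize)
        return sensedSize; // persist; reused as the auto-sensing start
    return std::nullopt;   // not persisted; re-sense after the next restart
}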

@jkellerer (Author)

The latest commit makes the _initial chunk size persist only when the chunk size is dynamic (and supported). However, I guess this value is better kept as a runtime value in account(), which should probably be done in a separate PR. For reference, I have not removed it.

@allexzander (Contributor)

@mgallien WDYT?

@jospoortvliet (Member)

@mgallien as this is also causing clients to fail to sync, perhaps it'd be good to make it part of the fixes for reliable syncing?

@kesselb (Contributor) commented Dec 7, 2022

If it takes more time to finish this pull request, we could also revert #3600

maxChunkSize = 1000 is only possible when the server accepts such a big file.

@sveken commented Jan 20, 2023

> If it takes more time to finish this pull request, we could also revert #3600
>
> maxChunkSize = 1000 is only possible when the server accepts such a big file.

Reverting it while a better fix is worked on would be appreciated.

@jkellerer (Author)

Btw. I'm happy to update the PR if anyone can find the time to review it. In the meantime, reducing maxChunkSize is a quick and safe fix.
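
For anyone applying that interim fix, a sketch of the nextcloud.cfg change (50000000 bytes = 50M is an arbitrary example value; pick something below your proxy's upload limit):

[General]
maxChunkSize=50000000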

@claucambra (Collaborator) left a comment

Hello, we are really sorry this slipped through the cracks for so long -- this is a fantastic contribution, thank you.

Would you be willing to fix the merge conflicts and make this mergeable again? If not, we are happy to cherry-pick your commits and fix it ourselves.

@jkellerer (Author)

@claucambra, will look into it this week

@jkellerer force-pushed the enh/handle-http-413 branch 2 times, most recently from 8ca84ba to 383d52f on September 12, 2023 18:25
@jkellerer (Author)

@claucambra, I'm done from my side. The remaining check failures seem unrelated to this PR (they are failing for others as well).

The PR does not introduce new configuration and does not persist anything. Everything is kept as runtime state in the Account instance for its lifetime. If something should be persisted, it could be done from there, but then the lifecycle would also need to be managed, which is very likely too complex for this matter.
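
A hypothetical sketch of that runtime-only state (member and accessor names are invented for illustration, loosely following the commit message's "max and last dynamic chunk size"; the real Account class is of course much larger):

#include <cstdint>
#include <optional>

// Chunk-size state scoped to a single Account instance. Nothing is
// written to disk, so the sensed values live and die with the object.
class Account {
public:
    std::optional<std::int64_t> lastDynamicChunkSize() const { return _lastDynamicChunkSize; }
    void setLastDynamicChunkSize(std::int64_t size) { _lastDynamicChunkSize = size; }

    std::optional<std::int64_t> maxDynamicChunkSize() const { return _maxDynamicChunkSize; }
    void setMaxDynamicChunkSize(std::int64_t size) { _maxDynamicChunkSize = size; }

private:
    std::optional<std::int64_t> _lastDynamicChunkSize; // last size that uploaded successfully
    std::optional<std::int64_t> _maxDynamicChunkSize;  // sensed upper bound after HTTP-413
};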

@claucambra (Collaborator) left a comment

I have two very small comments; otherwise it is looking very good.

Review threads on: src/libsync/syncoptions.cpp (resolved), src/libsync/propagateupload.cpp (×2)

@allexzander (Contributor) left a comment

@jkellerer The Windows CI unit test is failing; could you fix it?

@jkellerer (Author)

@allexzander, could you please run the Windows CI tests one more time so there is updated output? Thanks!

@jkellerer (Author)

Compiled & tested with jkellerer/building-windows, but the tests don't fail there. I need to try harder.

 * Using Account to remember max and last dynamic chunk size
 * Start sensing max upload size at 100mb (on first failure)
 * Handle connection close error like HTTP-413 if upload-size > 100mb
 * Updated tests that relied on boundless SyncOptions

Signed-off-by: Juergen Kellerer <[email protected]>
@nextcloud-desktop-bot

AppImage file: nextcloud-PR-4826-09be2f7e8139a39a115396d77a8cf666b548bcb4-x86_64.AppImage

To test this change/fix, you can simply download the AppImage file above and test it.

Please make sure to quit your existing Nextcloud app and back up your data.

@jospoortvliet (Member)

Hi team, what is needed to get this over the finish line and merged?

@jkellerer (Author) commented Dec 10, 2023

@jospoortvliet, what's missing are fixes for the unit tests. On the test systems I had access to, Win Server 2016 (x64) and macOS (x64), I had no test failures for the parts that are supported (Win2016 seems to be missing some parts, but the test that failed here was working fine).

Another Win11 instance produced some other issues, but it ran on ARM64, so it's not the most reliable result. To be honest, I wasn't sure whether the changes made here affect the test failures at all, as the results vary quite a bit depending on the OS used.

@Hyp3rSoniX commented Mar 30, 2024

Is it possible to revive this PR?

I had to fiddle around with the chunk size config on my fresh new system to get it to work. It seems like this PR would adjust the parameters dynamically on the fly, which could solve most such problems where the reverse proxy (or whatever sits in front) doesn't like the out-of-the-box chunk size.

Adding the following settings helped me in my current case on Mac:

chunkSize=10000000
minChunkSize=1000000
maxChunkSize=50000000
targetChunkUploadDuration=6000

@JasperTheMinecraftDev

Bump! Please take a look at this, team!

@pktiuk commented May 20, 2024

@mgallien @allexzander Could you look at this PR? This fix has been waiting for too long.

Labels
2. to review · enhancement (enhancement of an already implemented feature/code) · feature: 🔄 sync engine · hotspot: chunk sizing (Chunk sizing defaults, fallback behavior, etc.)

Development
Successfully merging this pull request may close these issues:
"Connection closed" message when syncing files larger then +- 100Mb