[fs] basic sync tool #14248
Commits on Feb 27, 2024
-
CHANGELOG: Introduce `hailctl fs sync` which robustly transfers one or more files between Amazon S3, Azure Blob Storage, and Google Cloud Storage.

There are really two distinct conceptual changes remaining here. Given my waning time available, I am not going to split them into two pull requests. The changes are:

1. `basename` always agrees with the [`basename` UNIX utility](https://en.wikipedia.org/wiki/Basename). In particular, the basename of the folder `/foo/bar/baz/` is *not* `''`; it is `'baz'`. The only folders or objects whose basename is `''` are objects whose name literally ends in a slash, e.g. an *object* named `gs://foo/bar/baz/`.
2. `hailctl fs sync`, a robust copying tool with a user-friendly CLI.

`hailctl fs sync` comprises two pieces: `plan.py` and `sync.py`. The latter, `sync.py`, is simple: it delegates to our existing copy infrastructure. That copy infrastructure has been lightly modified to support this use-case. The former, `plan.py`, is a concurrent file system `diff`.

`plan.py` generates and `sync.py` consumes a "plan folder" containing these files:

1. `matches`: files whose names and sizes match. Two columns: source URL, destination URL.
2. `differs`: files or folders whose names match but which differ in either size or type. Four columns: source URL, destination URL, source state, destination state. Each state is one of `file`, `dir`, or a size. If either state is a size, both states are sizes.
3. `srconly`: files only present in the source. One column: source URL.
4. `dstonly`: files only present in the destination. One column: destination URL.
5. `plan`: a proposed set of object-to-object copies. Two columns: source URL, destination URL.
6. `summary`: a one-line file containing the total number of copies in the plan and the total number of bytes which would be copied.
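To make the six plan files concrete, here is a minimal in-memory sketch of the classification they describe. It is not the code in `plan.py`: the `Plan` type, the `make_plan` function, and the dictionary-based listings are hypothetical names chosen for illustration. The sketch handles only file-versus-file comparisons, ignoring the `file`/`dir` type mismatch case, and it assumes that the proposed copies are exactly the `srconly` and `differs` entries.

```python
from typing import Dict, List, NamedTuple, Tuple


class Plan(NamedTuple):
    """Hypothetical in-memory stand-in for the files of a plan folder."""
    matches: List[Tuple[str, str]]            # (source URL, destination URL)
    differs: List[Tuple[str, str, int, int]]  # (source URL, destination URL, source size, destination size)
    srconly: List[str]                        # source URLs
    dstonly: List[str]                        # destination URLs
    plan: List[Tuple[str, str]]               # proposed copies: (source URL, destination URL)
    summary: Tuple[int, int]                  # (number of copies, total bytes to copy)


def make_plan(src_root: str, dst_root: str,
              src: Dict[str, int], dst: Dict[str, int]) -> Plan:
    """Classify two listings that map a relative file name to a size in bytes."""
    matches, differs, srconly, dstonly, copies = [], [], [], [], []
    total_bytes = 0
    for name in sorted(src.keys() | dst.keys()):
        src_url = f"{src_root}/{name}"
        dst_url = f"{dst_root}/{name}"
        if name not in src:
            # Present only in the destination: recorded, never copied.
            dstonly.append(dst_url)
        elif name in dst and src[name] == dst[name]:
            # Same name and same size: treated as already copied.
            matches.append((src_url, dst_url))
        else:
            if name in dst:
                differs.append((src_url, dst_url, src[name], dst[name]))
            else:
                srconly.append(src_url)
            # Assumption: missing and differing files are the ones to copy.
            copies.append((src_url, dst_url))
            total_bytes += src[name]
    return Plan(matches, differs, srconly, dstonly, copies,
                (len(copies), total_bytes))


result = make_plan(
    "gs://gcs-bucket/a", "s3://s3-bucket/b",
    src={"x.txt": 10, "y.txt": 20, "z.txt": 30},
    dst={"x.txt": 10, "y.txt": 25, "w.txt": 5},
)
print(result.summary)  # (2, 50): y.txt and z.txt would be copied
```

In this sketch, re-running `make_plan` after a successful copy would move every copied file into `matches` and leave `plan` empty, which mirrors the verification step the two-command workflow relies on.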
As described in the CLI documentation, the intended use of these commands is:

```
hailctl fs sync --make-plan plan1 --copy-to gs://gcs-bucket/a s3://s3-bucket/b
hailctl fs sync --use-plan plan1
```

The first command generates a plan folder and the second command executes the plan. Separating this process into two commands allows the user to verify exactly what will be copied, including the exact destination URLs. Moreover, if `hailctl fs sync --use-plan` fails, the user can re-run `hailctl fs sync --make-plan` to generate a new plan which will avoid copying already successfully copied files. The user can also re-run `hailctl fs sync --make-plan` to verify that every file was indeed successfully copied.

Testing. This change has a few sync-specific tests but largely reuses the tests for `hailtop.aiotools.copy`.

Future Work. Propagating a consistent kind of hash across all clouds and using that for detecting differences is a better solution than the file-size-based difference used here. If all the clouds always provided the same type of hash value, this would be trivial to add. Alas, at time of writing, S3 and Google both support CRC32C for every blob (though, in S3, you must explicitly request it at object creation time), but *Azure Blob Storage does not*. ABS only supports MD5 sums, which Google does not support for multi-part uploads.
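The limitation of size-based comparison can be seen in a small, self-contained sketch. The byte strings below are made up for illustration, and `zlib.crc32` (plain CRC-32 from the Python standard library) stands in for CRC32C, which the standard library does not provide.

```python
import zlib

# Two hypothetical object bodies that differ in a single byte.
old = b"chr1\t100\tA\n"
new = b"chr1\t100\tG\n"

# A size-based comparison sees no difference between the two bodies.
same_size = len(old) == len(new)

# A checksum comparison does. CRC-32 is a stand-in here for CRC32C.
same_checksum = zlib.crc32(old) == zlib.crc32(new)

print(same_size, same_checksum)  # True False
```

Under a size-only rule this pair would land in `matches` and would not be copied, which is the gap a cross-cloud hash would close.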
Commit 8621664 (Dan King, Feb 27, 2024)
- 86b5a4b (Dan King, Feb 27, 2024)
- 8fbc0f2 (Dan King, Feb 27, 2024)
- 861ca13: use recursive=True for rapid listing (Dan King, Feb 27, 2024)
- 7660033 (Dan King, Feb 27, 2024)
- f956f0a (Dan King, Feb 27, 2024)
- 5b0c8a9: allow isdir without trailing slash (Dan King, Feb 27, 2024)
- 46b1b34 (Dan King, Feb 27, 2024)
- 55c83e0: update listeners before return (Dan King, Feb 27, 2024)
- eac543c (Dan King, Feb 27, 2024)
- 2bc9d1d (Dan King, Feb 27, 2024)
- 9588f91 (Dan King, Feb 27, 2024)
- b905f30: maybe get InsertObjectStream right (Dan King, Feb 27, 2024)
- c4162cb (Dan King, Feb 27, 2024)
- c8388b1: use async with instead of async with await (Dan King, Feb 27, 2024)
- 527488f: smaller part size maybe helps? (Dan King, Feb 27, 2024)
- c8891ec (Dan King, Feb 27, 2024)
- 0e6ae95: use order of magnitude less file parallelism than partition parallelism (Dan King, Feb 27, 2024)
- 04448ed (Dan King, Feb 27, 2024)
- 22ef264 (Dan King, Feb 27, 2024)
- 9a43627 (Dan King, Feb 27, 2024)
- f5a7af2 (Dan King, Feb 27, 2024)
- e6fffd7 (Dan King, Feb 27, 2024)
- 470884d (Dan King, Feb 27, 2024)
- 01e0759 (Dan King, Feb 27, 2024)
- 52fd7b6 (Dan King, Feb 27, 2024)
- e5ce4d6 (Dan King, Feb 27, 2024)
- 279dcfa (Dan King, Feb 27, 2024)
- d1ef63c (Dan King, Feb 27, 2024)
- 41d9a9a (Dan King, Feb 27, 2024)
- 927ad35 (Dan King, Feb 27, 2024)
- 8e0d44c (Dan King, Feb 27, 2024)
- cab8b8f (Dan King, Feb 27, 2024)
- 4a848b4 (Dan King, Feb 27, 2024)
- f89a9e5 (Dan King, Feb 27, 2024)
- 188fff7 (Dan King, Feb 27, 2024)
- 9ef4b32 (Dan King, Feb 27, 2024)
- 7086648 (Dan King, Feb 27, 2024)
- 2c9d79d (Dan King, Feb 27, 2024)
- 6268348 (Dan King, Feb 27, 2024)
- edbf8cd (Dan King, Feb 27, 2024)
- c9b07de (Dan King, Feb 27, 2024)
- 7aa4a5a (Dan King, Feb 27, 2024)
- 2a886aa (Dan King, Feb 27, 2024)
- 7fc7003 (Dan King, Feb 27, 2024)
- e564b5d (Dan King, Feb 27, 2024)
- 40f96c5 (Dan King, Feb 27, 2024)
- 4c69ec6 (Dan King, Feb 27, 2024)
- e3425e1
- eeb861f (Dan King, Feb 27, 2024)
- ba77330 (Dan King, Feb 27, 2024)
- 44c9364 (Dan King, Feb 27, 2024)
Commits on Feb 28, 2024
- f6eacea (Dan King, Feb 28, 2024)
- f502b8f: Merge remote-tracking branch 'hi/main' into new-new-copier (Dan King, Feb 28, 2024)
- 2bd85ec: [uvloopx] consolidate uvloop initialization code to one place (Dan King, Feb 28, 2024)
- 4c05471 (Dan King, Feb 28, 2024)
- 4bba249: revert unnecessary changes to copy and copier (Dan King, Feb 28, 2024)
Commits on Feb 29, 2024
- 7ae6107: add assertion about total size and also fix lint about unused variable (Dan King, Feb 29, 2024)
Commits on Jun 25, 2024
- 38727c9
- 9c25bc0
Commits on Aug 5, 2024
- 1a33173
Commits on Aug 7, 2024
- 44ef794
Commits on Aug 8, 2024
- 7087fc0
Commits on Aug 12, 2024
- 5a32d23
Commits on Aug 13, 2024
- c66533f
Commits on Sep 13, 2024
- a447e22