Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent empty commits if files did not change #809

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

Pierrci
Copy link
Member

@Pierrci Pierrci commented Jul 17, 2024

Cousin of huggingface/huggingface_hub#2389

Do I need to add tests somewhere? x)

(disclaimer: I obviously haven't tried it yet, but theoretically, it should work!)

@Pierrci Pierrci requested a review from coyotte508 as a code owner July 17, 2024 23:58

if (isFrontend) {
yield 1;
return null; // unsupported if we're here
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep things simple, and it's not really a problem IMO, it just means that, in this case, we'll always upload non-LFS files

export async function* sha1(buffer: Blob, opts?: { abortSignal?: AbortSignal }): AsyncGenerator<number, string | null> {
yield 0;

if (globalThis.crypto?.subtle) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no threshold like for sha256 since non-LFS files are always < 10MB (which is the threshold for sha256 IIUC)

const allOpsToCommit = [...deleteOps, ...fileOpsToCommit];
if (allOpsToCommit.length === 0) {
const res: Response = await (params.fetch ?? fetch)(
`${params?.hubUrl || HUB_URL}/api/${repoId.type}s/${repoId.name}`,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there doesn't seem to be any method already to call this endpoint, correct?

Copy link
Member

@coyotte508 coyotte508 Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we should create one - in another PR - with expand handling too (getModelInfo, ...)

But here I think we should include the sha directly in the preupload response. Given we give already oid for individual files, and those are the oid in the context a specific commit from the remote, we should add the sha in the preupload response.

That way - no need for all this extra code here - cc @Wauplin as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should include the sha directly in the preupload response

That would simplifies things indeed!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see 325d917

packages/hub/src/lib/commit.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look at the PR but haven't tried it myself either 😄

packages/hub/src/lib/commit.ts Outdated Show resolved Hide resolved
packages/hub/src/utils/sha-node.ts Show resolved Hide resolved
packages/hub/src/lib/commit.ts Outdated Show resolved Hide resolved
@coyotte508
Copy link
Member

coyotte508 commented Jul 18, 2024

can we maybe add a preventEmptyCommit in the body of the commit endpoint (or query param, whatever) and let the backend throw an error in that case?

@Pierrci
Copy link
Member Author

Pierrci commented Jul 18, 2024

@coyotte508 that would work too, yes, but it would have been nice to hear from you when we discussed that internally haha; now we already have huggingface/huggingface_hub#2389 and https://github.com/huggingface-internal/moon-landing/pull/10547, but if we do that we won't need those anymore, correct? :)

@coyotte508
Copy link
Member

coyotte508 commented Jul 18, 2024

@coyotte508 that would work too, yes, but it would have been nice to hear from you when we discussed that internally haha; now we already have huggingface/huggingface_hub#2389 and huggingface-internal/moon-landing#10547, but if we do that we won't need those anymore, correct? :)

indeed 😬

I guess the backend solution would have been to add a gitaly hook conditionally skipped to prevent empty commits

@coyotte508
Copy link
Member

Do I need to add tests somewhere? x)

You can add a test to commit.spec.ts

@Wauplin
Copy link
Contributor

Wauplin commented Jul 18, 2024

can we maybe add a preventEmptyCommit in the body of the commit endpoint (or query param, whatever) and let the backend throw an error in that case?

I actually prefer to enforce the preventEmptyCommit, at least in the Python client. The problem if we add a new parameter prevent_empty_commit: bool when creating a commit is that it'll never be used/propagated to downstream libraries. If we set a default value True, then it's similar to not having a parameter. If we set a default value False, then it's similar to not doing anything at all.

A bit the same for raising errors. That would break some workflows that haven't asked for anything. Silently preventing empty commits (with a log message) is the best compromise I think. TL;DR: no one cares expect us, right?

that may or may not work, in any case it's timing out for sure
@Pierrci
Copy link
Member Author

Pierrci commented Jul 19, 2024

@coyotte508 I added a test, but it seems it's timing out, along with others, any tip regarding this? (does that mean I broke them? :))

@coyotte508
Copy link
Member

coyotte508 commented Jul 20, 2024

well yes you probably broke them :)

(they ran fine in https://github.com/huggingface/huggingface.js/actions/runs/9991710454/job/27615066628)

@Pierrci
Copy link
Member Author

Pierrci commented Jul 23, 2024

it's greeeeeeeen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested nothing is broken by renaming this file - https://huggingface.co/spaces/coyotte508/hub-sha256 - and it seems fine

Copy link
Member

@coyotte508 coyotte508 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!

packages/hub/src/types/api/api-commit.ts Outdated Show resolved Hide resolved
const allOpsToCommit = [...deleteOps, ...fileOpsToCommit];
if (allOpsToCommit.length === 0) {
const res: Response = await (params.fetch ?? fetch)(
`${params?.hubUrl || HUB_URL}/api/${repoId.type}s/${repoId.name}`,
Copy link
Member

@coyotte508 coyotte508 Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we should create one - in another PR - with expand handling too (getModelInfo, ...)

But here I think we should include the sha directly in the preupload response. Given we give already oid for individual files, and those are the oid in the context a specific commit from the remote, we should add the sha in the preupload response.

That way - no need for all this extra code here - cc @Wauplin as well

@@ -459,6 +467,38 @@ export async function* commitIter(params: CommitParams): AsyncGenerator<CommitPr

yield { event: "phase", phase: "committing" };

const allOpsToCommit = [...deleteOps, ...fileOpsToCommit];
if (allOpsToCommit.length === 0) {
if (!currentCommitOid) {
Copy link
Member Author

@Pierrci Pierrci Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we currently don't return early or throw an error in the case where operations is empty, meaning it's possible to get there without calling the preupload endpoint before, and thus currentCommitOid can still be undefined here.

So I kept this for now; the other alternative would be to return an object with a different different shape or throw an error if there are no operations, but it would be breaking change - I'll let you decide @coyotte508

Copy link
Member

@coyotte508 coyotte508 Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to throw on empty commits IMO, we can do a major version change (and maybe do some of https://github.com/huggingface/huggingface.js/milestone/2 before publishing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually if we upgrade to 1.0 we can add the null return type and return null on empty commit

(And it's better than throwing an error, at least it will be caught by TS)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants