[Bug report] Dropping a fileset schema as a cascade misses dropping child filesets #4842

Open
mchades opened this issue Sep 3, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@mchades
Contributor

mchades commented Sep 3, 2024

Version

main branch

Describe what's wrong

In the current dropSchema code, dropping a fileset schema with cascade only checks the files under schemaPath and misses the schema's child filesets:

if (fs.listStatus(schemaPath).length > 0 && !cascade) {
  throw new NonEmptySchemaException(
      "Schema %s with location %s is not empty", ident, schemaPath);
} else {
  fs.delete(schemaPath, true);
}
LOG.info("Deleted schema {} location {}", ident, schemaPath);
return true;

Error message and/or stacktrace

No error is thrown.

How to reproduce

  • Create a fileset schema s1 with location /a
  • Create a fileset s1.f1 under that schema with location /b
  • Drop the schema s1 with cascade
  • /a is removed, but /b still exists (this is the problem)
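
The steps above can be imitated on a local filesystem. This is only a rough sketch, assuming `java.nio.file` as a stand-in for the Hadoop `FileSystem` calls in the quoted snippet. The class `CascadeDropRepro` and its methods are hypothetical names made up for illustration and are not part of the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the reported behavior; not the project's code.
class CascadeDropRepro {

  // Recursively deletes a directory tree, deepest entries first.
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  // Mirrors the quoted check: only the schema location is looked at.
  static boolean dropSchema(Path schemaPath, boolean cascade) throws IOException {
    boolean nonEmpty;
    try (Stream<Path> children = Files.list(schemaPath)) {
      nonEmpty = children.findAny().isPresent();
    }
    if (nonEmpty && !cascade) {
      throw new IllegalStateException("Schema location " + schemaPath + " is not empty");
    }
    deleteRecursively(schemaPath);
    return true;
  }

  // Returns {schemaLocationExists, filesetLocationExists} after a cascade drop.
  static boolean[] reproduce() throws IOException {
    Path base = Files.createTempDirectory("fileset-repro");
    try {
      Path a = Files.createDirectory(base.resolve("a")); // plays the role of /a (schema s1)
      Path b = Files.createDirectory(base.resolve("b")); // plays the role of /b (fileset s1.f1)
      Files.createFile(b.resolve("data.txt"));
      dropSchema(a, true);
      return new boolean[] {Files.exists(a), Files.exists(b)};
    } finally {
      deleteRecursively(base); // clean up the temporary directories
    }
  }

  public static void main(String[] args) throws IOException {
    boolean[] r = reproduce();
    System.out.println("/a exists: " + r[0] + ", /b exists: " + r[1]);
  }
}
```

In this sketch the drop succeeds without an error, the schema directory is gone, and the fileset directory is left behind, which matches the behavior described in the report.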

Additional context

No response

@mchades mchades added the bug Something isn't working label Sep 3, 2024
@mchades
Contributor Author

mchades commented Sep 3, 2024

The current fileset schema cascade deletion only applies to the location of the schema.

I am not sure whether cascade deletion should cover the sub-files under the schema's location, the locations of the schema's child filesets, or both.

Do you have any suggestions? @jerryshao @xloya @coolderli

@xloya
Collaborator

xloya commented Sep 3, 2024

In our current usage at Xiaomi, we do not drop schemas directly; we only support dropping filesets. If we need to support dropping a schema, it seems reasonable to also delete the storage locations of the managed filesets under it, though the cost may be somewhat higher because each storage location has to be deleted recursively. Another option is to disallow cascade deletion of a fileset schema that still contains managed filesets.
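
To make the two options above concrete, here is a minimal sketch of how they might differ. It again assumes `java.nio.file` instead of the Hadoop `FileSystem` API. The `Fileset` class, the `Policy` enum, and `dropSchemaCascade` are hypothetical names, not existing project APIs. It also assumes that the locations of non-managed filesets are left untouched, which the thread does not state.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the two options discussed; all names are illustrative.
class SchemaDropOptions {

  enum Policy { DELETE_MANAGED_FILESETS, REJECT_IF_MANAGED_FILESETS }

  // Minimal stand-in for a fileset entry: a storage location plus a managed flag.
  static final class Fileset {
    final Path location;
    final boolean managed;

    Fileset(Path location, boolean managed) {
      this.location = location;
      this.managed = managed;
    }
  }

  // Recursively deletes a directory tree, deepest entries first.
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  // Cascade drop that is aware of the schema's child filesets.
  static boolean dropSchemaCascade(Path schemaPath, List<Fileset> filesets, Policy policy)
      throws IOException {
    List<Fileset> managed =
        filesets.stream().filter(f -> f.managed).collect(Collectors.toList());
    if (policy == Policy.REJECT_IF_MANAGED_FILESETS && !managed.isEmpty()) {
      // Option 2: refuse the drop while managed filesets remain.
      throw new IllegalStateException(
          "Schema still contains " + managed.size() + " managed fileset(s)");
    }
    // Option 1: remove each managed fileset's location, then the schema location.
    for (Fileset f : managed) {
      deleteRecursively(f.location);
    }
    deleteRecursively(schemaPath);
    return true;
  }
}
```

The first policy pays one recursive delete per managed fileset, which is the extra cost mentioned above. The second keeps the drop cheap but requires callers to drop the filesets first.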
