[EPIC] Add CLI interface for Gravitino #4943

Open
10 tasks
jerryshao opened this issue Sep 13, 2024 · 15 comments
Labels
epic Key feature

Comments

@jerryshao
Contributor

jerryshao commented Sep 13, 2024

Describe the proposal

Currently, Gravitino provides a web UI, a Java SDK, and a Python SDK for manipulating metadata. The web UI does not cover the full functionality, so users still need to write Java or Python code, or call the REST APIs directly, to communicate with Gravitino. This makes Gravitino hard to use when first getting started.

Instead of adding more and more features to the web UI, we think that adding a simple CLI will significantly help users, especially developers. So in this EPIC issue, we are planning to add CLI support for Gravitino.

Task list

@jerryshao jerryshao added the epic Key feature label Sep 13, 2024
@jerryshao
Contributor Author

@justinmclean, can you please post the design doc here so we can have a discussion on the issue?

@justinmclean
Member

justinmclean commented Sep 16, 2024

@jerryshao
Contributor Author

I think it would be simple to use the current Gravitino Java client to create a CLI tool. The current Python client only supports Fileset, so if you want to use Python, you would have to write all of the REST protocol handling from the ground up.

Also, for the CLI arguments, can you please investigate some similar projects to see how they design theirs and what we can borrow?

Besides, I think the design doc should list all the list, create, get, alter, and delete commands for every entity. It should also include access control operations and tag operations, so that they can be done from the CLI as well.
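One way to keep such a command list complete is to generate it rather than write it by hand. The sketch below is only an illustration: the verb list comes from the comment above, while the entity list is an assumed, non-exhaustive subset, and the `verb entity` command spelling is hypothetical rather than anything the design doc specifies.

```python
from itertools import product

# Verbs named in the comment above.
VERBS = ("list", "create", "get", "alter", "delete")
# Illustrative subset of entities; the real list would be longer (assumption).
ENTITIES = ("metalake", "catalog", "schema", "table")


def command_matrix(verbs=VERBS, entities=ENTITIES):
    """Return every verb/entity pair as a hypothetical 'verb entity' command."""
    return [f"{verb} {entity}" for entity, verb in product(entities, verbs)]


def missing_commands(implemented, verbs=VERBS, entities=ENTITIES):
    """Return the commands from the full matrix that are not yet implemented."""
    done = set(implemented)
    return [cmd for cmd in command_matrix(verbs, entities) if cmd not in done]


if __name__ == "__main__":
    for cmd in missing_commands(["list metalake", "get metalake"]):
        print(cmd)
```

A checklist produced this way could be pasted into the design doc and ticked off as each command lands.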

@jerryshao

This comment was marked as outdated.

@justinmclean
Member

justinmclean commented Sep 20, 2024

The above isn't using the Python client. It's using the REST interface as indicated by "The CLI would be implemented in Python using requests library for the REST interface and click for the CLI." - this is what several other catalogs do.
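For illustration, a minimal sketch of what such a thin REST layer might look like. It uses the standard library's `urllib` instead of `requests` so that it is self-contained, and it only builds requests without sending them. The `/api/metalakes/.../catalogs/...` path layout, the helper names, and the `localhost:8090` address are assumptions for this sketch and should be checked against the server's REST documentation.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed nesting of entities in the REST path (not confirmed by this thread).
LEVELS = ("metalakes", "catalogs", "schemas", "tables")


def entity_path(*names):
    """Build the path of one entity from its metalake, catalog, schema, table names."""
    if not 1 <= len(names) <= len(LEVELS):
        raise ValueError(f"expected 1 to {len(LEVELS)} names, got {len(names)}")
    parts = ["api"]
    for level, name in zip(LEVELS, names):
        parts += [level, quote(name, safe="")]
    return "/" + "/".join(parts)


def collection_path(*parents):
    """Build the path that lists the children of the given parent entities."""
    if len(parents) >= len(LEVELS):
        raise ValueError("tables have no child collection in this sketch")
    base = entity_path(*parents) if parents else "/api"
    return f"{base}/{LEVELS[len(parents)]}"


def build_request(method, base_url, path, body=None):
    """Create (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


if __name__ == "__main__":
    # Placeholder server address; nothing is sent over the network here.
    req = build_request("GET", "http://localhost:8090", collection_path("demo"))
    print(req.get_method(), req.full_url)
```

Sending the request would be a single `urllib.request.urlopen(req)` call. Authentication and error handling are deliberately left out of this sketch.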

@jerryshao
Contributor Author

Then you have to implement all the REST interfaces, including metalake, catalog, schema, table..., as well as authorization. This is a huge amount of work. Why can't we use the existing Java client and write a simple wrapper around it to build the CLI?

@justinmclean
Member

justinmclean commented Sep 20, 2024

The entire basic REST interface in Python would be about 100 lines of code, much less than the equivalent Java code.

@jerryshao
Contributor Author

I don't think so. Once you implement a complicated "create" or "alter" command, you will have to handle complicated cases, including serialization and deserialization. Also, the Java client already supports several authentication methods such as Kerberos and OAuth. If you write the CLI in plain Python, you will have to deal with those yourself. It's not as simple as you think.

@justinmclean
Member

justinmclean commented Sep 20, 2024

I have also looked at several other CLIs, and the design above is similar to how they do it. My initial thought was not to implement the entire API/REST interface, but it can be extended once we have something that is useful.

@jerryshao
Contributor Author

If you're not building a complete CLI, then it is no different from the current web UI: users still cannot easily try out the full feature set.

@justinmclean
Member

The initial objective (as described above) is not to implement the full API, but we can expand on the initial API to eventually cover everything. If you want me to break it up into stages and put what is developed in each, I can do that.

@jerryshao
Contributor Author

My feeling is that if we use the Java client for the CLI, most of the JSON serde and security handling is already done, so we can focus solely on the CLI implementation. If we choose Python, we will need to implement everything from the ground up, which may take a lot of work, especially since we have several data structures such as "type" and "expression" that are nested and complicated to serialize and deserialize. The current Java client already handles this, so we would not have to do it again. It's not a question of choosing a language; it's a question of not duplicating work. If there were a full-functionality Python client, I would also be fine with a simple wrapper around that Python client to implement the CLI in Python.
This is just my view. I think we should discuss this in the community and gather others' opinions; I will also post this on the issue.
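To give a sense of the serde work being described, here is a minimal sketch of round-tripping a nested "type" in Python. The class names and the JSON shape (`type`, `elementType`, `containsNull`, `fields`) are assumptions chosen for illustration, not the confirmed wire format, and a real client would need many more cases (maps, unions, decimals with precision, expressions, and so on).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "integer" or "string"


@dataclass(frozen=True)
class ListType:
    element: "DataType"
    contains_null: bool = True


@dataclass(frozen=True)
class StructType:
    fields: tuple  # tuple of (field_name, DataType) pairs


DataType = Union[Primitive, ListType, StructType]


def to_json(t):
    """Recursively convert a type object into JSON-compatible Python values."""
    if isinstance(t, Primitive):
        return t.name
    if isinstance(t, ListType):
        return {
            "type": "list",
            "elementType": to_json(t.element),
            "containsNull": t.contains_null,
        }
    if isinstance(t, StructType):
        return {
            "type": "struct",
            "fields": [{"name": n, "type": to_json(ft)} for n, ft in t.fields],
        }
    raise TypeError(f"unsupported type object: {t!r}")


def from_json(obj):
    """Recursively rebuild a type object from the values produced by to_json."""
    if isinstance(obj, str):
        return Primitive(obj)
    kind = obj.get("type")
    if kind == "list":
        return ListType(from_json(obj["elementType"]), obj.get("containsNull", True))
    if kind == "struct":
        return StructType(tuple((f["name"], from_json(f["type"])) for f in obj["fields"]))
    raise ValueError(f"unknown type kind: {kind!r}")
```

Even this reduced version needs one branch per type kind in both directions, which is the duplication the comment above is worried about.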

@jerryshao
Contributor Author

jerryshao commented Sep 20, 2024

> The initial objective (as described above) is not to implement the full API, but we can expand on the initial API to eventually cover everything. If you want me to break it up into stages and put what is developed in each, I can do that.

This is a huge task; it should be broken into multiple small tasks and completed step by step.

@mchades
Contributor

mchades commented Sep 20, 2024

Link to design doc: https://docs.google.com/document/d/19CXHeg_5iphO8D3UD16rexVE23M4TMZXOU3X0lBqL6A/edit?usp=sharing

IMO, the current design doesn't seem to be very different from using curl directly.

I think a CLI should be more about user interaction and presentation of results, but the current design seems to be mostly a simplification of using curl and doesn't bring much extra convenience to the user.

Feel free to point out if I've misunderstood anything!

@justinmclean
Member

justinmclean commented Sep 20, 2024

It is easier to use than curl as the user doesn't need to construct correct REST URLs or supply correctly formatted JSON. It fills in many of the default parameters, and its output is more user-friendly.
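As a rough illustration of that point, the sketch below turns a short command line into the method, URL, and JSON body that a curl user would otherwise write by hand. It uses the standard library's `argparse` rather than `click` to stay self-contained. The program name `gcli`, the flag names, the default values, and the request body fields are all hypothetical and are not taken from the design doc.

```python
import argparse


def build_parser():
    """Create a parser for a hypothetical 'gcli' tool with sensible defaults."""
    parser = argparse.ArgumentParser(prog="gcli")
    # Placeholder default address (assumption).
    parser.add_argument("--url", default="http://localhost:8090")
    parser.add_argument("--metalake", required=True)
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create-catalog", help="create a catalog")
    create.add_argument("name")
    create.add_argument("--provider", default="hive")
    create.add_argument("--comment", default="")
    create.add_argument(
        "--property", action="append", default=[], metavar="KEY=VALUE"
    )
    return parser


def plan_request(args):
    """Translate parsed arguments into (method, url, body) without sending it."""
    properties = {}
    for item in args.property:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        properties[key] = value
    # Assumed body shape for illustration only.
    body = {
        "name": args.name,
        "type": "RELATIONAL",
        "provider": args.provider,
        "comment": args.comment,
        "properties": properties,
    }
    url = f"{args.url.rstrip('/')}/api/metalakes/{args.metalake}/catalogs"
    return "POST", url, body


if __name__ == "__main__":
    method, url, body = plan_request(build_parser().parse_args())
    print(method, url)
    print(body)
```

With this sketch, `gcli --metalake demo create-catalog sales` would print a POST to the catalog collection with the provider, comment, and properties filled in from defaults.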
