[Improvement] Add a generic lock mechanism in Graviton #143

Closed
jerryshao opened this issue Jul 28, 2023 · 4 comments
Labels
improvement Improvements on everything

jerryshao (Contributor) commented Jul 28, 2023

What would you like to be improved?

Currently, Graviton lacks a lock mechanism, which can leave some operations in an inconsistent state, for example:

  • The alterTable operation first loads the table and then writes the updated table to the underlying storage.
  • Various operations need to modify both the external metadata source and the underlying storage.

As a result, the current code is prone to race conditions, and we need a way to fix this.

How should we improve?

We may need a lock mechanism to solve this problem. Note that the lock mechanism must account for running multiple Graviton service instances, so a process-level lock may not be enough.
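As a minimal illustration of the alterTable race and of how a lock addresses it, the Python sketch below guards the load-then-update sequence with a per-identifier lock. `InProcessLockManager`, `alter_table`, and the dict-based `store` are hypothetical stand-ins, not Graviton APIs. Because the lock lives in one process's memory, it only serializes callers inside a single Graviton instance, which is exactly the limitation noted above.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class InProcessLockManager:
    """Hypothetical per-identifier lock manager; only works within ONE process."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    @contextmanager
    def locked(self, ident):
        # Look up (or create) the per-identifier lock under a guard so that two
        # threads never get different lock objects for the same identifier.
        with self._guard:
            lock = self._locks[ident]
        with lock:
            yield


store = {"db.t1": {"columns": []}}  # hypothetical stand-in for the underlying storage
locks = InProcessLockManager()


def alter_table(ident, new_column):
    # Load-then-update is a read-modify-write. Holding the lock for the whole
    # sequence prevents two concurrent alters from overwriting each other.
    with locks.locked(ident):
        table = dict(store[ident])                           # "load" a copy
        table["columns"] = table["columns"] + [new_column]   # apply the change
        store[ident] = table                                 # "update" storage


# Usage: eight concurrent alters on the same table; none of them is lost.
workers = [threading.Thread(target=alter_table, args=("db.t1", f"c{i}")) for i in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(store["db.t1"]["columns"]))  # 8
```

A second Graviton instance would have its own `InProcessLockManager` and could still interleave with this one, which is why a cross-instance mechanism is needed.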

Besides, we currently rely on a transaction mechanism to achieve this, which may be a problem for storage systems such as FoundationDB (fdb), which imposes a short transaction time limit. Some operations, such as Hive operations, may take a long time to return results.

jerryshao added the improvement label on Jul 28, 2023
jerryshao assigned and unassigned jerryshao on Jul 28, 2023
jerryshao (Contributor, Author) commented:

@yuqi1129 Can you please look into this issue: is it necessary to fix, and if so, how should we achieve it?

yuqi1129 (Contributor) commented Aug 7, 2023

Got it.

yuqi1129 (Contributor) commented Aug 8, 2023

@jerryshao

I think we need to split this issue into two parts:

  • Support transactions spanning inner operations, such as updating the underlying storage, and outer operations, such as updating the Hive Metastore.
  • Support a global lock across different Graviton instances.

First one

To support transactions spanning inner and outer operations, we first need to confirm that:

  • The outer operation and the inner operation are each atomic on their own, and each can be fully rolled back if any step fails.
  • We have a mechanism to coordinate the outer and inner operations so that they execute transactionally; that is, if either one fails, we can roll both back.

If we can confirm the above two points, we can implement this feature.
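One way to picture the coordination requirement is a compensation-based (saga-style) sequence: each step has an apply action and a rollback action, and if a later step fails, the completed steps are undone in reverse order. The Python sketch below assumes exactly the two properties listed above (each step is individually atomic and has a working rollback); `Step`, `run_all_or_nothing`, and the dicts standing in for the Hive Metastore and Graviton's storage are hypothetical. It is not a true atomic commit: a crash between steps, or a failing rollback, would still leave partial state unless the steps are also durably logged.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]
    rollback: Callable[[], None]


def run_all_or_nothing(steps: List[Step]) -> None:
    """Apply steps in order; on failure, roll back completed steps in reverse."""
    done: List[Step] = []
    try:
        for step in steps:
            step.apply()
            done.append(step)  # only steps that fully succeeded get rolled back
    except Exception:
        for step in reversed(done):
            step.rollback()
        raise  # surface the original failure to the caller


hive = {}     # hypothetical stand-in for the Hive Metastore (outer operation)
storage = {}  # hypothetical stand-in for Graviton's storage (inner operation)


def failing_storage_write():
    raise RuntimeError("storage write failed")


steps = [
    Step("create table in Hive",
         lambda: hive.__setitem__("t1", "schema"),
         lambda: hive.pop("t1", None)),
    Step("record table in storage",
         failing_storage_write,
         lambda: storage.pop("t1", None)),
]

try:
    run_all_or_nothing(steps)
except RuntimeError:
    pass
print(hive, storage)  # {} {}  (the Hive change was compensated)
```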

Second one

To support a global lock across different Graviton instances, we may need to introduce a new component, for example:

  • Use ZooKeeper to implement a distributed lock.
  • Deploy a Redis cluster to provide a distributed lock.
  • Use another system that supports distributed locks.
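Whichever backend is chosen, a cross-instance lock usually needs two properties beyond mutual exclusion: a lease (TTL) so that a crashed instance does not hold the lock forever, with renewal for long-running work such as slow Hive calls, and a fencing token so that an instance whose lease expired while it was paused cannot overwrite newer data. The Python sketch below simulates such a service in memory; in a real deployment it would be backed by something like ZooKeeper ephemeral nodes or Redis `SET key value NX PX <ttl>`. `LeaseLockService`, `FencedStore`, and the instance names are hypothetical, and the clock is injectable only so the example is deterministic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    owner: str
    token: int         # fencing token: strictly increasing with every grant
    expires_at: float


class LeaseLockService:
    """Hypothetical in-memory stand-in for a ZooKeeper/Redis-backed lock service."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._leases: Dict[str, Lease] = {}
        self._next_token = 0

    def acquire(self, key: str, owner: str, ttl: float) -> Optional[Lease]:
        now = self._clock()
        current = self._leases.get(key)
        if current is not None and current.expires_at > now:
            return None  # still held by a live lease
        self._next_token += 1
        lease = Lease(owner, self._next_token, now + ttl)
        self._leases[key] = lease
        return lease

    def renew(self, key: str, lease: Lease, ttl: float) -> bool:
        # Long-running work must renew before expiry; an expired or replaced lease cannot.
        now = self._clock()
        current = self._leases.get(key)
        if current is None or current.token != lease.token or current.expires_at <= now:
            return False
        current.expires_at = now + ttl
        return True

    def release(self, key: str, lease: Lease) -> None:
        current = self._leases.get(key)
        if current is not None and current.token == lease.token:
            del self._leases[key]


class FencedStore:
    """Storage that rejects writes carrying a fencing token older than one already seen."""

    def __init__(self):
        self.data: Dict[str, str] = {}
        self._last_token: Dict[str, int] = {}

    def write(self, key: str, value: str, token: int) -> bool:
        if token < self._last_token.get(key, 0):
            return False
        self._last_token[key] = token
        self.data[key] = value
        return True


# Usage with a fake clock: graviton-1 stalls past its lease, graviton-2 takes over.
now = [0.0]
svc = LeaseLockService(clock=lambda: now[0])
fenced = FencedStore()
a = svc.acquire("db.t1", "graviton-1", ttl=10)
print(svc.acquire("db.t1", "graviton-2", ttl=10))   # None: lock is held
now[0] = 11.0
b = svc.acquire("db.t1", "graviton-2", ttl=10)
print(fenced.write("db.t1", "from-2", b.token))      # True
print(fenced.write("db.t1", "from-1", a.token))      # False: stale token rejected
```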

However, a centralized distributed lock is not a good fit for our case: since Graviton may be deployed globally, with instances spread across regions, a central lock service could introduce significant network latency.
So we may need to consider other solutions. So far I have no good idea for this, but the concept of a global clock in distributed systems may offer some hints.

jerryshao (Contributor, Author) commented:

This is a duplicate of #407.
