Bug(go client): The cluster added a new meta node, but the meta server configuration of the Go client was not updated. As a result, the client cannot find the new meta address and can only access the meta servers in its configured meta list #1880
Labels: type/bug
Comments
Good point, could you please submit a patch to fix it?
OK, got it.
empiredan pushed a commit that referenced this issue on Jun 20, 2024:
…to primary meta server if it was changed (#1916) #1880 #1856

As for #1856: when the Go client is writing to a partition and the replica node core dumps, the Go client gives up after a timeout without updating the configuration, and the only remedy is to restart the client. In this PR, the client updates the table configuration automatically when a replica core dumps. Testing showed that the replica error in this case is "context.DeadlineExceeded" (incubator-pegasus/go-client/pegasus/table_connector.go), so when the client meets this error, it updates the configuration automatically. The request itself is not retried: the configuration is only updated after the timeout, so retrying before then would still fail, and retrying also risks infinite retries. It is therefore better to return the error directly to the user and let the user retry.

As for #1880: when the client sends the RPC "RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX" to a meta server that is not the primary, that meta server returns a response forwarding to the primary meta server. Therefore, even if the primary meta server is not in the client's configuration, the client can reach it this way.

About tests:
1. Start onebox, without adding the primary meta server to the Go client configuration.
2. Have the Go client write data to a certain partition, then kill the replica process.
ruojieranyishen pushed a commit to ruojieranyishen/incubator-pegasus that referenced this issue on Jul 17, 2024
Suppose the Pegasus client is configured with the meta server list "127.0.0.1:34602" and "127.0.0.1:34603", while the actual primary meta server of the Pegasus cluster is "127.0.0.1:34601". In this case, the Pegasus client cannot connect to the Pegasus server and keeps failing until the request times out.
The reason is that, when searching for the primary, the Go client iterates through its configured meta server list, sends the RPC RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX to each meta server, and decides from the response whether that server is the primary.
Unlike the Java client, the Go client cannot use the forwarding in a non-primary meta server's response to add a meta server that is not specified in its configuration.
Below is the relevant logic in the Go client:
Here is the corresponding part of the Java client code:
In summary
The main impact of this issue is that when a new meta server is added to an online cluster and later becomes the primary, users who have not changed their client configurations are unable to connect to the server.