[teraslice] New Feature: Add new state "relocating" to execution controller status #3762

Open
sotojn opened this issue Sep 18, 2024 · 0 comments
Labels
enhancement feature k8s Applies to Teraslice in kubernetes cluster mode only. pkg/teraslice

sotojn commented Sep 18, 2024

We've introduced a new feature in the kubernetesV2 backend that allows execution controller pods to be relocated when a pod is deleted or a kubectl drain is called on its node. The issue is that if the execution controller is killed by an OOM, we leave ourselves open to a CrashLoopBackOff, because on startup we only check whether the execution is in a running state and, if it is, continue initializing:

if (includes(terminalStatuses, status)) {
    error = new Error(invalidStateMsg('terminal'));
} else if (includes(runningStatuses, status)) {
    // In the case of a running status on startup we
    // want to continue to start up. Only in V2.
    // Right now we will depend on kubernetes `crashloopbackoff` in the case of
    // an unexpected exit to the ex process. Ex: an OOM
    // NOTE: If this becomes an issue we may want to add a new state. Maybe `interrupted`
    if (this.context.sysconfig.teraslice.cluster_manager_type === 'kubernetesV2') {
        // Check to see if `isRestartable` exists.
        // Allows for older assets to work with k8sV2
        if (this.executionContext.slicer().isRestartable) {
            this.logger.info(`Execution ${this.exId} detected to have been restarted..`);
            const restartable = this.executionContext.slicer().isRestartable();
            if (restartable) {
                this.logger.info(`Execution ${this.exId} is restarable and will continue reinitializing...`);
            } else {
                this.logger.error(`Execution ${this.exId} is not restarable and will shutdown...`);
            }
            return restartable;
        }
    }

We can fix this by adding a new state called "relocating", which we would set in the scenario where we expect the execution controller to be relocated, here:

/// This only applies to kubernetesV2
if (
    this.context.sysconfig.teraslice.cluster_manager_type === 'kubernetesV2'
    && eventType === 'SIGTERM'
) {
    await this.stateStorage.refresh();
    const status = await this.executionStorage.getStatus(this.exId);
    const runningStatuses = this.executionStorage.getRunningStatuses();
    this.logger.debug(`Execution ${this.exId} is currently in a ${status} state`);
    /// This is an indication that the cluster_master did not call for this
    /// shutdown. We want to restart in this case.
    if (status !== 'stopping' && includes(runningStatuses, status)) {
        this.logger.info('Skipping shutdown to allow restart...');
        return;
    }
}
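
As a rough sketch of what setting the new state in that branch could look like, the decision can be pulled into a pure function that the SIGTERM handler calls before persisting the status. Everything below is hypothetical: `decideOnSignal`, `ShutdownDecision`, and the `assumedRunningStatuses` list are illustrative names and values, not existing Teraslice APIs, and "relocating" is the proposed state rather than one that exists today.

```typescript
// Hypothetical sketch; these names are illustrative, not Teraslice APIs.
type ShutdownDecision =
    | { action: 'shutdown' }
    | { action: 'relocate'; nextStatus: 'relocating' };

// Assumed for illustration only. The real list would come from
// executionStorage.getRunningStatuses().
const assumedRunningStatuses: readonly string[] = [
    'initializing', 'running', 'paused', 'stopping'
];

function decideOnSignal(
    clusterManagerType: string,
    eventType: string,
    status: string
): ShutdownDecision {
    const unrequested = clusterManagerType === 'kubernetesV2'
        && eventType === 'SIGTERM'
        && status !== 'stopping'
        && assumedRunningStatuses.indexOf(status) !== -1;

    // The cluster master did not ask for this shutdown, so mark the
    // execution as relocating instead of leaving it in a running status.
    if (unrequested) {
        return { action: 'relocate', nextStatus: 'relocating' };
    }
    return { action: 'shutdown' };
}
```

With this shape, the handler would write `nextStatus` to execution storage before returning early, so that the next pod can tell an expected relocation apart from an unexpected exit.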

We would need to make sure this new state is handled everywhere we have logic that depends on the execution controller's status.
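
One way to catch unhandled cases, sketched here under assumptions, is to model the status as a union type and switch over it exhaustively, so the compiler flags any branch that forgets "relocating". The `ExStatus` list is abbreviated, `startupAction` is a hypothetical helper, and treating a "running" status at startup as an unexpected exit is an assumption about the intended behavior, not something the issue confirms.

```typescript
// Hypothetical sketch; the status list is abbreviated and the names are illustrative.
type ExStatus = 'pending' | 'running' | 'stopping' | 'relocating' | 'completed' | 'failed';
type StartupAction = 'initialize' | 'resume' | 'reject';

function startupAction(status: ExStatus, restartable: boolean): StartupAction {
    switch (status) {
        case 'pending':
            return 'initialize';
        case 'relocating':
            // Expected restart (pod deleted or node drained): continue
            // only if the slicer reports that it is restartable.
            return restartable ? 'resume' : 'reject';
        case 'running':
        case 'stopping':
            // Assumption: with a dedicated relocating state, a running status
            // at startup signals an unexpected exit such as an OOM.
            return 'reject';
        case 'completed':
        case 'failed':
            // Terminal statuses are never restarted.
            return 'reject';
        default: {
            // Compile-time exhaustiveness check: adding a status to ExStatus
            // without handling it above makes this assignment a type error.
            const unhandled: never = status;
            throw new Error(`Unhandled status: ${String(unhandled)}`);
        }
    }
}
```

For example, `startupAction('relocating', true)` returns `'resume'`, while `startupAction('running', true)` returns `'reject'` and so would not feed a crash loop.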

@sotojn sotojn added enhancement k8s Applies to Teraslice in kubernetes cluster mode only. pkg/teraslice feature labels Sep 18, 2024
@sotojn sotojn self-assigned this Sep 18, 2024