Issue Description
We see mm container logs where a thread (here model-load-5e5db6cc) that is loading a model triggers evacuation of all (or most of) the loaded models.
The evacuations are all triggered in the same millisecond.
The evacuation triggers are followed by a warning log:
Entire cache capacity of 1835008 units (14336MiB) is now taken up by removed models that are still unloading
The model that we load is about 1G in size, the same as the already loaded models - it should not require unloading so many of them.
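For what it's worth, our reading of the numbers in that warning (assuming each cache capacity unit is 8 KiB - an assumption on our part, inferred from the ratio of the two figures in the log, not taken from the code) is:

```java
// Assumption: one cache capacity unit = 8 KiB (inferred from the warning message, not from the code)
long units = 1_835_008L;
long capacityMiB = units * 8 / 1024;   // = 14336 MiB, i.e. ~14 GiB
// At roughly 1 GiB per model, that leaves room for about 14 resident models,
// which matches our observation that the GPU is "full" just before the failing load.
```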
We are trying to follow the code in ModelMesh.java between the line that sets the thread name (curThread.setName("model-load-" + modelId)) and the log that reports that the model load is starting (logger.info("Starting load for model " + modelId + " type=" + modelType)) to understand what triggers the evacuation of the loaded models.
We'd like to know how modelmesh decided that it should evacuate so many models and where this happens in the code.
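To be clear about our current (unverified) hypothesis: if the cache is weight-bounded and a newly loading model is registered with a predicted weight before its real size is known, an LRU-style cache would evict as many resident entries as needed, in one pass, to make that predicted weight fit. The sketch below is only an illustration of that mechanism - the class and method names are ours, not modelmesh's - but it would explain a burst of evacuations within a single millisecond:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a weight-bounded LRU cache. Names are hypothetical and
// do not correspond to modelmesh's actual classes or fields.
class WeightedLruCache<K> {
    private final long capacityUnits;      // e.g. 1_835_008 units of 8 KiB
    private long usedUnits;
    private final LinkedHashMap<K, Long> entries =
            new LinkedHashMap<>(16, 0.75f, true);   // access-order => LRU

    WeightedLruCache(long capacityUnits) {
        this.capacityUnits = capacityUnits;
    }

    // Insert an entry with a (possibly predicted) weight, evicting
    // least-recently-used entries until it fits.
    synchronized void put(K key, long weightUnits) {
        Iterator<Map.Entry<K, Long>> it = entries.entrySet().iterator();
        while (usedUnits + weightUnits > capacityUnits && it.hasNext()) {
            Map.Entry<K, Long> eldest = it.next();
            usedUnits -= eldest.getValue();
            it.remove();                    // one "evacuation" of a resident model
            System.out.println("evicting " + eldest.getKey());
        }
        entries.put(key, weightUnits);
        usedUnits += weightUnits;
    }
}
```

In a scheme like this, a ~1 GiB model with an accurate weight (~131,072 of those 8 KiB units) would displace at most one or two resident models, whereas a predicted weight close to the full capacity would evict nearly everything in a single put() call. We don't know whether anything like this is what actually happens - pointers to the relevant code path in ModelMesh.java would be much appreciated.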
To Reproduce
We don't have a reproducible way to trigger this issue, but it happens quite often in our cluster.
The issue seems to happen when GPU memory already holds the maximum number of models it can carry and we then try to load an additional model.
Expected behavior
At most one or two models should be unloaded if space is required to load an additional model with the same characteristics as the already loaded models.
Screenshots
The Kibana logs
Environment:
We are using version 0.11.0 and run on a g4dn.xlarge instance.