Add context to handler functions #103
Conversation
Thank you for working on this! I added a comment regarding the implementation.
def run_handler_function(self, func, *argv):
    """Helper to call the handler function, covering two cases:
    1. the handler function takes context
    2. the handler function does not take context
    """
    # requires: from inspect import signature
    num_func_input = len(signature(func).parameters)
    if num_func_input == len(argv):
        # function does not take context
        result = func(*argv)
    elif num_func_input == len(argv) + 1:
        # function takes context
        argv_context = argv + (self.context,)
        result = func(*argv_context)
    else:
        raise TypeError(
            "{} takes {} arguments but {} were given.".format(func.__name__, num_func_input, len(argv))
        )
    return result
Is that needed? What overhead does it add to a request/function call?
Why don't we just pass self.context in the handle method, i.e. in line 234 (44e3dec):
response = self.transform_fn(self.model, input_data, content_type, accept)
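Spelled out, that suggestion would presumably look something like this (a sketch, not code from the PR):

response = self.transform_fn(self.model, input_data, content_type, accept, self.context)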
Hi @philschmid. Thank you for your review. Maybe I am not understanding your point correctly, but I don't think we can directly pass self.context in the handle method. If we do that, don't we risk breaking existing customers who are not using context as an input argument? With the above-mentioned run_handler_function, we should be able to support both customers who do not want to use context and customers who do. Please correct me if I misunderstood. Thanks!
Why would it break? You mean if they have a custom inference.py that defines a transform_fn method?
Yes, we might affect those who define a custom transform_fn, right?
In handler_service.py, we try to import user-defined functions from inference.py:

load_fn = getattr(user_module, "model_fn", None)
preprocess_fn = getattr(user_module, "input_fn", None)
predict_fn = getattr(user_module, "predict_fn", None)
postprocess_fn = getattr(user_module, "output_fn", None)
transform_fn = getattr(user_module, "transform_fn", None)

If they exist, they are used to overwrite self.load, self.preprocess, self.predict, self.postprocess and self.transform_fn:

if load_fn is not None:
    self.load = load_fn
if preprocess_fn is not None:
    self.preprocess = preprocess_fn
if predict_fn is not None:
    self.predict = predict_fn
if postprocess_fn is not None:
    self.postprocess = postprocess_fn
if transform_fn is not None:
    self.transform_fn = transform_fn

In this PR, we introduce an additional context parameter for the default handlers. However, existing user scripts do not have this parameter in their customized handler functions, so we have to be careful when we call them. That's why @sachanub is adding a helper function to determine whether to call these functions with the new parameter.
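For illustration, the call sites in the handler service would then go through the helper instead of calling the handlers directly (a sketch; the argument lists follow the default handler signatures discussed in this PR):

processed_data = self.run_handler_function(self.preprocess, input_data, content_type)
predictions = self.run_handler_function(self.predict, processed_data, self.model)
response = self.run_handler_function(self.postprocess, predictions, accept)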
Is that something we could deprecate and remove with a v3 of the toolkit? It feels very error-prone, and a big overhead, to parse and add arguments like this for every incoming request.
What do you think of adding a check to see whether an inference.py is provided and, if not, using self.transform_fn directly? Most customers deploy models using the "zero-code" deployment feature, where you provide a MODEL_ID and a TASK and don't need an inference.py script.
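A rough sketch of that fast path (self.user_module_loaded is a hypothetical attribute name, not from the PR):

# inside handle(), after initialization has recorded whether inference.py
# supplied any custom handlers (hypothetical flag: self.user_module_loaded)
if not self.user_module_loaded:
    # zero-code deployment: default handlers with known signatures, call directly
    response = self.transform_fn(self.model, input_data, content_type, accept, self.context)
else:
    # custom inference.py: unknown signatures, dispatch through the helper
    response = self.run_handler_function(self.transform_fn, self.model, input_data, content_type, accept)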
Added a suggestion on how we can avoid running all requests through this run_handler_function.
Do we know what "overhead" in terms of latency gets added by running a request through run_handler_function?
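One way to measure it (a micro-benchmark sketch, not from the PR): time the signature inspection that run_handler_function performs on every call.

import timeit
from inspect import signature

def predict_fn(data, model):
    return data

# per-call cost of the introspection alone
n = 100_000
total = timeit.timeit(lambda: len(signature(predict_fn).parameters), number=n)
print("{:.2f} us per call".format(total / n * 1e6))

On CPython this introspection is typically on the order of microseconds per call, small next to model inference but nonzero.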
@philschmid I figured out another way to handle the context. Instead of using the run_handler_function helper on every request, the signature check can be resolved once up front. By doing this, we should be able to avoid running all inference requests through the run_handler_function.
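A sketch of that alternative (the names here are assumptions, not the PR's final code): inspect each user function once during initialization and wrap it if it accepts the extra parameter, so per-request calls stay direct.

from inspect import signature

def _bind_context(self, func, num_base_args):
    """Return func unchanged, or a wrapper that appends self.context,
    depending on whether func declares the extra context parameter."""
    if len(signature(func).parameters) == num_base_args + 1:
        return lambda *args: func(*args, self.context)
    return func

# applied once when the user module is imported, e.g.:
# self.predict = self._bind_context(predict_fn, num_base_args=2)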
LGTM
Issue #, if available:
MMS stores request information in a context object. As in the other inference toolkits, the context is passed to the handler service function. A PR was created for the SageMaker Inference Toolkit to pass the context to the handler functions. However, since the HuggingFaceHandlerService class and its handler functions do not directly inherit from the DefaultHandlerService and DefaultInferenceHandler classes (in the base Inference Toolkit), the context is currently not passed to the handler functions in the HuggingFace Inference Toolkit. This PR passes the context to the handler functions in the HuggingFace Inference Toolkit so that users can use that information in their custom handler functions.

One particular use case is multi-GPU inference for HuggingFace. The default load function in the handler service can dynamically select the GPU device by leveraging the context, and can assign the model to different devices. However, we currently do not pass the context as an input argument to the default load, preprocess, predict, postprocess and transform_fn functions. As a result, customers cannot directly use the context in their custom handler functions to dynamically assign the data/model to different GPUs: the same GPU device is chosen, and the data/model are assigned to that single device even on a multi-GPU host. By passing the context to the handler functions, users can assign the data/model to different GPUs and utilize all the GPU resources available on their host.

Below is an example of a model_fn that loads and assigns a model to a device:
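A minimal sketch, assuming the MMS context exposes the worker's assigned GPU via context.system_properties.get("gpu_id"):

import torch
from transformers import AutoModelForSequenceClassification

def model_fn(model_dir, context=None):
    # Pick the GPU assigned to this MMS worker when a context is provided.
    if context is not None and torch.cuda.is_available():
        device = torch.device("cuda:{}".format(context.system_properties.get("gpu_id")))
    else:
        device = torch.device("cpu")
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    return model.to(device)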
After this PR, handler functions with and without context are both supported, so existing use cases won't break. Example:
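For instance, both of these user handlers (a hypothetical inference.py) are called correctly:

def input_fn(input_data, content_type):
    # existing-style handler: no context parameter; called as func(input_data, content_type)
    return input_data

def predict_fn(data, model, context):
    # new-style handler: the MMS context is appended as the last argument
    print("worker GPU:", context.system_properties.get("gpu_id"))
    return model(data)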
Description of changes:
Description of testing:
With context, the workers are allocated to different GPUs as expected:
Without context, the workers are all allocated to the same GPU, exhausting its available memory.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.