Help converting and running basic DictVectorizer #1068
Comments
Did you try with something like |
Thanks for the suggestion @xadupre. I believe I tried that already, but forgot to document it in the issue. In that case, |
Following https://github.com/onnx/sklearn-onnx/blob/main/tests/test_sklearn_dict_vectorizer_converter.py#L35, is it possible to replace the integer values with floats? Integers might not be supported in onnxruntime.
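For anyone following along, the integer-to-float replacement suggested here can be done without editing the data by hand. This is only a minimal sketch in plain Python: `values_to_float` is a hypothetical helper name, not part of sklearn-onnx, and `d` is assumed to be the example data from the sklearn docs.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def values_to_float(records: List[Dict[str, Number]]) -> List[Dict[str, float]]:
    """Return a copy of `records` with every value cast to float.

    Hypothetical helper for illustration; keys are left untouched.
    """
    return [
        {key: float(value) for key, value in record.items()}
        for record in records
    ]


# Assumed input: the integer-valued example from the sklearn docs.
d = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
d_float = values_to_float(d)
print(d_float)
```

The cast only changes the Python-side values. Whether it resolves the conversion error is a separate question that this sketch does not answer.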
@xadupre it could be reasonable to replace integers with their float equivalents. I tried
d = [{'foo': 1.0, 'bar': 2.0}, {'foo': 3.0, 'baz': 1.0}]
dict_type = DictionaryType(
    key_type=StringTensorType([1]),
    value_type=FloatTensorType([1]),
)
but this still encounters the "graph is missing type information needed to construct the ORT tensor" error during
d = [{'foo': 1.1, 'bar': 2.1}, {'foo': 3.1, 'baz': 1.1}]
However, this still has the same issue. Any other thoughts? I also see the same error if I add a
I want to convert the example DictVectorizer from the sklearn docs to ONNX. Despite looking at the documented type constraints for OnnxDictVectorizer, all the approaches I've tried still have different errors. Can someone please advise?
Here is the base script I've been modifying
And a summary of the approaches and errors. For notation, I'm showing {key_type : value_type} as passed to the DictionaryType constructor.
Approach 1: non-tensor types
The most direct translation of the Python types in d should be {StringType([None, 1]) : Int64Type([None, 1])}. However, convert_sklearn() raises
Approach 2: tensor types
As indicated by the type error, I refactored to {StringTensorType([None, 1]) : Int64TensorType([None, 1])}. Now sess = ... raises
Approach 3: replace int64 with float
ONNX appears to treat the values as floats even though they are ints in Python. To remedy this, I tried {StringTensorType([None, 1]) : FloatTensorType([None, 1])}. Now sess.run() raises
I also see the same approach/error patterns if I wrap DictVectorizer in a Pipeline. Given that map(string, int64) is a supported type, I'm unsure what else to try.