Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Dict, List generic types when using check_types for runtime validation #1078

Open
2 of 3 tasks
nolanbconaway opened this issue Jan 24, 2023 · 5 comments
Open
2 of 3 tasks
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@nolanbconaway
Copy link

nolanbconaway commented Jan 24, 2023

Describe the bug

I have a function which I am checking with pa.check_types. The output type is not only a dataframe schema, but has these types nested, like Dict[str, pa.typing.DataFrame[Schema]]. SchemaError is not raised when the return type is invalid in this case; it is only raised when the schema is the return type of the function.

Are these kinds of signatures supported by pandera? Is there a supported approach to this kind of thing?

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Code Sample, a copy-pastable example

from typing import Dict

import pandas as pd
import pandera as pa


class Schema(pa.SchemaModel):
    a: pa.typing.Series[float] = pa.Field(nullable=False, coerce=True)


@pa.check_types
def raises_schema_error() -> pa.typing.DataFrame[Schema]:
    return pd.DataFrame({"a": [1, 2, None]})


@pa.check_types
def no_schema_error() -> Dict[str, pa.typing.DataFrame[Schema]]:
    return {'b': pd.DataFrame({"a": [1, 2, None]})}

Expected behavior

I was expecting schema error to be raised in both cases.

Desktop (please complete the following information):

  • OS: ubuntu
  • Browser: chrome

Screenshots

Additional context

@nolanbconaway nolanbconaway added the bug Something isn't working label Jan 24, 2023
@cosmicBboy
Copy link
Collaborator

hi @nolanbconaway, @check_types doesn't support any types other that pa.typing.DataFrame[Schema], with support added recently for Union types: #995

Would need to add support for this use case as well. Re-labelling this as an enhancement. The bug is perhaps that check_types doesn't warn the user if a particular return type is not supported.

@cosmicBboy cosmicBboy added enhancement New feature or request and removed bug Something isn't working labels Jan 24, 2023
@cosmicBboy cosmicBboy changed the title Validation skipped with schemas nested in return types Support Dict, List generic types when using check_types for runtime validation Jan 24, 2023
@nolanbconaway
Copy link
Author

Ah that makes sense! Thanks for the fast response

@cosmicBboy
Copy link
Collaborator

Would welcome a community contribution for this! Will need to extend this to pulling out a SchemaModel from a generic type that contains it.

@nolanbconaway
Copy link
Author

Hey! I gave this a couple hours and I think it goes a bit deeper than I have space to look. The Union type is a generalization of the current functionality in that the argument value is still a single dataframe. With dicts, lists, etc; pandera would need to support arbitrarily nested type hints. With that comes the possibility that multiple dataframes are provided for a single argument, and those could have different schemas.

e.g.,

def f() -> Tuple[DataFrame[s1], DataFrame[s2]]:
    ...

The decorator runs off of _check_arg, which validates a single value on the basis of a mapping between schemas and argument names. The storage of the schemas in annotated_schema_models would need to be changed to map more than the argument name because pandera needs to find the part of the hint to which the schema applies. _check_arg would need to receive information about how to traverse annotated_schema_models, and so on.

From what I can tell, some form of zipping the arguments to the annotation would be needed to handle this sanely, but that feels like a bigger architectural change than i have time to support.

I added a pr with as far as i got, in case someone with more knowledge/time thinks this is worth pursuing! #1084

@cosmicBboy
Copy link
Collaborator

thanks @nolanbconaway! yeah _check_arg needs to be modified to support potentially multiple data objects in a single arg, with multiple schemas for each data object.

won't have capacity to work on this any time soon, but I'm adding a help wanted tag to this issue if any community members are interested in implementing this!

@cosmicBboy cosmicBboy added the help wanted Extra attention is needed label Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants