Easy & clean function parameter validation
Decorate your function with @validate_parameters and define validations for each parameter:
from parameters_validation import no_whitespaces, non_blank, non_empty, non_negative, strongly_typed, validate_parameters

from my_app.auth import AuthToken

@validate_parameters
def register(
    token: strongly_typed(AuthToken),
    name: non_blank(str),
    age: non_negative(int),
    nickname: no_whitespaces(non_empty(str)),
    bio: str,
):
    # do register
    ...
Then, on every call, the arguments passed are validated before the function actually executes; a failed validation raises an error (or does whatever else you define in a custom validation).
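For example, here is a minimal sketch of that fail-fast behaviour using the built-in non_negative validation (set_age is a hypothetical function; the exact exception type raised on failure is library-defined, hence the broad except):

from parameters_validation import non_negative, validate_parameters

@validate_parameters
def set_age(age: non_negative(int)):  # hypothetical function for illustration
    print("age set to", age)

set_age(30)  # passes validation; prints: age set to 30

try:
    set_age(-1)  # fails validation; the function body never runs
except Exception as error:  # broad catch: the exact exception type is library-defined
    print("rejected:", error)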
Install it from PyPI:

pip install parameters-validation
Creating your own validation is as easy as decorating the validation function with @parameter_validation:
from pandas import DataFrame

from parameters_validation import parameter_validation, validate_parameters

@parameter_validation
def has_id_column(df: DataFrame):
    if "id" not in df:
        raise ValueError("Dataframe must contain an `id` column")

@validate_parameters
def ingest(df: has_id_column(DataFrame)):
    # ingest
    ...
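A quick usage sketch of the custom validation above (assumes pandas is installed; whether the library re-raises the ValueError unchanged isn't shown here, hence the broad except):

import pandas as pd

ingest(pd.DataFrame({"id": [1, 2], "value": [10, 20]}))  # has an `id` column: passes

try:
    ingest(pd.DataFrame({"value": [10, 20]}))  # no `id` column: validation fails
except Exception as error:  # broad catch in case the library wraps the ValueError
    print(error)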
You can use a custom validation for other purposes too, but keep in mind that validation functions cannot alter the actual parameter value:
import logging

from parameters_validation import parameter_validation, validate_parameters

@parameter_validation
def log_to_debug(param: str, arg_name: str):
    logging.debug("{arg} = {value}".format(arg=arg_name, value=param))

@validate_parameters
def foo(arg: log_to_debug(str)):
    # do something
    ...
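With debug logging enabled, every call is then logged before the body runs; a minimal usage sketch under that assumption:

import logging

logging.basicConfig(level=logging.DEBUG)

foo("hello")  # emits a DEBUG record like `arg = hello`, then executes the body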
If, for whatever reason, one wants to skip validations, a skip_validations method is appended to the decorated function. When called, it returns the original function as if it hadn't been decorated with @validate_parameters:
from parameters_validation import no_whitespaces, validate_parameters

@validate_parameters
def foo(arg: no_whitespaces(str)):
    print(arg)

foo.skip_validations()("white spaces")
# prints: white spaces
Note that, in the example, foo.skip_validations() does not change foo itself but rather returns another function without the validation behaviour.
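A small sketch of that distinction (the exact exception raised by no_whitespaces is library-defined, hence the broad except):

relaxed_foo = foo.skip_validations()  # a new function with validations removed

relaxed_foo("white spaces")  # prints: white spaces

try:
    foo("white spaces")  # the original function still validates and raises
except Exception as error:  # broad catch: the exact exception type is library-defined
    print("still validated:", error)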
In general, unit and integration tests should be fine with input parameters being validated, though it might be the case that one wants to mock some or all of the validations.
Beyond skipping validation, functions decorated with @validate_parameters are also appended with a mock_validations method that accepts a dictionary mapping parameters to mock validations:
from parameters_validation import no_whitespaces, validate_parameters

@validate_parameters
def foo(arg: no_whitespaces(str)):
    print(arg)

foo.mock_validations({"arg": lambda *_, **__: print("mocked")})("white spaces")
# prints: mocked
# prints: white spaces
Note that mock functions must not be decorated with @parameter_validation. Also note that, in the example, foo.mock_validations(...) does not change foo itself but rather returns another function with the mocked behaviour.
When testing the decorated function itself, it may suffice to call the function returned by mock_validations directly; otherwise, one can use the returned function to patch the original one. In this example we're patching a decorated function named something using pytest and pytest-mock:
from project.module import foo

def test_something(mocker):
    # given
    arg_validation_mock = mocker.MagicMock()
    mocked_something = foo.something.mock_validations({"arg": arg_validation_mock})
    mocker.patch("project.module.foo.something", mocked_something)

    # when
    foo.something(42)

    # then
    arg_validation_mock.assert_called_once_with(42, "arg", None)
It is a pythonic convention to follow the EAFP (easier to ask forgiveness than permission) principle whenever possible. There are cases, however, in which skipping validations leads to silent errors and big headaches. Let's use an illustrative example:
from pyspark.sql import DataFrame

def persist_to_s3(df: DataFrame, path: str):
    df.write.parquet(path)
This code is perfectly fine, but assume there is a business requirement that all persisted dataframes contain a ts column with the data timestamp.
Blindly following EAFP, the code is left unchanged, but other developers can then write dataframes without the ts column and no error will be raised. In the worst case this can lead to tons of data being saved wrong and rendered useless.
EAFP works well when code is still able to deal with exceptional scenarios, i.e. when it will eventually break on bad input; that never happens here, so a validation on the dataframe is appropriate:
from pyspark.sql import DataFrame

def persist_to_s3(df: DataFrame, path: str):
    if "ts" not in df.columns:
        raise ValueError("dataframe is missing a `ts` column")

    df.write.parquet(path)
The parameters-validation package helps you be more declarative with your validations, stating them right at the function's signature and avoiding pollution of your function's body with validation code:
from pyspark.sql import DataFrame

from parameters_validation import parameter_validation, validate_parameters

@parameter_validation
def with_ts_column(df: DataFrame):
    if "ts" not in df.columns:
        raise ValueError("dataframe is missing a `ts` column")

@validate_parameters
def persist_to_s3(df: with_ts_column(DataFrame), path: str):
    df.write.parquet(path)
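As a closing, hedged usage sketch (assumes a working Spark environment; the bucket paths are purely illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

good_df = spark.createDataFrame([(1, "2024-01-01")], ["id", "ts"])
persist_to_s3(good_df, "s3://my-bucket/good")  # hypothetical path; `ts` present, writes fine

bad_df = spark.createDataFrame([(1,)], ["id"])  # no `ts` column
try:
    persist_to_s3(bad_df, "s3://my-bucket/bad")  # hypothetical path; never reaches the write
except Exception as error:  # broad catch: the library may wrap the ValueError
    print(error)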