[FEATURE] Use SparkConf or RuntimeConf for the se_conf. #46

Open
newfront opened this issue Oct 7, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

newfront commented Oct 7, 2023

Describe the solution you'd like

Today the configuration is passed as a Dict[str, Union[str, bool, int]], as shown below.

se_conf = {
    "se_notifications_enable_email": False,
    "se_notifications_email_smtp_host": "mailhost.example.com",
    "se_notifications_email_smtp_port": 25,
    "se_notifications_email_from": "[email protected]",
    "se_notifications_email_subject": "spark expectations - data quality - notifications",
    "se_notifications_on_fail": True,
    "se_notifications_on_error_drop_exceeds_threshold_breach": True,
    "se_notifications_on_error_drop_threshold": 15,
}

What if we provided the ability to configure the expectations directly with the SparkSession configuration?

Since end users already have to provide their own SparkSession, all they would need to do is set the following keys on it.

from pyspark.sql import SparkSession

# Reuse the session the user already created.
spark = SparkSession.getActiveSession()

# Runtime configuration values are set as strings.
spark.conf.set("se.notifications.email.enabled", "false")
spark.conf.set("se.notifications.email.smtp.host", "mailhost.example.com")
spark.conf.set("se.notifications.email.smtp.port", "25")
spark.conf.set("se.notifications.email_from", "[email protected]")
spark.conf.set("se.notifications.email_subject", "spark expectations - data quality - notifications")
spark.conf.set("se.notifications.on_fail", "true")
spark.conf.set("se.notifications.on_error_drop_exceeds_threshold_breach", "true")
spark.conf.set("se.notifications.on_error_drop_threshold", "15")
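On the library side, reading these settings back could look roughly like the sketch below. It is only an illustration, not existing spark-expectations code: `load_se_conf`, `_SCHEMA`, and `_coerce` are hypothetical names, and the dotted keys are the ones proposed above. The function accepts any callable that returns a string (or None) for a key, so it does not depend on Spark directly.

```python
from typing import Callable, Dict, Optional, Tuple, Union

ConfValue = Union[str, bool, int]

# Hypothetical mapping: proposed dotted Spark key -> (existing se_conf key, type).
_SCHEMA: Dict[str, Tuple[str, type]] = {
    "se.notifications.email.enabled": ("se_notifications_enable_email", bool),
    "se.notifications.email.smtp.host": ("se_notifications_email_smtp_host", str),
    "se.notifications.email.smtp.port": ("se_notifications_email_smtp_port", int),
    "se.notifications.email_from": ("se_notifications_email_from", str),
    "se.notifications.email_subject": ("se_notifications_email_subject", str),
    "se.notifications.on_fail": ("se_notifications_on_fail", bool),
    "se.notifications.on_error_drop_exceeds_threshold_breach": (
        "se_notifications_on_error_drop_exceeds_threshold_breach",
        bool,
    ),
    "se.notifications.on_error_drop_threshold": (
        "se_notifications_on_error_drop_threshold",
        int,
    ),
}


def _coerce(raw: str, kind: type) -> ConfValue:
    """Convert a string config value to the type the dict-based API uses."""
    if kind is bool:
        lowered = raw.strip().lower()
        if lowered not in ("true", "false"):
            raise ValueError(f"expected 'true' or 'false', got {raw!r}")
        return lowered == "true"
    if kind is int:
        return int(raw)
    return raw


def load_se_conf(get: Callable[[str], Optional[str]]) -> Dict[str, ConfValue]:
    """Build the legacy se_conf dict from a key lookup; unset keys are skipped."""
    conf: Dict[str, ConfValue] = {}
    for spark_key, (dict_key, kind) in _SCHEMA.items():
        raw = get(spark_key)
        if raw is not None:
            conf[dict_key] = _coerce(raw, kind)
    return conf
```

With a live session, the lookup could be something like `lambda key: spark.conf.get(key, None)`, assuming the installed PySpark version accepts None as the default. Keeping the lookup as a plain callable also means the parsing can be unit tested with an ordinary dict's `get` method.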

Describe alternatives you've considered
The alternative is to keep constructing a configuration dictionary, which is already supported.

Additional context
With spark.newSession(), it becomes easier to manage configuration that is bound to a given instance of the SparkSession.

Am I willing to work on this? Yes.

newfront added the enhancement label Oct 7, 2023

asingamaneni (Collaborator) commented

This would be nice to have as the standard. For the transition, we can support both and deprecate the dictionary later.
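A transition period like this could be handled with a small resolver, sketched below under some assumptions. `resolve_se_conf` and its parameters are hypothetical names, and letting the legacy dictionary override Spark-derived values is just one possible precedence rule, not something decided in this thread.

```python
import warnings
from typing import Callable, Dict, Optional, Union

ConfValue = Union[str, bool, int]


def resolve_se_conf(
    se_conf: Optional[Dict[str, ConfValue]],
    from_spark: Callable[[], Dict[str, ConfValue]],
) -> Dict[str, ConfValue]:
    """Combine Spark-derived settings with the legacy dict during the transition.

    Assumption: values in the legacy dict win, so existing callers see no
    behavior change while they migrate.
    """
    resolved = dict(from_spark())
    if se_conf:
        warnings.warn(
            "passing se_conf as a dict is deprecated; "
            "set se.* keys on the SparkSession instead",
            DeprecationWarning,
            stacklevel=2,
        )
        resolved.update(se_conf)
    return resolved
```

Callers who have fully migrated would pass `None` for the dictionary and see no warning, which gives a simple signal for when the dictionary path can be removed.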
