
[Bug]: Instance variable is_trained is used incorrectly #49

Open
fink-stanislav opened this issue Feb 25, 2024 · 2 comments · Fixed by #52
Labels: enhancement (New feature or request)

Comments

@fink-stanislav

Module

Prediction (API)

Contact Details

[email protected]

Current Behavior

DualPredictor is supposed to be initialized with an is_trained parameter that should be a list of booleans. There are a few issues related to this variable:

  1. Using a list as a default argument value is not recommended because the default is shared across calls and can lead to unwanted side effects. (The side effects may never actually occur as long as the parameter is not mutated inside the constructor.) A tuple can be used instead.
  2. The instance variable is_trained is not updated after the fit method completes:
for count, model in enumerate(self.models):
    if not self.is_trained[count]:
        model.fit(X, y, **dictargs[count])
#end of method

Instead, you may want to rewrite it like this:

for count, model in enumerate(self.models):
    if not self.is_trained[count]:
        model.fit(X, y, **dictargs[count])
        self.is_trained[count] = True
#end of method
  3. An exception may not be thrown when it should be, in cases where is_trained is part of the condition of an if statement:
...
if self.train:
    if self.splitter is None:
        raise RuntimeError(
                "The splitter argument is None but train is set to "
                + "True. Please provide a correct splitter to train "
                + "the underlying model."
        )
    logger.info(f"Fitting model on fold {i+cached_len}")
    predictor.fit(X_fit, y_fit, **kwargs)  # Fit K-fold predictor

    # Make sure that predictor is already trained if train arg is False
elif self.train is False and predictor.is_trained is False:
    raise RuntimeError(
        "'train' argument is set to 'False' but model is not pre-trained"
    )

else:  # Skipping training
    logger.info("Skipping training.")
...

In the elif statement, predictor.is_trained is used as a boolean, but it can in fact be a list if predictor is an instance of DualPredictor. In that case the identity test is False always fails for a list, and the list evaluates as truthy whenever it is non-empty, even though a model may still be untrained.
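To make the failure mode concrete, here is a plain-Python illustration (independent of puncc) of how both the identity test and simple truthiness go wrong for a list-valued status:

```python
# Plain-Python illustration of the problem, independent of puncc.
is_trained = [False, False]  # list-valued status, as on a DualPredictor

# The identity test used in the elif never fires for a list:
print(is_trained is False)   # False -> the RuntimeError branch is skipped

# Plain truthiness is just as misleading for a non-empty list:
print(bool(is_trained))      # True, even though no model is trained

# A check that behaves correctly for both bool and list statuses:
trained = all(is_trained) if isinstance(is_trained, (list, tuple)) else bool(is_trained)
print(trained)               # False, as intended
```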

The solution suggested in point 2 is only partial. I think the best approach would be to make the is_trained variable private (rename it to _is_trained) and introduce a property is_trained that behaves like an instance variable but is implemented as a method under the hood. For DualPredictor, for example, it could look like this:

@property
def is_trained(self) -> bool:
    return self._is_trained[0] and self._is_trained[1]
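A self-contained sketch of that pattern (a hypothetical standalone class for illustration, not the actual puncc implementation; it also uses a tuple default, per point 1, and all() instead of hard-coded indices):

```python
class DualPredictorSketch:
    """Illustrative stand-in for DualPredictor with a boolean is_trained property."""

    def __init__(self, is_trained=(False, False)):
        # A tuple default avoids the shared mutable-default pitfall (point 1);
        # a per-instance list copy keeps the status mutable internally.
        self._is_trained = list(is_trained)

    def fit(self):
        # Stand-in for real training: mark every underlying model as trained.
        self._is_trained = [True] * len(self._is_trained)

    @property
    def is_trained(self) -> bool:
        # all() generalizes the two-element check to any number of models.
        return all(self._is_trained)

p = DualPredictorSketch()
print(p.is_trained)  # False
p.fit()
print(p.is_trained)  # True
```

Using all() instead of indexing [0] and [1] keeps the property correct if the number of underlying models ever changes.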

Expected Behavior

is_trained is expected to be a boolean
is_trained is expected to change after the models are trained

Version

v0.9

Environment

No response

Relevant log output

No response

To Reproduce

I do not have an example of code that actually fails because of this. The test named test_locally_adaptive_cp does, however, exercise the part of the code containing the aforementioned elif statement.

@fink-stanislav fink-stanislav added the bug Something isn't working label Feb 25, 2024
@M-Mouhcine M-Mouhcine self-assigned this Feb 27, 2024
M-Mouhcine (Collaborator) commented Feb 27, 2024

Thanks @fink-stanislav for posting this issue and suggesting fixes.

  1. I agree that switching to tuples would be a better option.
  2. We already considered updating the is_trained status, but we must ensure it has no side effects elsewhere. Basically, this status serves as a double check during the conformalization process: the attribute train associated with an instance of ConformalPredictor (defined in the deel.puncc.api.conformalization module) needs to be consistent with the training status of the underlying predictor. I will look further into that.
  3. When predictor.is_trained is an iterable, the test predictor.is_trained is False is indeed inconsistent. Using an attribute getter, as you suggest, looks like a great solution 👍 !
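For illustration, the double check described in point 2, combined with the status normalization needed for point 3, might be sketched like this (a hypothetical helper, not the actual deel.puncc code):

```python
def check_train_consistency(train: bool, is_trained) -> None:
    """Raise if train=False while the underlying predictor is not pre-trained."""
    # Normalize a list-valued status (e.g. from a DualPredictor) to one bool.
    if isinstance(is_trained, (list, tuple)):
        trained = all(is_trained)
    else:
        trained = bool(is_trained)
    if not train and not trained:
        raise RuntimeError(
            "'train' argument is set to 'False' but model is not pre-trained"
        )

check_train_consistency(train=True, is_trained=[False, False])  # OK: will be trained
check_train_consistency(train=False, is_trained=(True, True))   # OK: pre-trained
```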

We will keep you updated on the progress regarding these issues. Thanks again for your findings.

M-Mouhcine (Collaborator) commented:

Hi @fink-stanislav

We have fixed the issue that you reported. However, regarding your first point, switching to tuples would cause a break in the current API. Therefore, I will reopen this issue and consider it an enhancement for the future version 0.8.x.

Thank you for your valuable contribution, and please let us know if you have any further concerns or suggestions.

@M-Mouhcine M-Mouhcine reopened this Mar 11, 2024
@M-Mouhcine M-Mouhcine added enhancement New feature or request and removed bug Something isn't working labels Mar 11, 2024