
[Bug]: Instance variable is_trained is used incorrectly #49

Open
fink-stanislav opened this issue Feb 25, 2024 · 2 comments · Fixed by #52
Labels: enhancement (New feature or request)

Comments

@fink-stanislav

Module

Prediction (API)

Contact Details

[email protected]

Current Behavior

DualPredictor is supposed to be initialized with an is_trained parameter that should be a list of booleans. There are a few issues related to this variable:

  1. Using a list as a default argument value is not recommended because the default is shared across calls and can lead to unwanted side effects. (The side effects may never actually occur as long as the parameter is not mutated inside the constructor.) A tuple can be used instead.
  2. The instance variable is_trained is not updated after the fit method completes:
for count, model in enumerate(self.models):
    if not self.is_trained[count]:
        model.fit(X, y, **dictargs[count])
#end of method

Instead, you may want to rewrite it like this:

for count, model in enumerate(self.models):
    if not self.is_trained[count]:
        model.fit(X, y, **dictargs[count])
        self.is_trained[count] = True
#end of method
  3. An exception may not be thrown when it should be, in cases where is_trained is part of the condition of an if statement:
...
if self.train:
    if self.splitter is None:
        raise RuntimeError(
                "The splitter argument is None but train is set to "
                + "True. Please provide a correct splitter to train "
                + "the underlying model."
        )
    logger.info(f"Fitting model on fold {i+cached_len}")
    predictor.fit(X_fit, y_fit, **kwargs)  # Fit K-fold predictor

    # Make sure that predictor is already trained if train arg is False
elif self.train is False and predictor.is_trained is False:
    raise RuntimeError(
        "'train' argument is set to 'False' but model is not pre-trained"
    )

else:  # Skipping training
    logger.info("Skipping training.")
...

In the elif statement, predictor.is_trained is used as a boolean, but it can in fact be a list if predictor is an instance of DualPredictor. In that case the identity test is False always fails for a list, and the list evaluates as truthy whenever it is non-empty, even though a model may still be untrained.
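To make the failure mode concrete, here is a plain-Python illustration (independent of puncc) of how both the identity test and simple truthiness go wrong for a list-valued status:

```python
# Plain-Python illustration of the problem, independent of puncc.
is_trained = [False, False]  # list-valued status, as on a DualPredictor

# The identity test used in the elif never fires for a list:
print(is_trained is False)   # False -> the RuntimeError branch is skipped

# Plain truthiness is just as misleading for a non-empty list:
print(bool(is_trained))      # True, even though no model is trained

# A check that behaves correctly for both bool and list statuses:
trained = all(is_trained) if isinstance(is_trained, (list, tuple)) else bool(is_trained)
print(trained)               # False, as intended
```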

The solution suggested in point 2 is only partial. I think the best approach would be to make the is_trained variable private (rename it to _is_trained) and introduce a property is_trained that behaves like an instance variable but is implemented as a method under the hood. For DualPredictor, for example, it could look like this:

@property
def is_trained(self) -> bool:
    return self._is_trained[0] and self._is_trained[1]
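A self-contained sketch of that pattern (a hypothetical standalone class for illustration, not the actual puncc implementation; it also uses a tuple default, per point 1, and all() instead of hard-coded indices):

```python
class DualPredictorSketch:
    """Illustrative stand-in for DualPredictor with a boolean is_trained property."""

    def __init__(self, is_trained=(False, False)):
        # A tuple default avoids the shared mutable-default pitfall (point 1);
        # a per-instance list copy keeps the status mutable internally.
        self._is_trained = list(is_trained)

    def fit(self):
        # Stand-in for real training: mark every underlying model as trained.
        self._is_trained = [True] * len(self._is_trained)

    @property
    def is_trained(self) -> bool:
        # all() generalizes the two-element check to any number of models.
        return all(self._is_trained)

p = DualPredictorSketch()
print(p.is_trained)  # False
p.fit()
print(p.is_trained)  # True
```

Using all() instead of indexing [0] and [1] keeps the property correct if the number of underlying models ever changes.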

Expected Behavior

is_trained is expected to be a boolean
is_trained is expected to change after the models are trained

Version

v0.9

Environment

No response

Relevant log output

No response

To Reproduce

I do not have an example of code that actually fails because of this. The test named test_locally_adaptive_cp does, however, exercise the part of the code containing the aforementioned elif statement.

@fink-stanislav fink-stanislav added the bug Something isn't working label Feb 25, 2024
@M-Mouhcine M-Mouhcine self-assigned this Feb 27, 2024
M-Mouhcine (Collaborator) commented Feb 27, 2024

Thanks @fink-stanislav for posting this issue and suggesting fixes.

  1. I agree that switching to tuples would be a better option.
  2. We already considered updating the is_trained status, but we must ensure it has no side effects elsewhere. Basically, this status serves as a double check during the conformalization process: the attribute train associated with an instance of ConformalPredictor (defined in the deel.puncc.api.conformalization module) needs to be consistent with the training status of the underlying predictor. I will look further into that.
  3. When predictor.is_trained is an iterable, the test predictor.is_trained is False is indeed inconsistent. Using an attribute getter, as you suggest, looks like a great solution 👍 !
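For illustration, the double check described in point 2, combined with the status normalization needed for point 3, might be sketched like this (a hypothetical helper, not the actual deel.puncc code):

```python
def check_train_consistency(train: bool, is_trained) -> None:
    """Raise if train=False while the underlying predictor is not pre-trained."""
    # Normalize a list-valued status (e.g. from a DualPredictor) to one bool.
    if isinstance(is_trained, (list, tuple)):
        trained = all(is_trained)
    else:
        trained = bool(is_trained)
    if not train and not trained:
        raise RuntimeError(
            "'train' argument is set to 'False' but model is not pre-trained"
        )

check_train_consistency(train=True, is_trained=[False, False])  # OK: will be trained
check_train_consistency(train=False, is_trained=(True, True))   # OK: pre-trained
```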

We will keep you updated on the progress regarding these issues. Thanks again for your findings.

M-Mouhcine (Collaborator) commented:

Hi @fink-stanislav

We have fixed the issue that you reported. However, regarding your first point, switching to tuples would cause a break in the current API. Therefore, I will reopen this issue and consider it an enhancement for the future version 0.8.x.

Thank you for your valuable contribution, and please let us know if you have any further concerns or suggestions.

@M-Mouhcine M-Mouhcine reopened this Mar 11, 2024
@M-Mouhcine M-Mouhcine added enhancement New feature or request and removed bug Something isn't working labels Mar 11, 2024