Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add BalancedModel #1

Merged
merged 15 commits into from
Sep 26, 2023
Merged

✨ Add BalancedModel #1

merged 15 commits into from
Sep 26, 2023

Conversation

EssamWisam
Copy link
Collaborator

Description

This PR adds BalancedModel which allows sequential composition of resampling models from Imbalance.jl (or any transforms that operate on (X, y) data.

@ablaom could you please review the additions when you get the chance. The interface implemented is very similar to that of Stacking in MLJBase.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/MLJBalancing.jl Outdated Show resolved Hide resolved
src/BalancedModel.jl Outdated Show resolved Hide resolved
y_train, y_test = y[train_inds], y[test_inds]

# Load models and balancers
SVC = @load SVC pkg = LIBSVM
Copy link
Member

@ablaom ablaom Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to avoid the C models and use pure Julia models in testing where possible, because it leads to less maintenance hangups. A simple Julia deterministic classifier is OneRuleClassifier (You can also turn a probabilistic model into a deterministic one using pipeline syntax probabilistic_model |> (y |> mode.(y)).)

Also, keep in mind that there is a known issue which prevents using composite models (from exported learning networks) in multithreading mode for some non-Julia models: JuliaAI/MLJBase.jl#783

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test if a model in pure julia use you can do MLJBase.is_pure_julia(model).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ablaom I don't see how probabilistic_model |> (y |> mode.(y)) would change the model type itself from Probabilistic to Deterministic, will it? OneRuleClassifier only works with categorical data. I ended up ditching the SVM and using DeterministicConstantClassifier, afterall all I want to assure here is that it works with the Deterministic type with no problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. This is a good resolution.

test/BalancedModel.jl Outdated Show resolved Hide resolved
test/BalancedModel.jl Outdated Show resolved Hide resolved
Copy link
Member

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only substantive suggestion is to avoid non-Julia models in testing.

@ablaom
Copy link
Member

ablaom commented Sep 20, 2023

I think that Julia 1.6 (LTS) and Julia 1 (latest) should be in the CI matrix, and that suffices.

@ablaom
Copy link
Member

ablaom commented Sep 20, 2023

I believe the Windows fail is due to a known Julia issue with SVMLIB and XGBoost particular to Windows (actually a Julia bug) which was corrected at about Julia 1.9.2 or 1.9.3. It appeared around Julia 1.8.0. So this should disappear when you skip 1.8 testing.

JuliaML/LIBSVM.jl#95

Another argument for avoiding non-Julia models in testing 😉

@EssamWisam
Copy link
Collaborator Author

@ablaom Ready to merge this once the final comments are addressed. Thank you so much for your time during the review.

@EssamWisam
Copy link
Collaborator Author

EssamWisam commented Sep 23, 2023

Okay. There is only one comment to address (there was another but have found the answer). note that Github checks fail only due to daily limits.

Copy link
Member

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go.

@EssamWisam EssamWisam merged commit e2cbd1d into main Sep 26, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants