
Stack cache and acceleration (rebased) #785

Merged
ablaom merged 12 commits into dev from stack_cache_and_acceleration_rebased
Jun 14, 2022


@ablaom
Member

ablaom commented Jun 9, 2022

Replaces #767

@codecov-commenter

codecov-commenter commented Jun 9, 2022

Codecov Report

Merging #785 (d6cf7a1) into dev (adb341f) will increase coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head d6cf7a1 differs from pull request most recent head e02f6af. Consider uploading reports for the commit e02f6af to get more accurate results

@@            Coverage Diff             @@
##              dev     #785      +/-   ##
==========================================
+ Coverage   85.93%   85.97%   +0.04%     
==========================================
  Files          36       36              
  Lines        3462     3473      +11     
==========================================
+ Hits         2975     2986      +11     
  Misses        487      487              
Impacted Files                                   Coverage Δ
src/composition/learning_networks/machines.jl    91.95% <100.00%> (ø)
src/composition/learning_networks/nodes.jl       71.24% <100.00%> (+1.37%) ⬆️
src/composition/models/stacking.jl               94.66% <100.00%> (+0.14%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ablaom
Member Author

ablaom commented Jun 9, 2022

@olivierlabayle I have done some integration tests for this PR. To reproduce, run this script. Each model below is inserted as one of three base models in a stack, and the stack is evaluated in CPU1() mode (the :stack_evaluation test). If the base model makes consistent predictions on retraining (no global RNG, and so forth), the evaluation in CPUThreads() mode is compared to the CPU1() result (the :accelerated_stack_evaluation test). Non-Julia models had to be excluded (see #783). The results (all successful) are summarized below. A "-" indicates that the test was not carried out, either because it was not applicable to that model or because repeatability could not be established.

julia> DataFrame(report1)[:,[:name, :package_name, :stack_evaluation, :accelerated_stack_evaluation]]
22×4 DataFrame
 Row │ name                            package_name           stack_evaluation  accelerated_stack_evaluation 
     │ String                          String                 String            String                       
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ ConstantRegressor               MLJModels              -                 -
   2 │ DecisionTreeRegressor           BetaML                 ✓                 ✓
   3 │ DecisionTreeRegressor           DecisionTree           ✓                 ✓
   4 │ DeterministicConstantRegressor  MLJModels              ✓                 ✓
   5 │ ElasticNetRegressor             MLJLinearModels        ✓                 ✓
   6 │ EvoTreeGaussian                 EvoTrees               -                 -
   7 │ EvoTreeRegressor                EvoTrees               ✓                 ✓
   8 │ HuberRegressor                  MLJLinearModels        ✓                 ✓
   9 │ KNNRegressor                    NearestNeighborModels  ✓                 ✓
  10 │ LADRegressor                    MLJLinearModels        ✓                 ✓
  11 │ LGBMRegressor                   LightGBM               ✓                 ✓
  12 │ LassoRegressor                  MLJLinearModels        ✓                 ✓
  13 │ LinearRegressor                 GLM                    -                 -
  14 │ LinearRegressor                 MLJLinearModels        ✓                 ✓
  15 │ LinearRegressor                 MultivariateStats      ✓                 ✓
  16 │ NeuralNetworkRegressor          MLJFlux                ✓                 -
  17 │ QuantileRegressor               MLJLinearModels        ✓                 ✓
  18 │ RandomForestRegressor           BetaML                 ✓                 -
  19 │ RandomForestRegressor           DecisionTree           ✓                 -
  20 │ RidgeRegressor                  MLJLinearModels        ✓                 ✓
  21 │ RidgeRegressor                  MultivariateStats      ✓                 ✓
  22 │ RobustRegressor                 MLJLinearModels        ✓                 ✓

julia> DataFrame(report2)[:,[:name, :package_name, :stack_evaluation, :accelerated_stack_evaluation]]
16×4 DataFrame
 Row │ name                             package_name              stack_evaluation  accelerated_stack_evaluation 
     │ String                           String                    String            String                       
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ AdaBoostStumpClassifier          DecisionTree              ✓                 ✓
   2 │ ConstantClassifier               MLJModels                 ✓                 ✓
   3 │ DSADDetector                     OutlierDetectionNetworks  -                 -
   4 │ DecisionTreeClassifier           DecisionTree              ✓                 -
   5 │ DeterministicConstantClassifier  MLJModels                 -                 -
   6 │ ESADDetector                     OutlierDetectionNetworks  -                 -
   7 │ EvoTreeClassifier                EvoTrees                  ✓                 ✓
   8 │ GaussianNBClassifier             NaiveBayes                ✓                 ✓
   9 │ KNNClassifier                    NearestNeighborModels     ✓                 ✓
  10 │ LGBMClassifier                   LightGBM                  ✓                 ✓
  11 │ LinearBinaryClassifier           GLM                       ✓                 ✓
  12 │ LogisticClassifier               MLJLinearModels           ✓                 ✓
  13 │ NeuralNetworkClassifier          MLJFlux                   ✓                 -
  14 │ PegasosClassifier                BetaML                    ✓                 ✓
  15 │ PerceptronClassifier             BetaML                    ✓                 ✓
  16 │ RandomForestClassifier           DecisionTree              ✓                 -
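
The comparison described above can be sketched roughly as follows. This is a minimal illustration only, not the script referenced in the comment: the helper names `make_stack` and `score`, the synthetic data, the choice of three KNN base models, and the `rms` measure are all assumptions made for the example. It also assumes that MLJ and NearestNeighborModels are installed, and that Julia was started with more than one thread for the `CPUThreads()` run to be meaningful.

```julia
using MLJ  # assumption: MLJ and NearestNeighborModels are installed

KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels verbosity=0

# Hypothetical helper: a three-model stack whose internal training
# uses the computational resource `resource`.
make_stack(resource) = Stack(;
    metalearner=KNNRegressor(),
    resampling=CV(; nfolds=2),
    acceleration=resource,
    knn1=KNNRegressor(K=4),
    knn2=KNNRegressor(K=6),
    knn3=KNNRegressor(K=8),
)

# Synthetic regression data (sizes chosen arbitrarily for illustration).
X, y = make_regression(100, 3; rng=123)

# Hypothetical helper: evaluate the stack and return the aggregated
# measurement(s), one entry per measure.
score(resource) = evaluate(
    make_stack(resource), X, y;
    resampling=CV(; nfolds=3),
    measure=rms,
    verbosity=0,
).measurement

serial = score(CPU1())
threaded = score(CPUThreads())

# For base models that retrain deterministically, the two results
# are expected to agree up to floating-point tolerance.
@show serial ≈ threaded
```

Only deterministic base models are used here, which mirrors the repeatability condition stated in the comment: a model that draws on a global RNG could give different scores between runs regardless of the acceleration mode.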

@ablaom
Member Author

ablaom commented Jun 10, 2022

To test nested threading (to rule this out as the source of #783), I have now replaced the simple stack in my integration tests with a "double" stack, in which the model occurs at two levels (the snippet below details this). All tests pass except one. I strongly suspect the source of that failure is randomness in the model (DecisionTreeRegressor()), which my filters are not picking up. (filters fixed)

function _stack(model, resource, isregressor)
    if isregressor
        models = (knn1=KNNRegressor(K=4),
                  knn2=KNNRegressor(K=6),
                  tmodel=model)
        metalearner = KNNRegressor()
    else
        models = (knn1=KNNClassifier(K=4),
                  knn2=KNNClassifier(K=6),
                  tmodel=model)
        metalearner = KNNClassifier()
    end
    Stack(;
        metalearner,
        resampling=CV(;nfolds=2),
        acceleration=resource,
        models...
    )
end

# return a nested stack in which `model` appears at two levels, with
# both layers accelerated using `resource`:
_double_stack(model, resource, isregressor) =
    _stack(_stack(model, resource, isregressor), resource, isregressor)

I'd like to recommend this PR for merge. @OkonSamuel Are you happy for us to proceed? I think #783 is unrelated.

@ablaom
Member Author

ablaom commented Jun 14, 2022

Going to merge this now. Thanks @olivierlabayle for your contribution.

@ablaom ablaom merged commit 1b17082 into dev Jun 14, 2022
@ablaom ablaom deleted the stack_cache_and_acceleration_rebased branch June 14, 2022 01:36