Stack cache and acceleration (rebased) #785

ablaom · 2022-06-09T20:41:12Z

Replaces #767

codecov-commenter · 2022-06-09T21:05:59Z

Codecov Report

Merging #785 (d6cf7a1) into dev (adb341f) will increase coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head d6cf7a1 differs from pull request most recent head e02f6af. Consider uploading reports for the commit e02f6af to get more accurate results

@@            Coverage Diff             @@
##              dev     #785      +/-   ##
==========================================
+ Coverage   85.93%   85.97%   +0.04%     
==========================================
  Files          36       36              
  Lines        3462     3473      +11     
==========================================
+ Hits         2975     2986      +11     
  Misses        487      487

Impacted Files	Coverage Δ
src/composition/learning_networks/machines.jl	`91.95% <100.00%> (ø)`
src/composition/learning_networks/nodes.jl	`71.24% <100.00%> (+1.37%)`	⬆️
src/composition/models/stacking.jl	`94.66% <100.00%> (+0.14%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ablaom · 2022-06-09T21:25:22Z

@olivierlabayle I have run some integration tests for this PR. To reproduce, run this script. Each model below is inserted as one of three base models in a stack, and the stack is evaluated in CPU1() mode (the :stack_evaluation test). If the base model makes consistent predictions on retraining (no global RNG, and so forth), the evaluation in CPUThreads() mode is then compared with the CPU1() result (the :accelerated_stack_evaluation test). Non-Julia models had to be excluded (see #783). The results, all successful, are summarized below. A - indicates that the test was not carried out, either because it was not applicable to that model or because repeatability could not be established.

julia> DataFrame(report1)[:,[:name, :package_name, :stack_evaluation, :accelerated_stack_evaluation]]
22×4 DataFrame
 Row │ name                            package_name           stack_evaluation  accelerated_stack_evaluation 
     │ String                          String                 String            String                       
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ ConstantRegressor               MLJModels              -                 -
   2 │ DecisionTreeRegressor           BetaML                 ✓                 ✓
   3 │ DecisionTreeRegressor           DecisionTree           ✓                 ✓
   4 │ DeterministicConstantRegressor  MLJModels              ✓                 ✓
   5 │ ElasticNetRegressor             MLJLinearModels        ✓                 ✓
   6 │ EvoTreeGaussian                 EvoTrees               -                 -
   7 │ EvoTreeRegressor                EvoTrees               ✓                 ✓
   8 │ HuberRegressor                  MLJLinearModels        ✓                 ✓
   9 │ KNNRegressor                    NearestNeighborModels  ✓                 ✓
  10 │ LADRegressor                    MLJLinearModels        ✓                 ✓
  11 │ LGBMRegressor                   LightGBM               ✓                 ✓
  12 │ LassoRegressor                  MLJLinearModels        ✓                 ✓
  13 │ LinearRegressor                 GLM                    -                 -
  14 │ LinearRegressor                 MLJLinearModels        ✓                 ✓
  15 │ LinearRegressor                 MultivariateStats      ✓                 ✓
  16 │ NeuralNetworkRegressor          MLJFlux                ✓                 -
  17 │ QuantileRegressor               MLJLinearModels        ✓                 ✓
  18 │ RandomForestRegressor           BetaML                 ✓                 -
  19 │ RandomForestRegressor           DecisionTree           ✓                 -
  20 │ RidgeRegressor                  MLJLinearModels        ✓                 ✓
  21 │ RidgeRegressor                  MultivariateStats      ✓                 ✓
  22 │ RobustRegressor                 MLJLinearModels        ✓                 ✓

julia> DataFrame(report2)[:,[:name, :package_name, :stack_evaluation, :accelerated_stack_evaluation]]
16×4 DataFrame
 Row │ name                             package_name              stack_evaluation  accelerated_stack_evaluation 
     │ String                           String                    String            String                       
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ AdaBoostStumpClassifier          DecisionTree              ✓                 ✓
   2 │ ConstantClassifier               MLJModels                 ✓                 ✓
   3 │ DSADDetector                     OutlierDetectionNetworks  -                 -
   4 │ DecisionTreeClassifier           DecisionTree              ✓                 -
   5 │ DeterministicConstantClassifier  MLJModels                 -                 -
   6 │ ESADDetector                     OutlierDetectionNetworks  -                 -
   7 │ EvoTreeClassifier                EvoTrees                  ✓                 ✓
   8 │ GaussianNBClassifier             NaiveBayes                ✓                 ✓
   9 │ KNNClassifier                    NearestNeighborModels     ✓                 ✓
  10 │ LGBMClassifier                   LightGBM                  ✓                 ✓
  11 │ LinearBinaryClassifier           GLM                       ✓                 ✓
  12 │ LogisticClassifier               MLJLinearModels           ✓                 ✓
  13 │ NeuralNetworkClassifier          MLJFlux                   ✓                 -
  14 │ PegasosClassifier                BetaML                    ✓                 ✓
  15 │ PerceptronClassifier             BetaML                    ✓                 ✓
  16 │ RandomForestClassifier           DecisionTree              ✓                 -

ablaom · 2022-06-10T03:05:24Z

To test nested threading (to rule this out as the source of #783), I have now replaced the simple stack in my integration tests with a "double" stack, in which the model occurs at two levels (the snippet below details this). All tests pass ~~except for one failure. But I strongly suspect the source is randomness in the model (DecisionTreeRegressor()), which my filters are not picking up.~~ (filters fixed)

function _stack(model, resource, isregressor)
    if isregressor
        models = (knn1=KNNRegressor(K=4),
                  knn2=KNNRegressor(K=6),
                  tmodel=model)
        metalearner = KNNRegressor()
    else
        models = (knn1=KNNClassifier(K=4),
                  knn2=KNNClassifier(K=6),
                  tmodel=model)
        metalearner = KNNClassifier()
    end
    Stack(;
        metalearner,
        resampling=CV(;nfolds=2),
        acceleration=resource,
        models...
    )
end

# return a nested stack in which `model` appears at two levels, with
# both layers accelerated using `resource`:
_double_stack(model, resource, isregressor) =
    _stack(_stack(model, resource, isregressor), resource, isregressor)
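
For anyone wanting to try the CPU1() versus CPUThreads() comparison by hand, here is a minimal sketch of the idea, not the actual integration test script. It assumes MLJ and NearestNeighborModels are installed; the synthetic data, the `make_stack` and `score` helpers, and the choice of `Holdout()` with the `l2` measure are illustrative assumptions rather than anything taken from the test suite.

```julia
using MLJ  # assumption: MLJ and NearestNeighborModels are in the active environment

KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels verbosity=0

# Synthetic regression data (illustrative only; not the data used in the tests)
X, y = make_regression(60, 3; rng=123)

# Hypothetical helper: a small stack whose internal resampling uses `resource`
make_stack(resource) = Stack(;
    metalearner=KNNRegressor(),
    resampling=CV(; nfolds=2),
    acceleration=resource,
    knn1=KNNRegressor(K=4),
    knn2=KNNRegressor(K=6),
)

# Hypothetical helper: evaluate the stack once and return the single measurement
score(resource) = evaluate(
    make_stack(resource), X, y;
    resampling=Holdout(),
    measure=l2,
    verbosity=0,
).measurement[1]

serial = score(CPU1())

# Repeatability filter: only compare against the threaded run if retraining
# in serial mode reproduces the same score
repeatable = score(CPU1()) ≈ serial

threaded = score(CPUThreads())

if repeatable
    @assert threaded ≈ serial
end
```

KNN models are used here because they are deterministic, so the repeatability check should pass. Substituting a model that draws on a global RNG would be expected to fail that check, in which case the threaded comparison is skipped, which is what a - in the tables above represents.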

I'd like to recommend this PR for merge. @OkonSamuel Are you happy for us to proceed? I think #783 is unrelated.

ablaom · 2022-06-14T01:36:23Z

Going to merge this now. Thanks @olivierlabayle for your contribution.

olivierlabayle and others added 7 commits June 9, 2022 16:01

add cache and acceleration to the stack fields

ae3867f

add test Project.toml and some tests

014a013

update docstrings

df274d3

remove extras sections from Project and update some docs and logs

ec41395

add some tests for fit!(::Node, acceleration=CPUThreads())

273e7f3

update propertynames

28c0212

fix Stack docstring

a01964e

ablaom mentioned this pull request Jun 9, 2022

Stack cache and acceleration #767

Closed

tweak stack docstring

c112b91

ablaom added 2 commits June 13, 2022 09:04

Merge branch 'dev' into stack_cache_and_acceleration_rebased

602e292

add DataFrames to test/Project.toml

9a0cd54

ablaom mentioned this pull request Jun 13, 2022

Add multithreading tests and tests of Stack JuliaAI/MLJTestIntegration.jl#10

Merged

ablaom added 2 commits June 14, 2022 12:57

in multithreading stack test replace stack with double stack

d6cf7a1

update docstring

e02f6af

ablaom merged commit 1b17082 into dev Jun 14, 2022

ablaom deleted the stack_cache_and_acceleration_rebased branch June 14, 2022 01:36

This was referenced Jun 14, 2022

For a 0.20.6 release #788

Merged

Issue to trigger releases #345

Closed
