
Issue with DI benchmarking #193

Closed
gdalle opened this issue Jul 3, 2024 · 23 comments

Labels
bug Something isn't working

Comments

@gdalle

gdalle commented Jul 3, 2024

Hey there Tapir people, found this bug for you on the street, with Tapir v0.2.23

julia> using DifferentiationInterface, Tapir

julia> function paritytrig(x::AbstractVector{T}) where {T}
           y = zero(T)
           for i in eachindex(x)
               if iseven(i)
                   y += sin(x[i])
               else
                   y += cos(x[i])
               end
           end
           return y
       end
paritytrig (generic function with 1 method)

julia> gradient(paritytrig, AutoTapir(), rand(2))
[ Info: Compiling rule for Tuple{typeof(paritytrig), Vector{Float64}} in safe mode. Disable for best performance.
ERROR: MethodError: no method matching tangent_field_type(::Type{Base.OneTo{Int64}}, ::Int64)
The applicable method may be too new: running in world age 31975, while current world is 33969.

Closest candidates are:
  tangent_field_type(::Type{P}, ::Int64) where P (method too new to be called from this world context.)
   @ Tapir ~/.julia/packages/Tapir/LAYVk/src/tangents.jl:282

Stacktrace:
  [1] (::Tapir.var"#30#31"{Base.OneTo{Int64}})(n::Int64)
    @ Tapir ~/.julia/packages/Tapir/LAYVk/src/tangents.jl:246
  [2] iterate
    @ ./generator.jl:47 [inlined]
  [3] _collect
    @ ./array.jl:854 [inlined]
  [4] collect_similar(cont::UnitRange{Int64}, itr::Base.Generator{UnitRange{Int64}, Tapir.var"#30#31"{Base.OneTo{Int64}}})
    @ Base ./array.jl:763
  [5] map(f::Function, A::UnitRange{Int64})
    @ Base ./abstractarray.jl:3285
  [6] backing_type(::Type{Base.OneTo{Int64}})
    @ Tapir ~/.julia/packages/Tapir/LAYVk/src/tangents.jl:246
  [7] #s65#32
    @ ~/.julia/packages/Tapir/LAYVk/src/tangents.jl:270 [inlined]
  [8] var"#s65#32"(P::Any, ::Any, ::Any)
    @ Tapir ./none:0
  [9] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [10] fcodual_type(::Type{Base.OneTo{Int64}})
    @ Tapir ~/.julia/packages/Tapir/LAYVk/src/codual.jl:86
 [11] make_ad_stmts!(stmt::Expr, line::Tapir.ID, info::Tapir.ADInfo)
    @ Tapir ~/.julia/packages/Tapir/LAYVk/src/interpreter/s2s_reverse_mode_ad.jl:525
 [12] _broadcast_getindex_evalf
    @ ./broadcast.jl:709 [inlined]
 [13] _broadcast_getindex
    @ ./broadcast.jl:682 [inlined]
 [14] getindex
    @ ./broadcast.jl:636 [inlined]
 [15] macro expansion
    @ ./broadcast.jl:1004 [inlined]
 [16] macro expansion
    @ ./simdloop.jl:77 [inlined]
 [17] copyto!
    @ ./broadcast.jl:1003 [inlined]
 [18] copyto!
    @ ./broadcast.jl:956 [inlined]
 [19] copy
    @ ./broadcast.jl:928 [inlined]
 [20] materialize(bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast ./broadcast.jl:903
 [21] (::Tapir.var"#205#207"{Tapir.ADInfo})(primal_blk::Tapir.BBlock)
    @ Tapir ~/.julia/packages/Tapir/LAYVk/src/interpreter/s2s_reverse_mode_ad.jl:786
 [22] iterate
    @ ./generator.jl:47 [inlined]
 [23] _collect(c::Vector{…}, itr::Base.Generator{…}, ::Base.EltypeUnknown, isz::Base.HasShape{…})
    @ Base ./array.jl:854
 [24] collect_similar(cont::Vector{Tapir.BBlock}, itr::Base.Generator{Vector{…}, Tapir.var"#205#207"{…}})
    @ Base ./array.jl:763
 [25] map
    @ ./abstractarray.jl:3285 [inlined]
 [26] build_rrule(interp::Tapir.TapirInterpreter{…}, sig::Type{…}; safety_on::Bool, silence_safety_messages::Bool)
    @ Tapir ~/.julia/packages/Tapir/LAYVk/src/interpreter/s2s_reverse_mode_ad.jl:783
 [27] build_rrule
    @ ~/.julia/packages/Tapir/LAYVk/src/interpreter/s2s_reverse_mode_ad.jl:753 [inlined]
 [28] prepare_pullback(f::typeof(paritytrig), backend::AutoTapir, x::Vector{Float64}, dy::Float64)
    @ DifferentiationInterfaceTapirExt ~/.julia/packages/DifferentiationInterface/C1Roy/ext/DifferentiationInterfaceTapirExt/onearg.jl:8
 [29] prepare_gradient(f::typeof(paritytrig), backend::AutoTapir, x::Vector{Float64})
    @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/C1Roy/src/first_order/gradient.jl:59
 [30] gradient(f::typeof(paritytrig), backend::AutoTapir, x::Vector{Float64})
    @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/C1Roy/src/first_order/gradient.jl:76
 [31] top-level scope
    @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.
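For context on the error message: "may be too new" is Julia's world-age mechanism. Here is a minimal sketch of the general failure mode, with made-up names unrelated to Tapir: code that started running in an older world age cannot call a method that was defined later (e.g. via eval).

# Minimal world-age sketch (illustrative names, not Tapir code).
function demo()
    @eval newmethod() = 1  # defines newmethod in a newer world age
    return newmethod()     # ERROR: MethodError ... may be too new
end
demo()
# Base.invokelatest(newmethod) inside demo would avoid the error,
# because it forces the call to run in the latest world age.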
@willtebbutt
Member

Thanks for finding this. Will investigate.

@willtebbutt
Member

Hmm I'm actually not able to reproduce locally. Could you provide the output of versioninfo and Pkg.status()?

@gdalle
Author

gdalle commented Jul 4, 2024

Okay indeed in a fresh environment with just Tapir and DI on the latest stable release of everything this doesn't happen. But it did happen in the benchmark environment of SciML/SciMLBenchmarks.jl#988. Investigating

@gdalle
Author

gdalle commented Jul 4, 2024

Apparently it has something to do with how DifferentiationInterfaceTest handles benchmarking, with deepcopy calls galore. Here's an MWE that throws a different exception:

julia> using DifferentiationInterface, DifferentiationInterfaceTest

julia> import Zygote, Tapir

julia> function paritytrig(x::AbstractVector{T}) where {T}
           y = zero(T)
           for i in eachindex(x)
               if iseven(i)
                   y += sin(x[i])
               else
                   y += cos(x[i])
               end
           end
           return y
       end
paritytrig (generic function with 1 method)

julia> backends = [
           AutoTapir(safe_mode=true),
           AutoZygote(),
       ];

julia> scenarios = [
           GradientScenario(paritytrig; x=rand(2), y=0.0, nb_args=1, place=:inplace),
       ];

julia> data = benchmark_differentiation(backends, scenarios, logging=true);
[ Info: Compiling rule for Tuple{typeof(paritytrig), Vector{Float64}} in safe mode. Disable for best performance.
[ Info: Compiling rule for Tuple{typeof(paritytrig), Vector{Float64}} in safe mode. Disable for best performance.
[ Info: Compiling rule for Tuple{typeof(paritytrig), Vector{Float64}} in safe mode. Disable for best performance.
┌ Warning: Error during benchmarking
│   ba = AutoTapir()
│   scen = Scenario{:gradient,1,:inplace} paritytrig : Vector{Float64} -> Float64
│   e = deepcopy of Modules not supported
└ @ DifferentiationInterfaceTest ~/.julia/packages/DifferentiationInterfaceTest/qCkMA/src/tests/benchmark.jl:659
◐ Benchmarking    Time: 0:00:25
julia> data
6×11 DataFrame
 Row │ backend       scenario                           operator             calls  samples  evals  time       allocs   bytes    gc_fraction  compile_fraction
     │ Abstract…     Scenario…                          Symbol               Int64  Int64    Int64  Float64    Float64  Float64  Float64      Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ AutoTapir()   Scenario{:gradient,1,:inplace} p…  prepare_gradient        -1        1      0  NaN            NaN      NaN          NaN               NaN
   2 │ AutoTapir()   Scenario{:gradient,1,:inplace} p…  value_and_gradient!     -1        1      0  NaN            NaN      NaN          NaN               NaN
   3 │ AutoTapir()   Scenario{:gradient,1,:inplace} p…  gradient!               -1        1      0  NaN            NaN      NaN          NaN               NaN
   4 │ AutoZygote()  Scenario{:gradient,1,:inplace} p…  prepare_gradient         0        1      1  8.18e-7        0.0      0.0          0.0               0.0
   5 │ AutoZygote()  Scenario{:gradient,1,:inplace} p…  value_and_gradient!      1     3606      1  1.899e-5     150.0   6048.0          0.0               0.0
   6 │ AutoZygote()  Scenario{:gradient,1,:inplace} p…  gradient!                1     3610      1  1.8669e-5    144.0   5872.0          0.0               0.0

Environment:

(jl_mSTJJ6) pkg> st
Status `/tmp/jl_mSTJJ6/Project.toml`
  [a0c0ee7d] DifferentiationInterface v0.5.7
  [a82114a7] DifferentiationInterfaceTest v0.5.0
  [07d77754] Tapir v0.2.23
  [e88e6eb3] Zygote v0.6.70

julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code

I'll try to see if I can replicate the original error.

@gdalle changed the title from "World age issue" to "Issue with DI benchmarking" on Jul 4, 2024
@willtebbutt
Member

Ah, so it looks like I can't currently deepcopy a MistyClosure. I've opened a PR to fix that.
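This matches the "deepcopy of Modules not supported" error above: Base refuses to deepcopy Module objects outright, so any object graph that reaches a Module fails the same way. A one-line illustration:

julia> deepcopy(Base)  # any Module triggers this, not just Base
ERROR: deepcopy of Modules not supported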

@willtebbutt
Member

Okay. The deepcopy issue should now be resolved.

I'm now trying to figure out what's causing the world age problem.

@gdalle
Author

gdalle commented Jul 4, 2024

Sorry that I wasn't able to reproduce it, not sure how I managed to trigger it

@willtebbutt
Member

No problem. I'm also struggling to reliably reproduce. Are the tests running successfully on your end now, or still not working?

@willtebbutt
Member

Okay. Loading DifferentiationInterfaceTest seems to be what causes problems:

using DifferentiationInterface, DifferentiationInterfaceTest#, DataFrames, DataFramesMeta
import Tapir
# import Markdown, PrettyTables, Printf

function paritytrig(x::AbstractVector{T}) where {T}
    y = zero(T)
    for i in eachindex(x)
        if iseven(i)
            y += sin(x[i])
        else
            y += cos(x[i])
        end
    end
    return y
end

extras = prepare_gradient(paritytrig, AutoTapir(safe_mode=true), randn(2));

If you don't load DifferentiationInterfaceTest, this works fine. When I do load it, I get some precompilation warnings:

Precompiling Tapir
  1 dependency successfully precompiled in 49 seconds. 123 already precompiled.
[ Info: Precompiling Tapir [07d77754-e150-4737-8c94-cd238a1fb45b]
┌ Warning: Module JET with build ID fafbfcfd-4a2e-9e0b-0000-37430629e6f6 is missing from the cache.
│ This may mean JET [c3a54625-cd67-489e-a8e7-0a5a0ff4e31b] does not support precompilation but is imported by a module that does.
└ @ Base loading.jl:1948
[ Info: Skipping precompilation since __precompile__(false). Importing Tapir [07d77754-e150-4737-8c94-cd238a1fb45b].
[ Info: Precompiling DifferentiationInterfaceTapirExt [0114fd55-92e5-5207-bbe9-7aed827b60c7]
┌ Warning: Module Tapir with build ID ffffffff-ffff-ffff-0000-375957f870c8 is missing from the cache.
│ This may mean Tapir [07d77754-e150-4737-8c94-cd238a1fb45b] does not support precompilation but is imported by a module that does.
└ @ Base loading.jl:1948
[ Info: Skipping precompilation since __precompile__(false). Importing DifferentiationInterfaceTapirExt [0114fd55-92e5-5207-bbe9-7aed827b60c7].

There are no warnings when I load just DI + DITest on their own.

@willtebbutt
Member

If it's obvious to you what's going on, please let me know. Otherwise I'm going to continue debugging.

@willtebbutt
Member

Okay. If I remove JET from Tapir.jl's deps, the problem seems to disappear. Is there any obvious reason on your end that JET wouldn't play nicely with DifferentiationInterfaceTest.jl?

@gdalle
Author

gdalle commented Jul 4, 2024

No, especially because JET is a dependency of DifferentiationInterfaceTest (it's used for type stability checks). On the other hand, does Tapir really need it as a dependency? See #122

@willtebbutt
Member

On the other hand does Tapir really need it as a dependency? See #122

Alas, it's pretty integral to our performance testing, and I'm not planning on splitting that testing out any time soon. We should just try to figure out what might be going on.

@gdalle
Author

gdalle commented Jul 4, 2024

I think it might be one of those cases where @aviatesk is needed

@willtebbutt
Member

In a somewhat concerning turn of events, if I load packages in a different order, things work:

using DifferentiationInterface, DataFrames, DataFramesMeta

import Tapir, Enzyme

using DifferentiationInterfaceTest
import Markdown, PrettyTables, Printf

function paritytrig(x::AbstractVector{T}) where {T}
    y = zero(T)
    for i in eachindex(x)
        if iseven(i)
            y += sin(x[i])
        else
            y += cos(x[i])
        end
    end
    return y
end

backends = [
    AutoEnzyme(mode=Enzyme.Reverse),
    AutoTapir(safe_mode=false),
]

scenarios = [
    GradientScenario(paritytrig; x=rand(100), y=0.0, nb_args=1, place=:inplace),
    GradientScenario(paritytrig; x=rand(10_000), y=0.0, nb_args=1, place=:inplace)
];

data = benchmark_differentiation(backends, scenarios, logging=true);

table = PrettyTables.pretty_table(
    String,
    data;
    header=names(data),
    formatters=PrettyTables.ft_printf("%.1e"),
)

Markdown.parse(table)

@willtebbutt
Member

I feel like there probably is something bad going on on Tapir.jl's end -- it's not obvious to me that it ought to matter whether we're able to precompile properly or not.

@willtebbutt
Member

willtebbutt commented Jul 4, 2024

A more minimal example of a failure:

using DifferentiationInterfaceTest # if you don't load this, then precompilation works fine, and this is all fine
import Tapir
Tapir.tangent_type(Base.OneTo{Int64}) # note: this is a generated function. I don't know if that's important.
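For anyone following along, here is a minimal sketch (with made-up names, not Tapir's actual code) of why generated functions are sensitive to this: the generator body runs in the world age in which the @generated function was defined, so it cannot call methods that were defined afterwards.

# Sketch: the generator calls a helper that is defined *after* the
# @generated function itself. The generator body runs in the world age
# of the @generated definition, so the later method is invisible to it.
@generated function backing(::Type{P}) where {P}
    Ts = map(n -> field_type(P, n), 1:fieldcount(P))
    return :(Tuple{$(Ts...)})
end

field_type(::Type{P}, n::Int) where {P} = fieldtype(P, n)  # defined too late

backing(Base.OneTo{Int})
# ERROR: MethodError: no method matching field_type(::Type{Base.OneTo{Int64}}, ::Int64)
# The applicable method may be too new: ...

If skipping precompilation (the __precompile__(false) cascade from JET shown in the warnings above) changes the order or world ages in which such helpers get defined, that might explain why the failure depends on load order.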

@willtebbutt
Member

Okay, so it looks like it's generated-function related, which is quite annoying. If I change the order in which certain functions are defined, then everything seems to be fine. I don't really understand why this is an issue, and I don't really know how to test that this doesn't become an issue again, or whether there are other such problems lurking around.

@gdalle what do you think we should do here?

@gdalle
Author

gdalle commented Jul 4, 2024

I don't know, and I've never used generated functions so I'm afraid I can't help much.
What I do know is that DifferentiationInterfaceTest doesn't do anything fancy with JET: it just imports it and calls JET.@test_opt. I assume the problem would be the same if the user loaded any other package that depends on JET before loading Tapir. Perhaps the problem even exists when the user simply loads JET itself? In that regard, I'd say this is a Tapir problem more than a DIT problem.
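For reference, the check in question is JET's standard optimization test. A minimal sketch of the kind of call DIT makes (the actual call sites live in DIT's internals):

using JET

f(x) = sum(abs2, x)

# Reports, via the Test stdlib, any runtime dispatch or inference
# failure that JET's optimization analysis finds in this call.
JET.@test_opt f(rand(3))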

@willtebbutt
Member

willtebbutt commented Jul 4, 2024

I really have no idea.

I'd point out, though, that there are also precompilation issues with DIT when it's loaded after Tapir.jl; they just don't manifest in the same way. So I imagine you'll have the same precompilation issues if another package uses JET and you load up DIT. In this sense, I think we've both got problems.

It would be helpful to know if @aviatesk has thoughts on what might be going on -- I'm not really sure how to debug this.

@gdalle
Author

gdalle commented Jul 18, 2024

@aviatesk sorry for the ping, do you think you might be able to help here?

@willtebbutt added the "bug" label on Jul 22, 2024
@willtebbutt
Member

I think this is resolved by #200, but I'm going to wait for confirmation from @gdalle to see.

@yebai closed this as completed on Jul 27, 2024
@yebai
Contributor

yebai commented Jul 27, 2024

SciML/SciMLBenchmarks.jl#988 works now.
