Fix compile times #150

Open · wants to merge 2 commits into base: master
12 changes: 6 additions & 6 deletions src/convnets/densenet.jl
@@ -28,8 +28,8 @@ Create a DenseNet transition sequence
- `outplanes`: number of output feature maps
"""
 transition(inplanes, outplanes) =
-    [conv_bn((1, 1), inplanes, outplanes; bias = false, rev = true)...,
-     MeanPool((2, 2))]
+    Chain([conv_bn((1, 1), inplanes, outplanes; bias = false, rev = true)...,
+           MeanPool((2, 2))]...)
Comment on lines +31 to +32
Member:
Suggested change
-    Chain([conv_bn((1, 1), inplanes, outplanes; bias = false, rev = true)...,
-           MeanPool((2, 2))]...)
+    Chain([conv_bn((1, 1), inplanes, outplanes; bias = false, rev = true)]..., MeanPool((2, 2)))

Member:
The return type of conv_bn is already a Vector, so shouldn't just Chain(conv_bn((1, 1), inplanes, outplanes; bias = false, rev = true)..., MeanPool((2, 2))) work? Also, I know this suggestion has been shot down before because it would cause visual noise, but simply tweaking conv_bn to return a Chain does wonders for the TTFG (time to first gradient):

master:

julia> using Metalhead

julia> using Flux: Zygote

julia> den = DenseNet();

julia> ip = rand(Float32, 224, 224, 3, 1);

julia> @time Zygote.gradient((m,x) -> sum(m(x)), den, ip);
 77.621622 seconds (124.76 M allocations: 11.324 GiB, 1.67% gc time, 97.00% compilation time)

with conv_bn returning a Chain:

julia> @time Zygote.gradient((m,x) -> sum(m(x)), den, ip);
 28.244888 seconds (89.40 M allocations: 9.049 GiB, 3.60% gc time, 90.78% compilation time)
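
For concreteness, a rough sketch of the tweak being benchmarked here, with simplified signatures (the real conv_bn takes more keyword arguments, e.g. stride, pad, and rev):

using Flux

# Current shape (simplified): a Vector of layers that callers splat into
# their own Chain.
conv_bn_vector(kernel, inplanes, outplanes) =
    [Conv(kernel, inplanes => outplanes; bias = false),
     BatchNorm(outplanes, relu)]

# Tweaked shape: each conv/BN pair is one self-contained Chain, at the cost
# of extra nesting in model printouts.
conv_bn_chain(kernel, inplanes, outplanes) =
    Chain(Conv(kernel, inplanes => outplanes; bias = false),
          BatchNorm(outplanes, relu))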

Member (@theabhirath), Apr 27, 2022:
^ This needs some tricks to get this fast, though. One major trick: large Vectors of layers should not be splatted into Chain; pass the Vector directly instead (Flux 0.13 supports this, so it works). Removing a single splat of a large vector of layers (the "body" of the DenseNet) makes the time shoot back up:

julia> @time Zygote.gradient((m,x) -> sum(m(x)), den, ip);
 46.788491 seconds (117.59 M allocations: 10.873 GiB, 2.65% gc time, 94.90% compilation time)
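
For context, a tiny illustration of the splat issue with made-up layer sizes: splatting a Vector into Chain builds a tuple type as long as the layer count, whereas Flux 0.13 lets Chain hold the Vector directly:

using Flux

layers = [Dense(8 => 8, relu) for _ in 1:64]  # a large, homogeneous block

slow = Chain(layers...)  # 64-element tuple type; compile time grows with length
fast = Chain(layers)     # Flux 0.13+: Vector-backed Chain, far less specialization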

Member:
Whoops, you are indeed right, and the suggestion looks good.

Member (Author):
One thing I am curious about is the large discrepancy between first compiles on master. I regularly get ~500s TTFG with DenseNet; you don't seem to get nearly as bad times. Mine is with GPUs turned off. Does that make up some of the difference?

Member:
I am testing on an M1 Mac CPU, with 4 threads and Julia master, so maybe some of the discrepancy is there? Julia 1.8+ seemed to be an order of magnitude faster than Julia 1.7 for compiling some of this stuff, last I checked.
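
For anyone comparing numbers like these, a quick way to record the environment a timing came from (standard library only, nothing Metalhead-specific):

using InteractiveUtils  # provides versioninfo outside the REPL
versioninfo()           # Julia version, OS, CPU model
println(Threads.nthreads(), " threads")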


"""
dense_block(inplanes, growth_rates)
@@ -43,8 +43,8 @@ the number of output feature maps by `growth_rates` with each block
- `growth_rates`: the growth (additive) rates of output feature maps
after each block (a vector of `k`s from the ref)
"""
-dense_block(inplanes, growth_rates) = [dense_bottleneck(i, o)
-    for (i, o) in zip(inplanes .+ cumsum([0, growth_rates[1:(end - 1)]...]), growth_rates)]
+dense_block(inplanes, growth_rates) = Chain([dense_bottleneck(i, o)
+    for (i, o) in zip(inplanes .+ cumsum([0, growth_rates[1:(end - 1)]...]), growth_rates)]...)

"""
densenet(inplanes, growth_rates; reduction = 0.5, nclasses = 1000)
@@ -66,9 +66,9 @@ function densenet(inplanes, growth_rates; reduction = 0.5, nclasses = 1000)
     outplanes = 0
     for (i, rates) in enumerate(growth_rates)
         outplanes = inplanes + sum(rates)
-        append!(layers, dense_block(inplanes, rates))
+        push!(layers, dense_block(inplanes, rates))
         (i != length(growth_rates)) &&
-            append!(layers, transition(outplanes, floor(Int, outplanes * reduction)))
+            push!(layers, transition(outplanes, floor(Int, outplanes * reduction)))
         inplanes = floor(Int, outplanes * reduction)
     end
     push!(layers, BatchNorm(outplanes, relu))
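An aside on the append! to push! swap above: dense_block and transition now return a single Chain rather than a Vector of layers, so each block enters layers as one element. A minimal illustration with made-up layers:

using Flux

layers = Any[]
append!(layers, [Dense(4 => 4), Dense(4 => 4)])     # old style: two flat entries
push!(layers, Chain(Dense(4 => 4), Dense(4 => 4)))  # new style: one nested entry
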
10 changes: 5 additions & 5 deletions src/convnets/vgg.jl
@@ -16,13 +16,13 @@ function vgg_block(ifilters, ofilters, depth, batchnorm)
     layers = []
     for _ in 1:depth
         if batchnorm
-            append!(layers, conv_bn(k, ifilters, ofilters; pad = p, bias = false))
+            push!(layers, Chain(conv_bn(k, ifilters, ofilters; pad = p, bias = false)...))
         else
             push!(layers, Conv(k, ifilters => ofilters, relu, pad = p))
         end
         ifilters = ofilters
     end
-    return layers
+    return Chain(layers...)
 end

"""
@@ -41,11 +41,11 @@ function vgg_convolutional_layers(config, batchnorm, inchannels)
     layers = []
     ifilters = inchannels
     for c in config
-        append!(layers, vgg_block(ifilters, c..., batchnorm))
+        push!(layers, vgg_block(ifilters, c..., batchnorm))
         push!(layers, MaxPool((2,2), stride=2))
         ifilters, _ = c
     end
-    return layers
+    return Chain(layers...)
 end

"""
@@ -70,7 +70,7 @@ function vgg_classifier_layers(imsize, nclasses, fcsize, dropout)
     push!(layers, Dropout(dropout))
     push!(layers, Dense(fcsize, nclasses))

-    return layers
+    return Chain(layers...)
 end
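
The net effect of these Chain returns, sketched with invented sizes: the assembled VGG feature extractor becomes a Chain of small per-block Chains instead of one long flat Chain, so each tuple type Zygote has to compile through stays short:

using Flux

block = Chain(Conv((3, 3), 3 => 64, relu; pad = 1),
              Conv((3, 3), 64 => 64, relu; pad = 1))
features = Chain(block, MaxPool((2, 2)))  # nested, not flat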

"""
Expand Down