`hash(::Vector{<abstract type>}, ::UInt)` allocates quite a bit due to type instability: why can't `hash(::Any)` infer a `UInt` return type?

#44244
I am not sure that helps. For example:

```julia
struct S1 end
struct S2 end
struct S3 end
struct S4 end
struct S5 end
for s in [:S1, :S2, :S3, :S4, :S5]
    @eval g(x::$s) = 0
end
f(x) = g(x[])
```

`code_warntype(f, Tuple{Ref{Any}})` shows a return value of `Any`, even though every method of `g` returns the same type.
Gotttttttttcha. Okay thanks, that makes sense! 👍 It's too bad. 🤔 It would be great if there were a kind of "dispatch-lite" where, if you could know the return type at the callsite, you could maybe still do dynamic dispatch but call the specialization without boxing the result.
Are the subtypes of the abstract element type known upfront? In that case, you could call out to a manually union-split function, which should be type stable since it doesn't include any dynamic calls. Something like https://github.com/jlapeyre/ManualDispatch.jl.
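A minimal sketch of such a manual union split (hypothetical types `S1`/`S2` and function `g`, not ManualDispatch.jl's actual API):

```julia
struct S1 end
struct S2 end
g(::S1) = 1
g(::S2) = 2

# Manual union split: the isa checks let the compiler turn each branch
# into a static, fully inferrable call instead of one dynamic dispatch.
function g_split(x)
    if x isa S1
        return g(x)
    elseif x isa S2
        return g(x)
    else
        return g(x)::Int  # fallback: dynamic dispatch, with a typeassert
    end
end
```

This only pays off when the set of concrete subtypes is small and closed; each branch is an ordinary static call, so the return type is inferred without any dynamic dispatch.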
We don't really have an optimization right now for "do a dynamic dispatch, but all possible specializations are known to return the same type". If we did, you could insert a specialization barrier and get a specialized dynamic dispatch for it. That said, it could definitely be implemented, but it's a bit of work.
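One way such a barrier could be spelled (a hypothetical sketch, not the original snippet; `hash_barrier` is an invented name):

```julia
# A "specialization barrier": the dynamic dispatch happens inside this
# function, and the typeassert promises callers a UInt result. Today the
# assert only helps downstream inference; with the optimization described
# above, the runtime could also skip boxing the returned UInt, since every
# specialization returns the same type.
@noinline hash_barrier(x, h::UInt) = hash(x, h)::UInt
```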
Would something like #40880 help with this? That is, would it be possible for the compiler to infer, from how the result is used, that the return value must be a `UInt`? I may be talking out of my ass here.
Yes, this is a case where reverse dataflow could be helpful. It's a bit hard to use the information though: we can't literally insert a typeassert, because if the value had the wrong type the user would see the wrong kind of error. Manually adding type declarations in these cases can still be useful though, since it will at least speed up subsequent operations. I think it's worth doing for containers likely to be used with abstract element types, e.g. Vector and Dict.
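As a concrete sketch of such a manual declaration (hypothetical helper, not code from the thread):

```julia
# Manually asserting the return type of the dynamically-dispatched hash
# call. Everything after the assert is inferred as UInt even though the
# call itself cannot be; note this does not remove the boxing of the
# returned value, it only helps downstream inference.
function hash_elements(v::Vector, h::UInt)
    for x in v
        h = hash(x, h)::UInt
    end
    return h
end
```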
We do all that already (with inserted typeassert and MethodError); we just don't have the complicated version where it also manages to avoid the allocation by shifting that error around in somewhat complicated ways.
An update here: the hash allocs saga continues. (We still see `UInt` allocations from `hash` showing up at RAI a lot.) I had a clever (I think?) idea yesterday to avoid the allocation on every element: switch the dynamic dispatch to a function that doesn't have any return value (writing its result through a `Ref` instead), and then let that specialized method make a static dispatch to `hash` internally:

```julia
function my_hash(vec::Vector, h::UInt)
    h += Base.hash_abstractarray_seed
    h = hash(map(first, axes(vec)), h)
    h = hash(map(last, axes(vec)), h)
    # Use a Ref to avoid allocations from the return value of hash():
    ref = Ref{UInt}(h)
    for e in vec
        hash_into!(ref, e)::Nothing
    end
    h = ref[]::UInt
    return h::UInt
end

function hash_into!(ref::Ref{UInt}, v::T) where T  # (force specialization)
    ref[] = my_hash(v, ref[])
    return nothing
end

my_hash(a, h::UInt) = hash(a, h)
```

It seems this helps a lot! Even in the case with very small data / a single element:

```julia
julia> v = Any[i for i in 1:1_000];

julia> @btime hash($v, $(UInt(0)))
  16.552 μs (1001 allocations: 15.64 KiB)
0x0e97df552a8d635e

julia> @btime my_hash($v, $(UInt(0)))
  9.950 μs (1 allocation: 16 bytes)
0x0e97df552a8d635e

julia> v = Any[i for i in 1:1_000_000];

julia> @btime hash($v, $(UInt(0)))
  26.136 ms (330659 allocations: 5.68 MiB)
0xb5d91dc36a23d5be

julia> @btime my_hash($v, $(UInt(0)))
  10.870 ms (1 allocation: 16 bytes)
0x3eddf02e950af97c

julia> @btime hash($(Any[1]), $(UInt(0)))
  28.156 ns (2 allocations: 32 bytes)
0xc7253bdde85cb9f5

julia> @btime my_hash($(Any[1]), $(UInt(0)))
  20.572 ns (1 allocation: 16 bytes)
0xc7253bdde85cb9f5
```

(For 1,000,000 elements, it's faster even without the optimizations that `Base.hash` has for very large vectors, which is the reason for the different return value in that case.)

Thanks! :)
And assuming that we cannot get rid of the `Ref` alloc, there is still a bit more improvement that can be made here for nested containers, by adding a `hash_into!` method for `Vector` that passes the `Ref` through instead of allocating a new one:

```julia
julia> v = Any[ Any[i] for i in 1:1_000 ];

julia> @btime hash($v, $(UInt(0)))
  37.920 μs (2001 allocations: 31.27 KiB)
0x5bb22512b3f4458e

julia> @btime my_hash($v, $(UInt(0)))
  30.817 μs (1001 allocations: 15.64 KiB)
0x5bb22512b3f4458e

julia> # Add a custom method to pass through the ref instead of allocating a new one
       function hash_into!(ref::Ref{UInt}, vec::Vector{T}) where T  # (force specialization)
           ref[] += Base.hash_abstractarray_seed
           ref[] = hash(map(first, axes(vec)), ref[])
           ref[] = hash(map(last, axes(vec)), ref[])
           for e in vec
               hash_into!(ref, e)::Nothing
           end
           return nothing
       end
hash_into!

julia> @btime my_hash($v, $(UInt(0)))
  23.888 μs (1 allocation: 16 bytes)
0x5bb22512b3f4458e
```

And we can then of course define:

```julia
function my_hash(vec::Vector, h::UInt)
    # Use a Ref to avoid allocations from the return value of hash():
    ref = Ref{UInt}(h)
    hash_into!(ref, vec)
    return ref[]::UInt
end
```

This approach would basically mean one of these `Ref` allocations per top-level call.
See discussion here, from the JuliaCon hackathon: #50136 (comment)
Hi!
Here's a very small MRE that reproduces the issue I'll discuss below:
And you can see that all those allocations are UInts:
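A minimal sketch of such a reproducer (assuming a `Vector{Any}` of integers; this is a stand-in, not the original snippet):

```julia
# MRE sketch: hashing a Vector{Any} forces a dynamic dispatch for each
# element's hash(elt, h) call, and the returned UInt is boxed each time,
# so allocations scale with the length of the vector.
v = Any[i for i in 1:1000]
hash(v)  # warm up, to exclude compilation allocations
allocs = @allocated hash(v)
println(allocs)
```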
We've been using the new allocations profiler on our codebase, and we've found that on some benchmarks we are spending roughly 30% of allocations on this method instance:

```
hash(::Vector{RelationalAITypes.DBType}, ::UInt64)
```

You can see that, of these allocations, roughly 50% are allocating the UInt64s of the results of iterating the vector. Here's the reported allocation locations: you can see that most of the direct allocations come from the call to `hash(elt, h)`.

I know that this is because there is a type instability there, since the Vector is a vector of abstract elements. But my question is: even if you don't know the type of the element, why can't Julia deduce the return type to be a UInt in this case? And if it could, would that be enough to avoid allocating the return value?
I see that currently roughly 1/3 of the `hash()` methods in Base cannot infer their return type. Here's a small sample of the ones that it reports with return type Any:

I'm wondering: would it be acceptable to put a return type annotation and/or return-site type assertion on all these methods, asserting that they return a UInt64?
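For illustration, the kind of annotation being proposed would look like this on a hypothetical type (`MyType` is a stand-in, not one of the actual Base methods):

```julia
struct MyType
    x::Int
end

# A return-site type assertion: even if inference inside the call fails,
# callers can rely on getting a UInt back.
Base.hash(v::MyType, h::UInt) = hash(v.x, h)::UInt
```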
One plausible explanation might be that we don't require `hash(x, h)` to return a UInt, but I don't think that's true: if the hash function returns anything other than a UInt, it will cause a MethodError when the returned hash value is passed along as the seed to the next call to `hash()`! Consider this example:
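A minimal illustration along those lines (hypothetical `BadHash` type):

```julia
struct BadHash end
# Deliberately violates the contract: returns an Int instead of a UInt.
Base.hash(::BadHash, h::UInt) = 1

h1 = hash(BadHash(), UInt(0))  # 1::Int -- the wrong type
# Passing that result as the seed of the next hash call fails, because
# no two-argument hash method accepts an Int seed:
try
    hash(2, h1)
catch err
    println(err isa MethodError)  # true
end
```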
So it seems to me that there is an (implicit) requirement in the current API that `Base.hash` methods must return a UInt64. My question, then, is: can we take advantage of that requirement to provide a hint to the compiler that can eliminate the allocations in the presence of this type instability?