Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reinterpret to Int64 can increase throughput #1

Open
Moelf opened this issue May 20, 2024 · 3 comments
Open

reinterpret to Int64 can increase throughput #1

Moelf opened this issue May 20, 2024 · 3 comments

Comments

@Moelf
Copy link

Moelf commented May 20, 2024

julia> hamming_distance(x1::T, x2::T) where T<:Integer = count_ones(x1  x2)

julia> function hamming_distance(x1, x2)
           s = 0
           @inbounds @simd for i in eachindex(x1, x2)
               s += hamming_distance(x1[i], x2[i])
           end
           s
       end

julia> a = rand(Int8, 1024); b = rand(Int8, 1024);

julia> @be hamming_distance($a,$b)
Benchmark: 2973 samples with 263 evaluations
min    110.281 ns
median 112.148 ns
mean   112.414 ns
max    171.270 ns

julia> @be hamming_distance(reinterpret(UInt64, $a), reinterpret(UInt64, $b))
Benchmark: 3055 samples with 803 evaluations
min    35.022 ns
median 36.007 ns
mean   36.216 ns
max    48.721 ns

btw, there are a lot of unnecessary annotations, for example, count_ones always return Int no matter what already:

julia> Base.url(@which count_ones(0xf2))
"https://github.com/JuliaLang/julia/tree/0b4590a5507d3f3046e5bafc007cacbbfc9b310b/base/int.jl#L415"

@domluna
Copy link
Owner

domluna commented May 20, 2024

does this translate to a speedup in the search function as well? Some things I've done which speedup the hamming_distance function don't seem to make a difference when doing the actual search

@domluna
Copy link
Owner

domluna commented May 20, 2024

julia> q1 = SVector{8, UInt64}(rand(UInt64,8));

julia> q2 = SVector{8, UInt64}(rand(UInt64,8));

julia> @be hamming_distance($q1, $q2)
Benchmark: 5613 samples with 5887 evaluations
min    2.484 ns
median 2.605 ns
mean   2.841 ns
max    7.899 ns
julia> @code_llvm hamming_distance(q1, q2)
; Function Signature: hamming_distance(StaticArraysCore.SArray{Tuple{8}, UInt64, 1, 8}, StaticArraysCore.SArray{Tuple{8}, UInt64, 1, 8})
;  @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:28 within `hamming_distance`
define i64 @julia_hamming_distance_12257(ptr nocapture noundef nonnull readonly align 8 dereferenceable(64) %"x1::SArray", ptr nocapture noundef nonnull readonly align 8 dereferenceable(64) %"x2::SArray") #0 {
top:
;  @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:33 within `hamming_distance`
; ┌ @ simdloop.jl:77 within `macro expansion` @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:34
; │┌ @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:14 within `hamming_distance`
; ││┌ @ int.jl:373 within `xor`
     %0 = load <8 x i64>, ptr %"x1::SArray", align 8
     %1 = load <8 x i64>, ptr %"x2::SArray", align 8
     %2 = xor <8 x i64> %1, %0
; ││└
; ││┌ @ int.jl:415 within `count_ones`
     %3 = call <8 x i64> @llvm.ctpop.v8i64(<8 x i64> %2)
; │└└
; │┌ @ int.jl:87 within `+`
    %4 = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %3)
; └└
;  @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:36 within `hamming_distance`
  ret i64 %4
}

It's not going to get better than this.

even if each comparison was 1ns the best over 1M rows would be 1ms, which it's already very close to

julia> @be k_closest($X1, $q1, 100)
Benchmark: 28 samples with 1 evaluation
min    3.207 ms (5 allocs: 3.281 KiB)
median 3.433 ms (5 allocs: 3.281 KiB)
mean   3.562 ms (5 allocs: 3.281 KiB)
max    4.749 ms (5 allocs: 3.281 KiB)

julia> a = rand(UInt64, 8)
8-element Vector{UInt64}:
 0xb08159b400b994e8
 0x719aafe1251df126
 0x4be73d16ff742a65
 0x9af6c257b0e17a28
 0x5c8128573f810275
 0x6cb73fed5ee64e79
 0x57b7c96fb9093e66
 0x55d6863d8cd7de69

julia> @be k_closest($X1, $a, 100)
Benchmark: 21 samples with 1 evaluation
min    4.425 ms (5 allocs: 3.281 KiB)
median 4.511 ms (5 allocs: 3.281 KiB)
mean   4.639 ms (5 allocs: 3.281 KiB)
max    5.490 ms (5 allocs: 3.281 KiB)

the best combo seems to be a StaticArray query vector and a matrix as the DB

@domluna
Copy link
Owner

domluna commented May 20, 2024

julia> @be k_closest_parallel($X1, $q1, 100)
Benchmark: 83 samples with 1 evaluation
min    1.099 ms (58 allocs: 23.906 KiB)
median 1.117 ms (58 allocs: 23.906 KiB)
mean   1.180 ms (58 allocs: 23.906 KiB)
max    2.570 ms (58 allocs: 23.906 KiB)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants