-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise and test IP2Country #1684
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
@OptIn(ExperimentalUnsignedTypes::class) | ||
private fun UIntArray.fuzzyBinarySearch(target: UInt): Int? { | ||
if (isEmpty()) return null | ||
|
||
var low = 0 | ||
var high = size - 1 | ||
|
||
while (low <= high) { | ||
val mid = (low + high) / 2 | ||
val midValue = this[mid] | ||
|
||
when { | ||
midValue == target -> return mid // Exact match found | ||
midValue < target -> low = mid + 1 // Search in the right half | ||
else -> high = mid - 1 // Search in the left half | ||
} | ||
} | ||
|
||
// If no exact match, return the largest index with value <= target | ||
return if (high >= 0) high else null | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use the Kotlin's UIntArray.binarySearch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
binarySearch
is looking for an exact match or will return -1
. This is searching for the "Price is Right" match. Best without going over... or the opposite. So we should get an answer for any given IP even if it wasn't explicitly provided in the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
binarySearch
doesn't simply return -1 though, it returns "where the item should have been put in order to maintain the order", in other words, it will give you info about the closest matched IP. Refer to the doc of binarySearch
here:
the index of the element, if it is contained in the array within the specified range; otherwise, the inverted insertion point (-insertion point - 1). The insertion point is defined as the index at which the element should be inserted, so that the array (or the specified subrange of array) still remains sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, amazing, thanks @SessionHero01, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one! Looks like a great optimisation.
Might be worth checking the tests as something seem to have failed there.
Thanks @ThomasSession, There was some issue calling |
This PR moves
geolite2_country_blocks_ipv4.csv
out of assets and into source, and adds a gradle taskipToCode
that generates somegeolite2_country_blocks_ipv4.bin
that is placed in generated assets. This file deduplicates IP address ranges, and stores IP addresses and country codes each as ints in binary, which should be close to optimal storage usage.File size reduction appears to be about 12MB.
This PR removes the
TreeMap
and prefers primitive arrays, as the data is already sorted. This allows us to binary search over the array without additional overhead. Inserting ~100k entries into the TreeMap is anO ( n log n )
operation, as each insertion isO ( log n )
so we may as well avoid it. There's also good memory gains, not having anEntry
aleft
,right
andparent
ref for every entry.All up this improves the performance of this processing from about 3 seconds to about 1 second. The majority of that is from a database lookup in
OnionApi#paths
.This PR adds a test
IP2CountryTest.kt
that checks some of the corner cases in the ip-country translation.ipToCode
task callsipToCode.kts
a kotlin script that reads the .csv and writes the .binWhat's next?
IP2Country
, and defer work until you open thePathActivity
.OnionApi#paths