-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor test pool to be less optional #75
Conversation
src/engines.jl
Outdated
|
||
Create an XS engine with default settings and the provided name. | ||
|
||
If the engine already exists, return immediately. If not, create the engine then | ||
return once the provisioning process is complete, or failed. | ||
""" | ||
function create_default_engine(name::String) | ||
size = "XS" | ||
function create_default_engine(name::String, size::String="XS") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case there are other attributes we want to add later, can we make size
a kwarg instead of a positional optional arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The name isn't so great now, since there are no defaults. create_engine_and_wait
, maybe?
""" | ||
function add_test_engine!(name::String) | ||
@lock TEST_SERVER_LOCK begin | ||
engines = TEST_ENGINE_POOL.engines | ||
engines[name] = 0 | ||
end | ||
|
||
# Provision the engine if it does not already exist. | ||
TEST_ENGINE_POOL.creater(new_name) | ||
|
||
return nothing | ||
end | ||
|
||
function get_next_engine_name(id::Int64) | ||
return "julia-sdk-test-$(id)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not for this PR, but this default name could probably use another pass
src/testpool.jl
Outdated
end | ||
# The engine exists, but is not provisioned despite our best attempts | ||
catch | ||
# The engine does not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a specific error we can test for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
404 means the engine does not exist. There are plans for a 429 with a Retry-After, although that shouldn't be handled here. Then there are HTTP connection errors that are not about the engine.
Reporting the error, whatever it is, could be useful. 404 is a definitive error, but handling that would be covered by reporting the error. The response, if there is one, could also be reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it sounds like we do expect those to be HTTPError
and not some other error? Maybe check for that just to avoid a situation where we make a typo in the try block and swallow it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Only HTTPError
s make sense for us. If we do get a different error, something is seriously wrong and we should rethrow.
src/testpool.jl
Outdated
end | ||
|
||
@lock TEST_SERVER_LOCK begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding the concurrency model correctly (which is a big if to be fair), the main place where we could see data update races is when marking engines as available for testing or not. It looks like all of the pool maintenance isn't really threaded. If this is indeed the case, could we use the lock just on those availability updates or use an atomic there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pool maintenance can be performed while testing is in progress. This is mainly useful for replacing engines while tests are being run (which we've never actually done as they've never failed during testing). The single global lock for adding/removing/claiming engines is the simplest, most reliable approach to allow this.
If we're not modifying the list during testing (and we aren't in current use) then we could add atomics, and drop the add/remove lock entirely, marking those methods as non-threadsafe.
So, yes we could change it. Is there an advantage in reliability or readability either way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge one, but it seems like there are some opportunities for helpers that could be simplified without the lock or would require smaller chunks of code locking. For instance, we delete engines in a similar try/catch + warn pattern in a bunch of places that could maybe be abstracted a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying engines mid-testing didn't prove to be needed, so simplifying the threading model sounds reasonable to me. I'll do this in a followup PR so it's easier to see the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found potential issues so I went ahead and made the changes in this PR. More work is now done by helpers, and the locks are held for less of the code.
@@ -18,7 +18,7 @@ Base.@kwdef mutable struct TestEnginePool | |||
engines::Dict{String, Int64} = Dict() | |||
# This is used to enable unique, simple, naming of engines | |||
# Switching to randomly generated UUIDs would be needed if tests are run independently | |||
next_id::Int64 = 0 | |||
next_id::Threads.Atomic{Int} = Threads.Atomic{Int}(0) | |||
# Number of tests per engine. Values > 1 invalidate test timing, and require careful | |||
# attention to engine sizing | |||
concurrency::Int64 = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should concurrency also be an atomic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, concurrency is the general limit, not the number of tests actually running
function _get_new_id() | ||
return Threads.atomic_add!(TEST_ENGINE_POOL.next_id, 1) | ||
end | ||
|
||
function get_free_test_engine_name()::String | ||
delay = 1 | ||
# One lock guards name acquisition, forming a queue | ||
# The second lock guards modification of the engine pool | ||
@lock TEST_SERVER_ACQUISITION_LOCK while true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with next id being an atomic, is the TEST_SERVER_ACQUISITION_LOCK
still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_id
is for engine naming, so they don't interact. It's there to simplify starting up/naming/renaming multiple engines at once.
TEST_SERVER_ACQUISITION_LOCK
is there to restrict engine claiming to one test a time - which forms a queue once all engines are claimed. The inner lock then gives 'privileged' access to anything else trying to modify the list. A standard test has to claim both locks, restricting access to the inner lock to one test thread at a time. Thus an attempt to replace an engine has to contend for the lock with only one thread that spends most of its time sleeping. Otherwise attempts to modify the list during testing could wait until all the tests had finished.
Disallowing modification of the list while tests are running would also work, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! much more readable now 🎉
The 'optional' test pool was over complicated and offered little advantage over using it directly. Functionality is almost unchanged after this refactor, but with a simpler interface.