Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor test pool to be less optional #75

Merged
merged 16 commits into from
Aug 18, 2023
Merged

Refactor test pool to be less optional #75

merged 16 commits into from
Aug 18, 2023

Conversation

mmcgr
Copy link
Contributor

@mmcgr mmcgr commented Aug 11, 2023

The 'optional' test pool was over complicated and offered little advantage over using it directly. Functionality is almost unchanged after this refactor, but with a simpler interface.

  • set and get methods for test engine names now go directly to the engine pool.
  • Increasing the size of the engine pool also provisions the engines.
  • Getting an engine from the pool now assumes it exists.
  • The resize function grew a bit much so has been split into helpers.

@mmcgr mmcgr marked this pull request as ready for review August 11, 2023 08:42
@mmcgr mmcgr requested a review from mcmcgrath13 August 11, 2023 08:50
src/engines.jl Outdated Show resolved Hide resolved
src/engines.jl Outdated

Create an XS engine with default settings and the provided name.

If the engine already exists, return immediately. If not, create the engine then
return once the provisioning process is complete, or failed.
"""
function create_default_engine(name::String)
size = "XS"
function create_default_engine(name::String, size::String="XS")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case there are other attributes we want to add later, can we make size a kwarg instead of a positional optional arg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The name isn't so great now, since there are no defaults. create_engine_and_wait, maybe?

src/engines.jl Outdated Show resolved Hide resolved
src/testpool.jl Outdated Show resolved Hide resolved
src/testpool.jl Outdated Show resolved Hide resolved
src/engines.jl Outdated Show resolved Hide resolved
"""
function add_test_engine!(name::String)
@lock TEST_SERVER_LOCK begin
engines = TEST_ENGINE_POOL.engines
engines[name] = 0
end

# Provision the engine if it does not already exist.
TEST_ENGINE_POOL.creater(new_name)

return nothing
end

function get_next_engine_name(id::Int64)
return "julia-sdk-test-$(id)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for this PR, but this default name could probably use another pass

src/testpool.jl Outdated
end
# The engine exists, but is not provisioned despite our best attempts
catch
# The engine does not exist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a specific error we can test for here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

404 means the engine does not exist. There are plans for a 429 with a Retry-After, although that shouldn't be handled here. Then there are HTTP connection errors that are not about the engine.

Reporting the error, whatever it is, could be useful. 404 is a definitive error, but handling that would be covered by reporting the error. The response, if there is one, could also be reported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it sounds like we do expect those to be HTTPError and not some other error? Maybe check for that just to avoid a situation where we make a typo in the try block and swallow it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Only HTTPErrors make sense for us. If we do get a different error, something is seriously wrong and we should rethrow.

src/testpool.jl Outdated
end

@lock TEST_SERVER_LOCK begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the concurrency model correctly (which is a big if to be fair), the main place where we could see data update races is when marking engines as available for testing or not. It looks like all of the pool maintenance isn't really threaded. If this is indeed the case, could we use the lock just on those availability updates or use an atomic there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pool maintenance can be performed while testing is in progress. This is mainly useful for replacing engines while tests are being run (which we've never actually done as they've never failed during testing). The single global lock for adding/removing/claiming engines is the simplest, most reliable approach to allow this.

If we're not modifying the list during testing (and we aren't in current use) then we could add atomics, and drop the add/remove lock entirely, marking those methods as non-threadsafe.

So, yes we could change it. Is there an advantage in reliability or readability either way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge one, but it seems like there are some opportunities for helpers that could be simplified without the lock or would require smaller chunks of code locking. For instance, we delete engines in a similar try/catch + warn pattern in a bunch of places that could maybe be abstracted a bit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying engines mid-testing didn't prove to be needed, so simplifying the threading model sounds reasonable to me. I'll do this in a followup PR so it's easier to see the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found potential issues so I went ahead and made the changes in this PR. More work is now done by helpers, and the locks are held for less of the code.

src/testpool.jl Outdated Show resolved Hide resolved
@@ -18,7 +18,7 @@ Base.@kwdef mutable struct TestEnginePool
engines::Dict{String, Int64} = Dict()
# This is used to enable unique, simple, naming of engines
# Switching to randomly generated UUIDs would be needed if tests are run independently
next_id::Int64 = 0
next_id::Threads.Atomic{Int} = Threads.Atomic{Int}(0)
# Number of tests per engine. Values > 1 invalidate test timing, and require careful
# attention to engine sizing
concurrency::Int64 = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should concurrency also be an atomic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, concurrency is the general limit, not the number of tests actually running

function _get_new_id()
return Threads.atomic_add!(TEST_ENGINE_POOL.next_id, 1)
end

function get_free_test_engine_name()::String
delay = 1
# One lock guards name acquisition, forming a queue
# The second lock guards modification of the engine pool
@lock TEST_SERVER_ACQUISITION_LOCK while true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with next id being an atomic, is the TEST_SERVER_ACQUISITION_LOCK still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next_id is for engine naming, so they don't interact. It's there to simplify starting up/naming/renaming multiple engines at once.

TEST_SERVER_ACQUISITION_LOCK is there to restrict engine claiming to one test a time - which forms a queue once all engines are claimed. The inner lock then gives 'privileged' access to anything else trying to modify the list. A standard test has to claim both locks, restricting access to the inner lock to one test thread at a time. Thus an attempt to replace an engine has to contend for the lock with only one thread that spends most of its time sleeping. Otherwise attempts to modify the list during testing could wait until all the tests had finished.

Disallowing modification of the list while tests are running would also work, of course.

Copy link
Contributor

@mcmcgrath13 mcmcgrath13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! much more readable now 🎉

src/engines.jl Outdated Show resolved Hide resolved
@mmcgr mmcgr merged commit 6105e57 into master Aug 18, 2023
2 checks passed
@mmcgr mmcgr deleted the mm-engines branch August 18, 2023 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants