[service] Add a reusable pool for backend services. #583

Open
wants to merge 41 commits into
base: development

ChrisCummins
Contributor

@ChrisCummins commented Feb 23, 2022

TL;DR: Reuse backend services rather than discarding them, reducing the cost of gym.make(...) from 109-662 ms to 0.4-1.4 ms (84x to >1000x speedups).

Background

CompilerGym environments use a client/service architecture, where the client is a gym.Env object and the service is a subprocess that implements the CompilationSession gRPC interface. At the moment, a new service is created every time an environment is created using gym.make(...), and destroyed when the environment is closed. Starting a service is expensive (approx. 112 ms for LLVM and 600 ms for loop_tool and GCC), yet services can be reused and shared between clients. Sharing has a further benefit: caches can be shared between clients.

Overview

This PR adds a ServiceConnectionPool class that implements a thread-safe pool for compiler service connections. This enables compiler service connections to be reused, avoiding the expensive initialization of a new service.

There is a global instance of this class, available via the static ServiceConnectionPool.get() method. To use the pool, acquire a reference to the global instance, and call the ServiceConnectionPool.acquire() method to construct and return service connections:

>>> pool = ServiceConnectionPool.get()
>>> with pool.acquire(Path("/path/to/service"), ConnectionOpts()) as service:
...    # Do something with the service.

When a service is closed (by calling service.close()), it is automatically released back to the pool so that a future request for the same type of service will reuse the connection.
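
As a rough illustration of the acquire/close/reuse cycle described above, here is a minimal, self-contained sketch. It is not the code in this PR: SketchPool and FakeService are hypothetical names, the pool is keyed by endpoint only (the real acquire() also takes ConnectionOpts), and the "service" is a plain Python object rather than a gRPC subprocess.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class FakeService:
    """Hypothetical stand-in for an expensive-to-start backend service."""

    started = 0  # Counts how many services were actually constructed.

    def __init__(self, endpoint, pool):
        FakeService.started += 1
        self.endpoint = endpoint
        self._pool = pool

    def close(self):
        # Closing hands the service back to the pool instead of destroying it.
        self._pool.release(self)


class SketchPool:
    """Illustrative thread-safe pool keyed by endpoint (an assumption)."""

    _global = None
    _global_lock = threading.Lock()

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = defaultdict(list)  # endpoint -> idle services

    @classmethod
    def get(cls):
        # Lazily create a single process-wide instance.
        with cls._global_lock:
            if cls._global is None:
                cls._global = cls()
            return cls._global

    @contextmanager
    def acquire(self, endpoint):
        with self._lock:
            idle = self._idle[endpoint]
            service = idle.pop() if idle else None
        if service is None:
            # Slow path: nothing to reuse, so start a new service.
            service = FakeService(endpoint, self)
        try:
            yield service
        finally:
            service.close()

    def release(self, service):
        with self._lock:
            idle = self._idle[service.endpoint]
            if service not in idle:  # Tolerate a repeated close().
                idle.append(service)


pool = SketchPool.get()
with pool.acquire("/path/to/service") as first:
    pass
with pool.acquire("/path/to/service") as second:
    pass
```

In this sketch the second acquire() pops the idle service left behind by the first, so only one FakeService is ever constructed.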

Performance

By amortizing the cost of service initialization, the cost of gym.make(...) is just that of initializing the relevant CompilerEnv class. For "dummy" C++ and Python environments, the speedup is 310x and >1000x, respectively. For the LLVM environment, which has an expensive class initialization, the speedup is 84.4x.

Detailed microbenchmark results, compared to development branch:

------------------------------------------------------- benchmark 'test_make_env[dummy-cc]': 2 tests ------------------------------------------------------
Name (time in ms)                         Min              Median                 Max                Mean            StdDev                   OPS          
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_make_env[dummy-cc] (dev)        107.1154 (336.40)   108.6546 (320.97)   115.6805 (81.63)    108.8695 (310.03)   1.2997 (11.89)        9.1853 (0.00)   
test_make_env[dummy-cc] (pr-583)       0.3184 (1.0)        0.3385 (1.0)        1.4172 (1.0)        0.3512 (1.0)      0.1093 (1.0)      2,847.7579 (1.0)    
-----------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------- benchmark 'test_make_env[dummy-py]': 2 tests ------------------------------------------------------
Name (time in ms)                         Min              Median                 Max                Mean            StdDev                   OPS          
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_make_env[dummy-py] (dev)        609.9368 (>1000.0)  661.8493 (>1000.0)  664.0083 (>1000.0)  661.3323 (>1000.0)  5.2601 (125.41)       1.5121 (0.00)   
test_make_env[dummy-py] (pr-583)       0.3381 (1.0)        0.3533 (1.0)        0.5634 (1.0)        0.3753 (1.0)      0.0419 (1.0)      2,664.3997 (1.0)    
-----------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------ benchmark 'test_make_env[llvm]': 2 tests -----------------------------------------------------
Name (time in ms)                     Min              Median                 Max                Mean            StdDev                 OPS          
-----------------------------------------------------------------------------------------------------------------------------------------------------
test_make_env[llvm] (dev)        108.6013 (88.99)    112.1703 (88.07)    153.3195 (90.20)    112.7894 (84.42)    4.4759 (33.44)      8.8661 (0.01)   
test_make_env[llvm] (pr-583)       1.2204 (1.0)        1.2736 (1.0)        1.6997 (1.0)        1.3361 (1.0)      0.1339 (1.0)      748.4634 (1.0)    
-----------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------ benchmark 'test_make_env[loop_tool]': 2 tests -------------------------------------------------------
Name (time in ms)                          Min              Median                 Max                Mean            StdDev                   OPS          
------------------------------------------------------------------------------------------------------------------------------------------------------------
test_make_env[loop_tool] (dev)        660.0815 (>1000.0)  661.9509 (>1000.0)  664.2796 (>1000.0)  661.9331 (>1000.0)  0.7219 (19.71)        1.5107 (0.00)   
test_make_env[loop_tool] (pr-583)       0.3407 (1.0)        0.3553 (1.0)        0.5992 (1.0)        0.3649 (1.0)      0.0366 (1.0)      2,740.4397 (1.0)    
------------------------------------------------------------------------------------------------------------------------------------------------------------
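
The speedups quoted in the Performance section can be approximately reproduced from the Mean column of the tables above. The short script below only does that arithmetic; the dictionary layout and variable names are illustrative and not part of the benchmark suite.

```python
# Mean gym.make(...) times in milliseconds, copied from the Mean column:
# (development branch, this PR).
mean_ms = {
    "dummy-cc": (108.8695, 0.3512),
    "dummy-py": (661.3323, 0.3753),
    "llvm": (112.7894, 1.3361),
    "loop_tool": (661.9331, 0.3649),
}

# Speedup is the ratio of the development mean to the PR mean.
speedup = {name: dev / pr for name, (dev, pr) in mean_ms.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")
```

Because the table values are rounded to four decimal places, the ratios computed this way can differ slightly in the last digit from the ratios printed by the benchmark tool.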

@facebook-github-bot added the CLA Signed label Feb 23, 2022
@codecov-commenter

codecov-commenter commented Feb 23, 2022

Codecov Report

Merging #583 (8847e93) into development (947b567) will decrease coverage by 5.93%.
The diff coverage is 92.30%.

@@               Coverage Diff               @@
##           development     #583      +/-   ##
===============================================
- Coverage        88.36%   82.42%   -5.94%     
===============================================
  Files              126      127       +1     
  Lines             7597     7739     +142     
===============================================
- Hits              6713     6379     -334     
- Misses             884     1360     +476     
Impacted Files Coverage Δ
compiler_gym/bin/service.py 75.80% <0.00%> (ø)
compiler_gym/envs/gcc/__init__.py 100.00% <ø> (ø)
compiler_gym/wrappers/commandline.py 93.65% <ø> (ø)
compiler_gym/service/connection_pool.py 92.13% <92.13%> (ø)
compiler_gym/service/connection.py 80.29% <92.15%> (+2.63%) ⬆️
compiler_gym/envs/gcc/gcc_env.py 92.22% <92.30%> (-7.78%) ⬇️
compiler_gym/envs/llvm/datasets/cbench.py 79.49% <100.00%> (+0.07%) ⬆️
compiler_gym/service/__init__.py 100.00% <100.00%> (ø)
...ompiler_gym/service/client_service_compiler_env.py 88.06% <100.00%> (-2.79%) ⬇️
compiler_gym/envs/gcc/service/gcc_service.py 0.00% <0.00%> (-97.07%) ⬇️
... and 11 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 947b567...8847e93.

@ChrisCummins force-pushed the feature/service-pool branch 2 times, most recently from 8c117ee to 84c55ee March 3, 2022 10:46
@ChrisCummins changed the title from "[WIP] Add a reusable pool for backend services." to "[service] Add a reusable pool for backend services." Mar 3, 2022
@ChrisCummins marked this pull request as ready for review March 3, 2022 13:58
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this pull request Mar 7, 2022
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this pull request Mar 7, 2022
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this pull request Mar 21, 2022
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this pull request Apr 20, 2022
@ChrisCummins
Contributor Author

Rebased on development

Contributor

@mostafaelhoushi left a comment

LGTM (skimmed through the code, didn't understand it all but got some idea)

ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this pull request Apr 22, 2022
@ChrisCummins force-pushed the feature/service-pool branch 3 times, most recently from ff7e614 to 54a7a00 April 27, 2022 19:15
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this pull request Apr 28, 2022
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this pull request May 12, 2022
ChrisCummins and others added 25 commits May 12, 2022 11:53

This adds gcc-v0 and the loop_tool-v0 environments to the gym.make(...) benchmark, and removes the benchmark for environment initialization from an existing service, as that is now the default behavior when using ServiceConnectionPools.

Add a base class for the ServiceConnectionPool that provides the same interface but does no caching.

This removes the connection_pool target by merging it into the main package. This is needed to fix the CMake build.
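
To make the "same interface but does no caching" idea from the commit messages concrete, here is a small hypothetical sketch. The class names, the acquire/release signatures, and the factory argument are all assumptions made for illustration; they are not taken from the PR, and unlike the real pool this sketch makes no attempt at thread safety.

```python
class NoCachePool:
    """Hypothetical base class: every acquire() builds a fresh service."""

    def __init__(self, factory):
        self._factory = factory  # Callable that starts a new service.

    def acquire(self, key):
        return self._factory(key)

    def release(self, key, service):
        pass  # Nothing is kept, so the service is simply dropped.


class CachingPool(NoCachePool):
    """Hypothetical subclass: released services are kept for reuse."""

    def __init__(self, factory):
        super().__init__(factory)
        self._idle = {}  # key -> list of idle services

    def acquire(self, key):
        idle = self._idle.get(key)
        if idle:
            return idle.pop()
        return super().acquire(key)

    def release(self, key, service):
        self._idle.setdefault(key, []).append(service)


def make_service(key):
    # Placeholder for an expensive service start-up.
    return object()


plain = NoCachePool(make_service)
a = plain.acquire("llvm")
plain.release("llvm", a)
b = plain.acquire("llvm")  # A new object every time.

cached = CachingPool(make_service)
c = cached.acquire("llvm")
cached.release("llvm", c)
d = cached.acquire("llvm")  # The released object comes back.
```

Since both classes expose the same two methods, calling code can switch between them without changes, which is presumably the point of having a non-caching base.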
@ChrisCummins added this to the v0.2.6 milestone Nov 2, 2022
Labels: CLA Signed
4 participants