[CI] Add benchmarks to test runs #2410

Open
vmoens wants to merge 17 commits into base: gh/vmoens/22/base

vmoens (Contributor) commented Sep 2, 2024

[ghstack-poisoned]

pytorch-bot bot commented Sep 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2410

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 5 New Failures, 8 Unrelated Failures

As of commit deef000 with merge base e294c68:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Sep 2, 2024
ghstack-source-id: 9e63898503a8f05206cb05de91bac6346615815b
Pull Request resolved: #2410
@facebook-github-bot added the "CLA Signed" label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Sep 2, 2024
@vmoens added the "CI" label (Has to do with CI setup, e.g. wheels & builds, tests...) Sep 2, 2024
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 2, 2024
ghstack-source-id: e95bf0573035e381f1d4cb18b4afd2cfdbf43ee6
Pull Request resolved: #2410
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 2, 2024
ghstack-source-id: 06870204d47411f4c9f31d2a42cf402704876322
Pull Request resolved: #2410
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 2, 2024
ghstack-source-id: 740a14c3f86e4c09a9a84647995a358c049d8909
Pull Request resolved: #2410
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 4, 2024
ghstack-source-id: 4ee660eb687dd4bd39143f6a33c87da979ecaa78
Pull Request resolved: #2410
[ghstack-poisoned]

github-actions bot commented Sep 17, 2024

Result of GPU Benchmark Tests

| Name | Max | Mean | Ops |
|---|---|---|---|
| test_single | 0.1053s | 0.1022s | 9.7849 Ops/s |
| test_sync | 90.5561ms | 88.0732ms | 11.3542 Ops/s |
| test_async | 0.1646s | 85.1766ms | 11.7403 Ops/s |
| test_single_pixels | 0.1110s | 0.1099s | 9.1023 Ops/s |
| test_sync_pixels | 72.5370ms | 71.4017ms | 14.0053 Ops/s |
| test_async_pixels | 0.2117s | 68.3159ms | 14.6379 Ops/s |
| test_simple | 0.7251s | 0.7241s | 1.3809 Ops/s |
| test_transformed | 0.9526s | 0.9512s | 1.0513 Ops/s |
| test_serial | 2.1426s | 2.0672s | 0.4837 Ops/s |
| test_parallel | 1.9135s | 1.8732s | 0.5339 Ops/s |
| test_step_mdp_speed[True-True-True-True-True] | 0.4431ms | 36.8061μs | 27.1694 KOps/s |
| test_step_mdp_speed[True-True-True-True-False] | 44.7210μs | 21.1736μs | 47.2287 KOps/s |
| test_step_mdp_speed[True-True-True-False-True] | 0.3989ms | 21.3485μs | 46.8417 KOps/s |
| test_step_mdp_speed[True-True-True-False-False] | 0.3937ms | 12.0863μs | 82.7383 KOps/s |
| test_step_mdp_speed[True-True-False-True-True] | 0.4394ms | 39.5900μs | 25.2589 KOps/s |
| test_step_mdp_speed[True-True-False-True-False] | 52.1200μs | 23.0359μs | 43.4106 KOps/s |
| test_step_mdp_speed[True-True-False-False-True] | 0.4167ms | 22.9483μs | 43.5762 KOps/s |
| test_step_mdp_speed[True-True-False-False-False] | 0.3923ms | 14.1525μs | 70.6589 KOps/s |
| test_step_mdp_speed[True-False-True-True-True] | 0.4327ms | 42.0367μs | 23.7887 KOps/s |
| test_step_mdp_speed[True-False-True-True-False] | 50.5000μs | 25.4699μs | 39.2620 KOps/s |
| test_step_mdp_speed[True-False-True-False-True] | 0.4049ms | 23.5045μs | 42.5451 KOps/s |
| test_step_mdp_speed[True-False-True-False-False] | 0.4035ms | 14.1980μs | 70.4324 KOps/s |
| test_step_mdp_speed[True-False-False-True-True] | 79.7620μs | 43.8463μs | 22.8069 KOps/s |
| test_step_mdp_speed[True-False-False-True-False] | 0.4132ms | 27.2952μs | 36.6364 KOps/s |
| test_step_mdp_speed[True-False-False-False-True] | 0.4307ms | 24.8146μs | 40.2988 KOps/s |
| test_step_mdp_speed[True-False-False-False-False] | 0.3964ms | 15.9838μs | 62.5634 KOps/s |
| test_step_mdp_speed[False-True-True-True-True] | 0.4313ms | 41.3901μs | 24.1604 KOps/s |
| test_step_mdp_speed[False-True-True-True-False] | 46.6710μs | 25.2102μs | 39.6665 KOps/s |
| test_step_mdp_speed[False-True-True-False-True] | 0.4159ms | 26.3790μs | 37.9090 KOps/s |
| test_step_mdp_speed[False-True-True-False-False] | 0.3967ms | 15.5449μs | 64.3296 KOps/s |
| test_step_mdp_speed[False-True-False-True-True] | 0.4214ms | 42.9875μs | 23.2626 KOps/s |
| test_step_mdp_speed[False-True-False-True-False] | 51.5910μs | 27.1448μs | 36.8395 KOps/s |
| test_step_mdp_speed[False-True-False-False-True] | 3.5102ms | 28.3603μs | 35.2605 KOps/s |
| test_step_mdp_speed[False-True-False-False-False] | 0.4031ms | 17.4052μs | 57.4539 KOps/s |
| test_step_mdp_speed[False-False-True-True-True] | 0.4363ms | 45.4727μs | 21.9912 KOps/s |
| test_step_mdp_speed[False-False-True-True-False] | 62.0110μs | 29.5281μs | 33.8660 KOps/s |
| test_step_mdp_speed[False-False-True-False-True] | 0.4062ms | 28.0272μs | 35.6796 KOps/s |
| test_step_mdp_speed[False-False-True-False-False] | 0.4049ms | 17.3353μs | 57.6858 KOps/s |
| test_step_mdp_speed[False-False-False-True-True] | 0.4454ms | 46.6049μs | 21.4570 KOps/s |
| test_step_mdp_speed[False-False-False-True-False] | 55.8110μs | 31.2152μs | 32.0357 KOps/s |
| test_step_mdp_speed[False-False-False-False-True] | 0.4278ms | 28.8567μs | 34.6540 KOps/s |
| test_step_mdp_speed[False-False-False-False-False] | 0.4005ms | 19.2709μs | 51.8918 KOps/s |
| test_values[generalized_advantage_estimate-True-True] | 24.2709ms | 23.6211ms | 42.3351 Ops/s |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1077s | 3.0341ms | 329.5833 Ops/s |
| test_values[td0_return_estimate-False-False] | 90.5810μs | 65.4737μs | 15.2733 KOps/s |
| test_values[td1_return_estimate-False-False] | 53.7101ms | 53.1221ms | 18.8246 Ops/s |
| test_values[vec_td1_return_estimate-False-False] | 1.2422ms | 1.0602ms | 943.2337 Ops/s |
| test_values[td_lambda_return_estimate-True-False] | 89.2350ms | 85.4059ms | 11.7088 Ops/s |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2791ms | 1.0617ms | 941.8578 Ops/s |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.0420ms | 23.6066ms | 42.3611 Ops/s |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9256ms | 0.6997ms | 1.4291 KOps/s |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 1.0213ms | 0.6440ms | 1.5528 KOps/s |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.4996ms | 1.4469ms | 691.1304 Ops/s |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.1008ms | 0.6866ms | 1.4564 KOps/s |
| test_dqn_speed[False-None] | 6.9930ms | 1.2974ms | 770.7835 Ops/s |
| test_dqn_speed[False-backward] | 1.8879ms | 1.8370ms | 544.3643 Ops/s |
| test_dqn_speed[True-None] | 0.6637ms | 0.5626ms | 1.7774 KOps/s |
| test_dqn_speed[True-backward] | 1.1255ms | 1.0038ms | 996.2621 Ops/s |
| test_dqn_speed[reduce-overhead-None] | 0.9089ms | 0.5657ms | 1.7677 KOps/s |
| test_dqn_speed[reduce-overhead-backward] | 1.0469ms | 1.0036ms | 996.4563 Ops/s |
| test_ddpg_speed[False-None] | 3.2577ms | 2.6866ms | 372.2125 Ops/s |
| test_ddpg_speed[False-backward] | 4.0087ms | 3.9054ms | 256.0563 Ops/s |
| test_ddpg_speed[True-None] | 1.5988ms | 1.2484ms | 801.0208 Ops/s |
| test_ddpg_speed[True-backward] | 2.2754ms | 2.2185ms | 450.7569 Ops/s |
| test_ddpg_speed[reduce-overhead-None] | 1.6164ms | 1.2567ms | 795.7188 Ops/s |
| test_ddpg_speed[reduce-overhead-backward] | 2.3392ms | 2.2557ms | 443.3303 Ops/s |
| test_sac_speed[False-None] | 8.5907ms | 7.4477ms | 134.2694 Ops/s |
| test_sac_speed[False-backward] | 11.1574ms | 10.7280ms | 93.2141 Ops/s |
| test_sac_speed[True-None] | 2.3563ms | 2.0299ms | 492.6248 Ops/s |
| test_sac_speed[True-backward] | 4.1041ms | 3.9393ms | 253.8519 Ops/s |
| test_sac_speed[reduce-overhead-None] | 2.3477ms | 2.0256ms | 493.6704 Ops/s |
| test_sac_speed[reduce-overhead-backward] | 4.0189ms | 3.9303ms | 254.4343 Ops/s |
| test_redq_speed[False-None] | 14.4509ms | 10.0389ms | 99.6121 Ops/s |
| test_redq_speed[False-backward] | 18.0911ms | 17.1570ms | 58.2852 Ops/s |
| test_redq_speed[True-None] | 3.8968ms | 3.5510ms | 281.6125 Ops/s |
| test_redq_speed[True-backward] | 8.8262ms | 8.3292ms | 120.0599 Ops/s |
| test_redq_speed[reduce-overhead-None] | 3.9406ms | 3.5157ms | 284.4389 Ops/s |
| test_redq_speed[reduce-overhead-backward] | 8.7086ms | 8.3724ms | 119.4403 Ops/s |
| test_redq_deprec_speed[False-None] | 11.2866ms | 10.4473ms | 95.7189 Ops/s |
| test_redq_deprec_speed[False-backward] | 15.5738ms | 15.1803ms | 65.8750 Ops/s |
| test_redq_deprec_speed[True-None] | 3.5333ms | 3.2230ms | 310.2658 Ops/s |
| test_redq_deprec_speed[True-backward] | 7.1380ms | 6.9056ms | 144.8107 Ops/s |
| test_redq_deprec_speed[reduce-overhead-None] | 3.5551ms | 3.2084ms | 311.6802 Ops/s |
| test_redq_deprec_speed[reduce-overhead-backward] | 7.0535ms | 6.8946ms | 145.0417 Ops/s |
| test_td3_speed[False-None] | 7.6046ms | 7.4524ms | 134.1844 Ops/s |
| test_td3_speed[False-backward] | 11.3535ms | 10.3048ms | 97.0422 Ops/s |
| test_td3_speed[True-None] | 2.0956ms | 2.0624ms | 484.8761 Ops/s |
| test_td3_speed[True-backward] | 3.9781ms | 3.8654ms | 258.7066 Ops/s |
| test_td3_speed[reduce-overhead-None] | 2.1030ms | 2.0624ms | 484.8675 Ops/s |
| test_td3_speed[reduce-overhead-backward] | 3.9889ms | 3.8675ms | 258.5622 Ops/s |
| test_cql_speed[False-None] | 26.7428ms | 24.2463ms | 41.2434 Ops/s |
| test_cql_speed[False-backward] | 37.0046ms | 34.0812ms | 29.3417 Ops/s |
| test_cql_speed[True-None] | 11.2252ms | 10.7366ms | 93.1393 Ops/s |
| test_cql_speed[True-backward] | 17.0104ms | 16.5697ms | 60.3511 Ops/s |
| test_cql_speed[reduce-overhead-None] | 11.0172ms | 10.7852ms | 92.7198 Ops/s |
| test_cql_speed[reduce-overhead-backward] | 16.8395ms | 16.3754ms | 61.0673 Ops/s |
| test_a2c_speed[False-None] | 5.4237ms | 5.1128ms | 195.5858 Ops/s |
| test_a2c_speed[False-backward] | 11.8621ms | 11.5314ms | 86.7196 Ops/s |
| test_a2c_speed[True-None] | 3.1702ms | 3.0428ms | 328.6485 Ops/s |
| test_a2c_speed[True-backward] | 8.6295ms | 8.4515ms | 118.3225 Ops/s |
| test_a2c_speed[reduce-overhead-None] | 3.1678ms | 3.0468ms | 328.2129 Ops/s |
| test_a2c_speed[reduce-overhead-backward] | 8.7198ms | 8.4499ms | 118.3451 Ops/s |
| test_ppo_speed[False-None] | 5.8324ms | 5.4703ms | 182.8047 Ops/s |
| test_ppo_speed[False-backward] | 12.3390ms | 11.9760ms | 83.5001 Ops/s |
| test_ppo_speed[True-None] | 3.8252ms | 3.4187ms | 292.5105 Ops/s |
| test_ppo_speed[True-backward] | 8.5291ms | 8.1946ms | 122.0319 Ops/s |
| test_ppo_speed[reduce-overhead-None] | 3.5545ms | 3.4165ms | 292.6939 Ops/s |
| test_ppo_speed[reduce-overhead-backward] | 8.6476ms | 8.2645ms | 120.9994 Ops/s |
| test_reinforce_speed[False-None] | 4.8245ms | 4.2964ms | 232.7545 Ops/s |
| test_reinforce_speed[False-backward] | 7.3000ms | 7.0350ms | 142.1464 Ops/s |
| test_reinforce_speed[True-None] | 2.6198ms | 2.1902ms | 456.5837 Ops/s |
| test_reinforce_speed[True-backward] | 7.2949ms | 7.0129ms | 142.5943 Ops/s |
| test_reinforce_speed[reduce-overhead-None] | 2.5899ms | 2.2096ms | 452.5632 Ops/s |
| test_reinforce_speed[reduce-overhead-backward] | 7.2854ms | 7.0199ms | 142.4523 Ops/s |
| test_iql_speed[False-None] | 20.3644ms | 19.0804ms | 52.4099 Ops/s |
| test_iql_speed[False-backward] | 30.0074ms | 29.2335ms | 34.2073 Ops/s |
| test_iql_speed[True-None] | 7.9935ms | 7.7514ms | 129.0086 Ops/s |
| test_iql_speed[True-backward] | 16.8746ms | 16.4343ms | 60.8483 Ops/s |
| test_iql_speed[reduce-overhead-None] | 8.0119ms | 7.7167ms | 129.5884 Ops/s |
| test_iql_speed[reduce-overhead-backward] | 16.8050ms | 16.4777ms | 60.6881 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.9522ms | 6.7344ms | 148.4907 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0746ms | 0.3402ms | 2.9391 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5822ms | 0.3196ms | 3.1288 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.9833ms | 6.5629ms | 152.3706 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6359ms | 0.3324ms | 3.0088 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6156ms | 0.3108ms | 3.2176 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6791ms | 1.3999ms | 714.3534 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5954ms | 1.3375ms | 747.6586 Ops/s |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 7.0765ms | 6.8367ms | 146.2695 Ops/s |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8092ms | 0.4109ms | 2.4336 KOps/s |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6881ms | 0.4602ms | 2.1729 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 10.9838ms | 7.0844ms | 141.1553 Ops/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.4049s | 0.6879ms | 1.4538 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5472ms | 0.3218ms | 3.1076 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.9288ms | 6.6759ms | 149.7917 Ops/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.9163ms | 0.3385ms | 2.9543 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5207ms | 0.3172ms | 3.1528 KOps/s |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 7.1266ms | 6.9604ms | 143.6703 Ops/s |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8256ms | 0.4868ms | 2.0540 KOps/s |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7643ms | 0.4667ms | 2.1428 KOps/s |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 7.2429ms | 5.4267ms | 184.2745 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.3969s | 23.8860ms | 41.8656 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.3765ms | 1.1132ms | 898.3321 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 8.4923ms | 5.4521ms | 183.4155 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 20.8932ms | 15.7296ms | 63.5745 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 9.2330ms | 1.2816ms | 780.2717 Ops/s |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.3442s | 12.3946ms | 80.6800 Ops/s |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 22.1311ms | 16.4031ms | 60.9641 Ops/s |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 8.9557ms | 1.4554ms | 687.0752 Ops/s |
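
For readers comparing rows that use different time units, the Ops column in the results above looks consistent with the reciprocal of the Mean column (for example, a mean of 0.1022s gives roughly 9.78 Ops/s). The sketch below is only an illustration of that relationship, not the code that produced the report; the helper names `to_seconds` and `ops_per_second` are hypothetical.

```python
import re

# Unit suffixes seen in the table, mapped to seconds. Both Unicode spellings
# of the micro prefix are accepted, since either may appear in copied text.
_SECONDS_PER_UNIT = {
    "s": 1.0,
    "ms": 1e-3,
    "\u03bcs": 1e-6,  # Greek small letter mu
    "\u00b5s": 1e-6,  # micro sign
    "us": 1e-6,
}

_DURATION = re.compile(r"\s*([0-9]*\.?[0-9]+)\s*(\S+)\s*")


def to_seconds(text: str) -> float:
    """Parse a duration such as '88.0732ms' into seconds (hypothetical helper)."""
    match = _DURATION.fullmatch(text)
    if match is None or match.group(2) not in _SECONDS_PER_UNIT:
        raise ValueError(f"unrecognised duration: {text!r}")
    return float(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]


def ops_per_second(mean: str) -> float:
    """Assumed relationship: throughput is the reciprocal of the mean time."""
    return 1.0 / to_seconds(mean)


if __name__ == "__main__":
    # Mean values copied from the first rows of the table above.
    for mean in ("0.1022s", "88.0732ms", "36.8061\u03bcs"):
        print(f"{mean:>12} -> {ops_per_second(mean):12.4f} Ops/s")
```

Small differences in the last digits against the reported Ops values are expected, because the Mean column is itself rounded.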


github-actions bot commented Sep 17, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 146. Improved: 18. Worsened: 4.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_single | 61.3259ms | 60.1450ms | 16.6265 Ops/s | 16.5946 Ops/s | +0.19% |
| test_sync | 48.0637ms | 35.4861ms | 28.1801 Ops/s | 31.1550 Ops/s | **-9.55%** |
| test_async | 53.4725ms | 30.6309ms | 32.6468 Ops/s | 32.7580 Ops/s | -0.34% |
| test_simple | 0.4904s | 0.4205s | 2.3782 Ops/s | 2.4647 Ops/s | -3.51% |
| test_transformed | 0.5638s | 0.5572s | 1.7946 Ops/s | 1.7576 Ops/s | +2.10% |
| test_serial | 1.2793s | 1.2707s | 0.7870 Ops/s | 0.7747 Ops/s | +1.59% |
| test_parallel | 1.1876s | 1.1227s | 0.8907 Ops/s | 0.8839 Ops/s | +0.77% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1986ms | 27.8487μs | 35.9083 KOps/s | 36.0471 KOps/s | -0.39% |
| test_step_mdp_speed[True-True-True-True-False] | 50.9250μs | 15.8929μs | 62.9211 KOps/s | 60.3082 KOps/s | +4.33% |
| test_step_mdp_speed[True-True-True-False-True] | 44.2230μs | 15.6284μs | 63.9863 KOps/s | 61.8798 KOps/s | +3.40% |
| test_step_mdp_speed[True-True-True-False-False] | 47.3000μs | 9.3172μs | 107.3279 KOps/s | 105.8861 KOps/s | +1.36% |
| test_step_mdp_speed[True-True-False-True-True] | 78.5670μs | 29.1315μs | 34.3271 KOps/s | 33.2159 KOps/s | +3.35% |
| test_step_mdp_speed[True-True-False-True-False] | 41.4470μs | 17.8122μs | 56.1411 KOps/s | 55.5787 KOps/s | +1.01% |
| test_step_mdp_speed[True-True-False-False-True] | 64.3200μs | 17.5852μs | 56.8660 KOps/s | 56.9113 KOps/s | -0.08% |
| test_step_mdp_speed[True-True-False-False-False] | 56.0440μs | 11.0030μs | 90.8841 KOps/s | 89.1493 KOps/s | +1.95% |
| test_step_mdp_speed[True-False-True-True-True] | 78.6060μs | 31.0814μs | 32.1736 KOps/s | 31.3879 KOps/s | +2.50% |
| test_step_mdp_speed[True-False-True-True-False] | 61.2140μs | 19.4153μs | 51.5058 KOps/s | 49.8748 KOps/s | +3.27% |
| test_step_mdp_speed[True-False-True-False-True] | 44.5720μs | 17.0443μs | 58.6706 KOps/s | 55.9147 KOps/s | +4.93% |
| test_step_mdp_speed[True-False-True-False-False] | 50.1430μs | 10.8797μs | 91.9139 KOps/s | 89.4298 KOps/s | +2.78% |
| test_step_mdp_speed[True-False-False-True-True] | 78.6860μs | 32.2978μs | 30.9619 KOps/s | 30.0015 KOps/s | +3.20% |
| test_step_mdp_speed[True-False-False-True-False] | 62.0860μs | 20.5069μs | 48.7640 KOps/s | 46.9242 KOps/s | +3.92% |
| test_step_mdp_speed[True-False-False-False-True] | 46.8470μs | 18.9282μs | 52.8313 KOps/s | 51.2847 KOps/s | +3.02% |
| test_step_mdp_speed[True-False-False-False-False] | 49.6130μs | 12.1455μs | 82.3351 KOps/s | 77.3877 KOps/s | **+6.39%** |
| test_step_mdp_speed[False-True-True-True-True] | 74.8190μs | 30.0063μs | 33.3263 KOps/s | 31.3685 KOps/s | **+6.24%** |
| test_step_mdp_speed[False-True-True-True-False] | 71.3630μs | 19.0998μs | 52.3567 KOps/s | 49.5894 KOps/s | **+5.58%** |
| test_step_mdp_speed[False-True-True-False-True] | 68.7880μs | 19.8927μs | 50.2698 KOps/s | 47.3585 KOps/s | **+6.15%** |
| test_step_mdp_speed[False-True-True-False-False] | 37.9910μs | 12.0901μs | 82.7120 KOps/s | 80.0938 KOps/s | +3.27% |
| test_step_mdp_speed[False-True-False-True-True] | 81.5720μs | 32.2137μs | 31.0426 KOps/s | 29.7330 KOps/s | +4.40% |
| test_step_mdp_speed[False-True-False-True-False] | 47.0580μs | 20.4119μs | 48.9911 KOps/s | 46.2624 KOps/s | **+5.90%** |
| test_step_mdp_speed[False-True-False-False-True] | 2.8978ms | 21.3847μs | 46.7625 KOps/s | 43.4289 KOps/s | **+7.68%** |
| test_step_mdp_speed[False-True-False-False-False] | 51.7660μs | 13.2757μs | 75.3257 KOps/s | 70.7150 KOps/s | **+6.52%** |
| test_step_mdp_speed[False-False-True-True-True] | 76.1810μs | 33.5679μs | 29.7904 KOps/s | 28.1348 KOps/s | **+5.88%** |
| test_step_mdp_speed[False-False-True-True-False] | 54.4210μs | 22.5867μs | 44.2739 KOps/s | 43.0706 KOps/s | +2.79% |
| test_step_mdp_speed[False-False-True-False-True] | 69.2290μs | 21.4816μs | 46.5514 KOps/s | 44.7838 KOps/s | +3.95% |
| test_step_mdp_speed[False-False-True-False-False] | 70.0200μs | 13.4597μs | 74.2957 KOps/s | 70.3672 KOps/s | **+5.58%** |
| test_step_mdp_speed[False-False-False-True-True] | 84.8990μs | 35.2137μs | 28.3981 KOps/s | 27.1522 KOps/s | +4.59% |
| test_step_mdp_speed[False-False-False-True-False] | 60.8730μs | 23.6716μs | 42.2447 KOps/s | 39.7934 KOps/s | **+6.16%** |
| test_step_mdp_speed[False-False-False-False-True] | 53.7300μs | 22.9129μs | 43.6436 KOps/s | 40.7784 KOps/s | **+7.03%** |
| test_step_mdp_speed[False-False-False-False-False] | 57.3770μs | 14.6050μs | 68.4699 KOps/s | 63.3170 KOps/s | **+8.14%** |
| test_values[generalized_advantage_estimate-True-True] | 10.2902ms | 9.3789ms | 106.6225 Ops/s | 102.6363 Ops/s | +3.88% |
| test_values[vec_generalized_advantage_estimate-True-True] | 36.7108ms | 34.5462ms | 28.9467 Ops/s | 28.2807 Ops/s | +2.35% |
| test_values[td0_return_estimate-False-False] | 0.2269ms | 0.1686ms | 5.9328 KOps/s | 6.0167 KOps/s | -1.39% |
| test_values[td1_return_estimate-False-False] | 45.5212ms | 24.1509ms | 41.4063 Ops/s | 39.8449 Ops/s | +3.92% |
| test_values[vec_td1_return_estimate-False-False] | 38.7194ms | 35.4098ms | 28.2408 Ops/s | 28.1829 Ops/s | +0.21% |
| test_values[td_lambda_return_estimate-True-False] | 38.1246ms | 34.2670ms | 29.1826 Ops/s | 28.5765 Ops/s | +2.12% |
| test_values[vec_td_lambda_return_estimate-True-False] | 42.0850ms | 35.5822ms | 28.1040 Ops/s | 27.8498 Ops/s | +0.91% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 18.9019ms | 8.6491ms | 115.6196 Ops/s | 117.7432 Ops/s | -1.80% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.2946ms | 1.8968ms | 527.1916 Ops/s | 411.0414 Ops/s | **+28.26%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4191ms | 0.3605ms | 2.7737 KOps/s | 2.8091 KOps/s | -1.26% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 47.2156ms | 45.2057ms | 22.1211 Ops/s | 21.7264 Ops/s | +1.82% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 4.3325ms | 3.0425ms | 328.6755 Ops/s | 329.0391 Ops/s | -0.11% |
| test_dqn_speed[False-None] | 6.4252ms | 1.3204ms | 757.3705 Ops/s | 757.3709 Ops/s | -0.00% |
| test_dqn_speed[False-backward] | 1.8669ms | 1.7983ms | 556.0937 Ops/s | 561.4932 Ops/s | -0.96% |
| test_dqn_speed[True-None] | 0.6243ms | 0.4567ms | 2.1898 KOps/s | 2.1587 KOps/s | +1.44% |
| test_dqn_speed[True-backward] | 0.9144ms | 0.8695ms | 1.1501 KOps/s | 805.1145 Ops/s | **+42.85%** |
| test_dqn_speed[reduce-overhead-None] | 0.5757ms | 0.4617ms | 2.1661 KOps/s | 2.1518 KOps/s | +0.67% |
| test_dqn_speed[reduce-overhead-backward] | 0.9801ms | 0.8708ms | 1.1483 KOps/s | 1.1479 KOps/s | +0.04% |
| test_ddpg_speed[False-None] | 3.6321ms | 2.7658ms | 361.5556 Ops/s | 362.4958 Ops/s | -0.26% |
| test_ddpg_speed[False-backward] | 4.1693ms | 3.8837ms | 257.4843 Ops/s | 259.5414 Ops/s | -0.79% |
| test_ddpg_speed[True-None] | 1.5604ms | 0.9898ms | 1.0103 KOps/s | 993.8140 Ops/s | +1.66% |
| test_ddpg_speed[True-backward] | 2.1398ms | 1.9071ms | 524.3697 Ops/s | 498.8587 Ops/s | **+5.11%** |
| test_ddpg_speed[reduce-overhead-None] | 1.7024ms | 1.0012ms | 998.7990 Ops/s | 990.5163 Ops/s | +0.84% |
| test_ddpg_speed[reduce-overhead-backward] | 1.9542ms | 1.8587ms | 538.0133 Ops/s | 531.7579 Ops/s | +1.18% |
| test_sac_speed[False-None] | 8.8955ms | 7.7928ms | 128.3243 Ops/s | 127.6307 Ops/s | +0.54% |
| test_sac_speed[False-backward] | 13.0135ms | 10.5292ms | 94.9740 Ops/s | 95.4810 Ops/s | -0.53% |
| test_sac_speed[True-None] | 2.4503ms | 1.8278ms | 547.0910 Ops/s | 538.2342 Ops/s | +1.65% |
| test_sac_speed[True-backward] | 3.5494ms | 3.4607ms | 288.9567 Ops/s | 281.3576 Ops/s | +2.70% |
| test_sac_speed[reduce-overhead-None] | 6.8758ms | 1.8793ms | 532.1007 Ops/s | 538.0616 Ops/s | -1.11% |
| test_sac_speed[reduce-overhead-backward] | 3.5619ms | 3.5003ms | 285.6864 Ops/s | 282.5954 Ops/s | +1.09% |
| test_redq_speed[False-None] | 14.8938ms | 12.7220ms | 78.6038 Ops/s | 77.7283 Ops/s | +1.13% |
| test_redq_speed[False-backward] | 23.1180ms | 22.0126ms | 45.4286 Ops/s | 44.7558 Ops/s | +1.50% |
| test_redq_speed[True-None] | 5.6367ms | 4.4428ms | 225.0834 Ops/s | 219.6111 Ops/s | +2.49% |
| test_redq_speed[True-backward] | 12.9298ms | 11.7014ms | 85.4596 Ops/s | 82.2478 Ops/s | +3.91% |
| test_redq_speed[reduce-overhead-None] | 5.5017ms | 4.4613ms | 224.1517 Ops/s | 219.4487 Ops/s | +2.14% |
| test_redq_speed[reduce-overhead-backward] | 12.0066ms | 11.5289ms | 86.7383 Ops/s | 82.5383 Ops/s | **+5.09%** |
| test_redq_deprec_speed[False-None] | 15.6430ms | 12.4237ms | 80.4914 Ops/s | 77.2257 Ops/s | +4.23% |
| test_redq_deprec_speed[False-backward] | 20.7126ms | 18.1480ms | 55.1026 Ops/s | 55.6931 Ops/s | -1.06% |
| test_redq_deprec_speed[True-None] | 4.1835ms | 3.5221ms | 283.9191 Ops/s | 277.4016 Ops/s | +2.35% |
| test_redq_deprec_speed[True-backward] | 7.9961ms | 7.7572ms | 128.9129 Ops/s | 122.9637 Ops/s | +4.84% |
| test_redq_deprec_speed[reduce-overhead-None] | 4.3666ms | 3.5330ms | 283.0486 Ops/s | 279.6198 Ops/s | +1.23% |
| test_redq_deprec_speed[reduce-overhead-backward] | 8.9248ms | 7.8922ms | 126.7068 Ops/s | 125.0666 Ops/s | +1.31% |
| test_td3_speed[False-None] | 9.2038ms | 7.7353ms | 129.2771 Ops/s | 127.7201 Ops/s | +1.22% |
| test_td3_speed[False-backward] | 11.7854ms | 10.1381ms | 98.6380 Ops/s | 98.6569 Ops/s | -0.02% |
| test_td3_speed[True-None] | 2.0918ms | 1.9121ms | 522.9782 Ops/s | 505.6963 Ops/s | +3.42% |
| test_td3_speed[True-backward] | 3.5915ms | 3.4996ms | 285.7431 Ops/s | 279.6916 Ops/s | +2.16% |
| test_td3_speed[reduce-overhead-None] | 2.1959ms | 1.9153ms | 522.1150 Ops/s | 505.7449 Ops/s | +3.24% |
| test_td3_speed[reduce-overhead-backward] | 3.5723ms | 3.5053ms | 285.2783 Ops/s | 272.5421 Ops/s | +4.67% |
| test_cql_speed[False-None] | 37.7391ms | 35.3196ms | 28.3129 Ops/s | 27.7299 Ops/s | +2.10% |
| test_cql_speed[False-backward] | 49.8543ms | 45.9899ms | 21.7439 Ops/s | 21.5774 Ops/s | +0.77% |
| test_cql_speed[True-None] | 16.2894ms | 15.4050ms | 64.9139 Ops/s | 63.2806 Ops/s | +2.58% |
| test_cql_speed[True-backward] | 24.9025ms | 21.9888ms | 45.4776 Ops/s | 46.0712 Ops/s | -1.29% |
| test_cql_speed[reduce-overhead-None] | 16.7943ms | 15.4142ms | 64.8753 Ops/s | 64.5734 Ops/s | +0.47% |
| test_cql_speed[reduce-overhead-backward] | 24.1441ms | 22.0726ms | 45.3050 Ops/s | 44.0255 Ops/s | +2.91% |
| test_a2c_speed[False-None] | 7.8878ms | 7.0442ms | 141.9599 Ops/s | 140.0589 Ops/s | +1.36% |
| test_a2c_speed[False-backward] | 15.4671ms | 13.9509ms | 71.6800 Ops/s | 71.3576 Ops/s | +0.45% |
| test_a2c_speed[True-None] | 4.0159ms | 3.3125ms | 301.8861 Ops/s | 302.6650 Ops/s | -0.26% |
| test_a2c_speed[True-backward] | 10.8392ms | 9.6762ms | 103.3468 Ops/s | 102.9633 Ops/s | +0.37% |
| test_a2c_speed[reduce-overhead-None] | 3.9987ms | 3.3054ms | 302.5389 Ops/s | 300.7881 Ops/s | +0.58% |
| test_a2c_speed[reduce-overhead-backward] | 9.8337ms | 9.5724ms | 104.4675 Ops/s | 102.1884 Ops/s | +2.23% |
| test_ppo_speed[False-None] | 8.5847ms | 7.3488ms | 136.0762 Ops/s | 134.2804 Ops/s | +1.34% |
| test_ppo_speed[False-backward] | 16.7624ms | 14.3867ms | 69.5089 Ops/s | 69.9008 Ops/s | -0.56% |
| test_ppo_speed[True-None] | 4.4242ms | 3.6884ms | 271.1189 Ops/s | 270.8759 Ops/s | +0.09% |
| test_ppo_speed[True-backward] | 9.6559ms | 9.4695ms | 105.6017 Ops/s | 106.3561 Ops/s | -0.71% |
| test_ppo_speed[reduce-overhead-None] | 4.5017ms | 3.6929ms | 270.7906 Ops/s | 269.2168 Ops/s | +0.58% |
| test_ppo_speed[reduce-overhead-backward] | 10.7164ms | 9.5244ms | 104.9940 Ops/s | 105.8486 Ops/s | -0.81% |
| test_reinforce_speed[False-None] | 7.9786ms | 6.4237ms | 155.6736 Ops/s | 154.6276 Ops/s | +0.68% |
| test_reinforce_speed[False-backward] | 9.8953ms | 9.6432ms | 103.7002 Ops/s | 103.2976 Ops/s | +0.39% |
| test_reinforce_speed[True-None] | 3.2108ms | 2.6088ms | 383.3134 Ops/s | 381.0281 Ops/s | +0.60% |
| test_reinforce_speed[True-backward] | 11.2603ms | 8.5752ms | 116.6160 Ops/s | 117.0205 Ops/s | -0.35% |
| test_reinforce_speed[reduce-overhead-None] | 3.2107ms | 2.6192ms | 381.7938 Ops/s | 380.6456 Ops/s | +0.30% |
| test_reinforce_speed[reduce-overhead-backward] | 8.7330ms | 8.4630ms | 118.1608 Ops/s | 116.8817 Ops/s | +1.09% |
| test_iql_speed[False-None] | 34.3595ms | 32.3748ms | 30.8883 Ops/s | 31.4895 Ops/s | -1.91% |
| test_iql_speed[False-backward] | 48.5755ms | 45.0770ms | 22.1843 Ops/s | 22.5762 Ops/s | -1.74% |
| test_iql_speed[True-None] | 14.0051ms | 13.0650ms | 76.5403 Ops/s | 75.2765 Ops/s | +1.68% |
| test_iql_speed[True-backward] | 24.5119ms | 23.5609ms | 42.4432 Ops/s | 41.9250 Ops/s | +1.24% |
| test_iql_speed[reduce-overhead-None] | 14.5406ms | 13.0690ms | 76.5167 Ops/s | 74.1231 Ops/s | +3.23% |
| test_iql_speed[reduce-overhead-backward] | 25.4558ms | 23.5937ms | 42.3842 Ops/s | 41.3355 Ops/s | +2.54% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 7.9702ms | 5.0306ms | 198.7824 Ops/s | 194.9867 Ops/s | +1.95% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.3911ms | 0.4760ms | 2.1007 KOps/s | 2.1079 KOps/s | -0.34% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 1.6224ms | 0.4552ms | 2.1970 KOps/s | 2.1986 KOps/s | -0.08% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.6029ms | 4.9081ms | 203.7444 Ops/s | 201.2793 Ops/s | +1.22% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1657ms | 0.4700ms | 2.1277 KOps/s | 2.1124 KOps/s | +0.72% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 1.5733ms | 0.4392ms | 2.2768 KOps/s | 2.2602 KOps/s | +0.73% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.7827ms | 1.6261ms | 614.9522 Ops/s | 615.3542 Ops/s | -0.07% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 2.6868ms | 1.5379ms | 650.2476 Ops/s | 648.4206 Ops/s | +0.28% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.3571ms | 5.1282ms | 194.9997 Ops/s | 191.9531 Ops/s | +1.59% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.7608ms | 0.6055ms | 1.6515 KOps/s | 1.6303 KOps/s | +1.30% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 1.6495ms | 0.5846ms | 1.7105 KOps/s | 1.6992 KOps/s | +0.67% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 7.7073ms | 5.1069ms | 195.8140 Ops/s | 194.8918 Ops/s | +0.47% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3.1970ms | 0.4821ms | 2.0742 KOps/s | 1.9629 KOps/s | **+5.67%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6624ms | 0.4565ms | 2.1904 KOps/s | 2.1212 KOps/s | +3.26% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 8.1688ms | 4.9463ms | 202.1695 Ops/s | 196.5379 Ops/s | +2.87% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1655ms | 0.4722ms | 2.1175 KOps/s | 2.1095 KOps/s | +0.38% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 1.6031ms | 0.4476ms | 2.2342 KOps/s | 2.1314 KOps/s | +4.82% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.2678ms | 5.1291ms | 194.9676 Ops/s | 189.0623 Ops/s | +3.12% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 3.0319ms | 0.6185ms | 1.6167 KOps/s | 1.6176 KOps/s | -0.05% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7949ms | 0.5805ms | 1.7227 KOps/s | 1.6785 KOps/s | +2.64% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.3839s | 11.8490ms | 84.3952 Ops/s | 231.5236 Ops/s | **-63.55%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 17.2472ms | 12.8923ms | 77.5659 Ops/s | 74.5561 Ops/s | +4.04% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 4.8887ms | 1.4089ms | 709.7960 Ops/s | 761.8336 Ops/s | **-6.83%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 5.6106ms | 4.2554ms | 234.9970 Ops/s | 234.5606 Ops/s | +0.19% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 17.4231ms | 12.9284ms | 77.3492 Ops/s | 76.0830 Ops/s | +1.66% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 4.4578ms | 1.3973ms | 715.6470 Ops/s | 696.2041 Ops/s | +2.79% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.3433s | 11.1134ms | 89.9815 Ops/s | 232.1937 Ops/s | **-61.25%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 18.2298ms | 13.0967ms | 76.3549 Ops/s | 74.4931 Ops/s | +2.50% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.9091ms | 1.4264ms | 701.0822 Ops/s | 648.6100 Ops/s | **+8.09%** |
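
The Change column above appears to be the relative difference between Ops and Ops on Repo HEAD, and the emphasised entries all lie beyond roughly ±5%, which matches the 18 improved and 4 worsened benchmarks in the summary. A minimal Python sketch of that reading follows. The 5% threshold is an assumption inferred from the rows, and `to_ops`, `relative_change` and `classify` are hypothetical names, not part of the CI tooling.

```python
# Throughput units seen in the table, mapped to operations per second.
_OPS_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops(text: str) -> float:
    """Convert e.g. '1.1501 KOps/s' or '805.1145 Ops/s' to ops per second."""
    value, unit = text.split()
    return float(value) * _OPS_SCALE[unit]


def relative_change(pr_ops: str, head_ops: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    pr, head = to_ops(pr_ops), to_ops(head_ops)
    return (pr - head) / head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% default is an assumed threshold, not a known one."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # (Ops, Ops on Repo HEAD) pairs copied from rows of the table above.
    rows = {
        "test_single": ("16.6265 Ops/s", "16.5946 Ops/s"),
        "test_sync": ("28.1801 Ops/s", "31.1550 Ops/s"),
        "test_dqn_speed[True-backward]": ("1.1501 KOps/s", "805.1145 Ops/s"),
    }
    for name, (pr, head) in rows.items():
        change = relative_change(pr, head)
        print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Note that mixed units (KOps/s against Ops/s, as in `test_dqn_speed[True-backward]`) need to be normalised before the two values are compared.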

[ghstack-poisoned]
Labels
CI: Has to do with CI setup (e.g. wheels & builds, tests...)
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
2 participants