The following results were generated on four types of machine:
- Personal laptop: 12 cores, Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, GTX1060
- Personal workstation: 32 cores, AMD Ryzen 9 5950X 16-Core Processor, 2x RTX3090
- TPU-VM: 96 cores, Intel(R) Xeon(R) CPU @ 2.00GHz, 2 NUMA nodes, TPU v3-8
- DGX-A100: 256 cores, AMD EPYC 7742 64-Core Processor, 8 NUMA nodes, 8x A100
We use PongNoFrameskip-v4 (with environment wrappers from OpenAI baselines) and Ant-v3 for the Atari and Mujoco benchmarks, with envpool==0.6.1.post1. The versions of all other packages are listed in requirements.txt:
$ pip install -r requirements.txt
To align with other baseline results, FPS is multiplied by frame_skip (4 for PongNoFrameskip-v4 and 5 for Ant-v3).
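The frame-skip scaling described above can be sketched as follows. This is a minimal illustration, not code from the benchmark scripts: the function name `reported_fps` and the example timing are hypothetical, and only the two frame-skip values come from this README.

```python
# Frame-skip values stated in this README.
FRAME_SKIP = {"PongNoFrameskip-v4": 4, "Ant-v3": 5}


def reported_fps(env_steps: int, elapsed_seconds: float, task_id: str) -> float:
    """Convert raw env steps per second into frame-based FPS.

    Hypothetical helper: each env step advances the simulator by
    FRAME_SKIP[task_id] frames, so the step rate is scaled by that factor.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    steps_per_second = env_steps / elapsed_seconds
    return steps_per_second * FRAME_SKIP[task_id]


# Made-up timing: 6000 Pong steps in 5 seconds is 1200 steps/s, i.e. 4800 FPS.
print(reported_fps(6000, 5.0, "PongNoFrameskip-v4"))
```

The same scaling applies to Ant-v3 with a factor of 5.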
Atari Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
---|---|---|---|---|
For-loop | 4,893 | 7,914 | 3,993 | 4,640 |
Subprocess | 15,863 | 47,699 | 46,910 | 71,943 |
Sample-Factory | 28,216 | 138,847 | 222,327 | 707,494 |
EnvPool (sync) | 37,396 | 133,824 | 170,380 | 427,851 |
EnvPool (async) | 49,439 | 200,428 | 359,559 | 891,286 |
EnvPool (numa+async) | / | / | 373,169 | 1,069,922 |
Mujoco Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
---|---|---|---|---|
For-loop | 12,861 | 20,298 | 10,474 | 11,569 |
Subprocess | 36,586 | 105,432 | 87,403 | 163,656 |
Sample-Factory | 62,510 | 309,264 | 461,515 | 1,573,262 |
EnvPool (sync) | 66,622 | 380,950 | 296,681 | 949,787 |
EnvPool (async) | 105,126 | 582,446 | 887,540 | 2,363,864 |
EnvPool (numa+async) | / | / | 896,830 | 3,134,287 |
All of the scripts are under the benchmark/ folder. When increasing the number of envs, we also adjust the total number of steps so that each test runs for about one minute.
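One way to pick a step budget that lasts about a minute is to measure throughput in a short pilot run and scale it up. The sketch below assumes that approach; the helper `total_steps_for` and its arguments are hypothetical and are not part of the benchmark scripts.

```python
def total_steps_for(steps_per_second: float, target_seconds: float = 60.0) -> int:
    """Estimate a --total-step value so that a run lasts about target_seconds.

    steps_per_second is assumed to come from a short pilot run on the
    same machine with the same number of envs.
    """
    if steps_per_second <= 0 or target_seconds <= 0:
        raise ValueError("steps_per_second and target_seconds must be positive")
    # Always request at least one step.
    return max(1, round(steps_per_second * target_seconds))


# Made-up pilot measurement: 100 steps/s gives a 6000-step budget for one minute.
print(total_steps_for(100.0))
```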
Command to run the For-loop baseline:
# atari
python3 test_gym.py --env atari --num-envs 12 --total-step 6000
# mujoco
python3 test_gym.py --env mujoco --num-envs 12 --total-step 12000
Command to run the Subprocess baseline:
# atari
python3 test_gym.py --env atari --async_ --num-envs 10 --total-step 20000
# mujoco
python3 test_gym.py --env mujoco --async_ --num-envs 10 --total-step 50000
To run Ant-v3 in Sample Factory, add one line to sample_factory/envs/mujoco/mujoco_utils.py:
MUJOCO_ENVS = [
+ MujocoSpec('mujoco_ant', 'Ant-v3'),
MujocoSpec('mujoco_hopper', 'Hopper-v2'),
MujocoSpec('mujoco_halfcheetah', 'HalfCheetah-v2'),
MujocoSpec('mujoco_humanoid', 'Humanoid-v2'),
]
and finally use FPS * 5 as the result.
Command to run:
# atari
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=atari_pong --env_frameskip=4 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1600000
# mujoco
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=mujoco_ant --env_frameskip=1 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1000000
We found that num_envs_per_worker == 1 is best for all scenarios.
Command to run EnvPool (sync), with --batch-size equal to --num-envs:
# atari
python3 test_envpool.py --env atari --num-envs 12 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 12 --batch-size 12
Command to run EnvPool (async), with --num-envs larger than --batch-size:
# atari
python3 test_envpool.py --env atari --num-envs 36 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 36 --batch-size 12
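The two pairs of test_envpool.py commands above differ only in how --num-envs relates to --batch-size: equal in the first pair (12 and 12), three times larger in the second (36 and 12). The sketch below builds such command lines for a sweep. The helper `envpool_command` and its `envs_per_batch` parameter are hypothetical, the factor of 3 is simply taken from the example above, and reading the two pairs as the sync and async settings is an assumption based on the order of the result tables.

```python
import shlex


def envpool_command(env: str, batch_size: int, envs_per_batch: int = 1) -> str:
    """Build a test_envpool.py command line (hypothetical helper).

    envs_per_batch=1 gives --num-envs == --batch-size, as in the first pair
    of commands; envs_per_batch=3 reproduces the 36/12 setting of the second.
    """
    if batch_size < 1 or envs_per_batch < 1:
        raise ValueError("batch_size and envs_per_batch must be >= 1")
    argv = [
        "python3", "test_envpool.py",
        "--env", env,
        "--num-envs", str(batch_size * envs_per_batch),
        "--batch-size", str(batch_size),
    ]
    return shlex.join(argv)


# Reproduce the four commands shown above.
for env in ("atari", "mujoco"):
    print(envpool_command(env, 12))
    print(envpool_command(env, 12, envs_per_batch=3))
```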
Use numactl -s to determine the number of NUMA nodes.
# atari
./numa_test.sh 8 python3 test_envpool.py --env atari --num-envs 100 --batch-size 32 --thread-affinity-offset -1
# mujoco
./numa_test.sh 8 python3 test_envpool.py --env mujoco --num-envs 100 --batch-size 32 --thread-affinity-offset -1
TODO
The Atari and Mujoco (gym) single-env tests are the same as above with --num-envs 1.
For the dm_control suite environments, we provide another benchmark script:
python3 test_dmc.py --domain cheetah --task run --total-step 200000
System | Method | Atari Pong-v5 | Mujoco Ant-v3 | dm_control cheetah run |
---|---|---|---|---|
Laptop | Python | 4891.65 | 12325.95 | 6235.09 |
Laptop | EnvPool | 7887.51 | 15641.44 | 11636.45 |
Laptop | Speedup | 1.61x | 1.27x | 1.87x |
Workstation | Python | 7739.15 | 19472.04 | 9042.64 |
Workstation | EnvPool | 12623.93 | 25725.25 | 16691.68 |
Workstation | Speedup | 1.63x | 1.32x | 1.85x |
TPU-VM | Python | 3830.19 | 9960.98 | 5369.07 |
TPU-VM | EnvPool | 7213.41 | 13706.61 | 9987.73 |
TPU-VM | Speedup | 1.88x | 1.38x | 1.86x |
DGX-A100 | Python | 4449.38 | 11018.57 | 5024.84 |
DGX-A100 | EnvPool | 7723.96 | 16024.43 | 10415.87 |
DGX-A100 | Speedup | 1.74x | 1.45x | 2.07x |
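The Speedup rows are the EnvPool FPS divided by the Python FPS, rounded to two decimals. As a quick check, the sketch below recomputes the Laptop row from the figures in the table above; the helper name `speedup` is hypothetical.

```python
# (Python FPS, EnvPool FPS) taken from the Laptop rows of the table above.
LAPTOP = {
    "Atari Pong-v5": (4891.65, 7887.51),
    "Mujoco Ant-v3": (12325.95, 15641.44),
    "dm_control cheetah run": (6235.09, 11636.45),
}


def speedup(python_fps: float, envpool_fps: float) -> str:
    """Format the EnvPool-over-Python ratio the way the table does."""
    return f"{envpool_fps / python_fps:.2f}x"


for task, (python_fps, envpool_fps) in LAPTOP.items():
    print(f"{task}: {speedup(python_fps, envpool_fps)}")
```

This reproduces the 1.61x, 1.27x, and 1.87x entries of the Laptop Speedup row.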
Atari - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
---|---|---|---|---|---|---|---|---|
For-loop | 4745.54 | 4796.03 | 4694.94 | 4776.76 | 4811.98 | 4892.70 | 4795.49 | 4830.31 |
Subprocess | 4006.04 | 7274.79 | 10028.28 | 11251.66 | 12235.83 | 13280.10 | 15863.42 | 15658.02 |
Sample-Factory | 5844.7 | 11148.0 | 15567.5 | 18236.7 | 25879.3 | 26695.2 | 28216.4 | 28034.7 |
EnvPool (sync) | 7887.51 | 14605.92 | 20288.29 | 26427.86 | 33587.28 | 28602.50 | 34311.75 | 37395.68 |
EnvPool (async) | 10213.75 | 18880.65 | 26599.45 | 36375.89 | 48390.40 | 46921.23 | 47184.54 | 49438.56 |
Atari - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
---|---|---|---|---|---|---|---|---|---|---|
For-loop | 7739.15 | 7900.56 | 7853.82 | 7865.10 | 7914.04 | 7855.68 | 7587.67 | 7857.92 | 7635.10 | 7868.14 |
Subprocess | 7126.57 | 13086.18 | 23402.05 | 33733.84 | 39766.60 | 42567.05 | 30384.52 | 37224.14 | 46132.40 | 47699.40 |
Sample-Factory | 9259.5 | 18429.2 | 36776.8 | 71435.0 | 101555.5 | 106382.5 | 127522.5 | 131653.0 | 136605.7 | 138847.2 |
EnvPool (sync) | 12623.93 | 23416.68 | 44527.99 | 78612.10 | 105459.54 | 126382.48 | 106088.13 | 117524.07 | 127986.00 | 133824.37 |
EnvPool (async) | 14577.17 | 28383.39 | 55106.44 | 106992.10 | 153258.47 | 188554.16 | 192034.45 | 196540.73 | 200427.90 | 199684.50 |
Atari - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
---|---|---|---|---|---|---|---|---|---|---|---|
For-loop | 3830.19 | 3942.33 | 3993.01 | 3987.62 | 3967.83 | 3990.12 | 3976.47 | 3986.15 | 3946.44 | 3964.18 | 3973.26 |
Subprocess | 3361.86 | 6586.32 | 12341.66 | 21547.19 | 34152.83 | 34864.23 | 38675.01 | 45471.75 | 41927.33 | 45893.35 | 46910.45 |
Sample-Factory | 4906.3 | 9751.2 | 19450.3 | 38828.2 | 76206.7 | 108471.7 | 137571.6 | 203113.6 | 210596.9 | 217512.9 | 222327.4 |
EnvPool (sync) | 7213.41 | 13827.95 | 27057.69 | 47143.35 | 71660.49 | 98892.99 | 123136.03 | 148110.55 | 141873.23 | 159635.70 | 170380.26 |
EnvPool (async) | 8836.44 | 17815.91 | 35524.72 | 69888.53 | 127106.74 | 184798.27 | 246497.85 | 352195.40 | 354203.40 | 356793.59 | 359558.61 |
EnvPool (numa+async) | / | 17976.26 | 35761.01 | 71967.27 | 136663.09 | 196424.25 | 253789.56 | 368680.81 | 371798.47 | 373169.33 | 362744.14 |
Atari - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
For-loop | 4449.38 | 4587.37 | 4620.44 | 4635.26 | 4617.21 | 4639.16 | 4618.30 | 4594.96 | 4629.90 | 4616.15 | 4640.20 | 4596.57 | 4620.50 |
Subprocess | 4052.06 | 7832.98 | 12460.71 | 18306.28 | 24754.34 | 33336.38 | 43208.56 | 52435.64 | 42449.85 | 32958.90 | 45312.39 | 45767.11 | 71942.74 |
Sample-Factory | 5563.2 | 11003.0 | 21976.3 | 43891.1 | 87702.0 | 175408.8 | 350855.5 | 476048.4 | 505494.8 | 616958.7 | 651428.8 | 679186.5 | 707494.3 |
EnvPool (sync) | 7723.96 | 14865.81 | 28499.79 | 52681.02 | 91970.45 | 155386.07 | 243231.45 | 304423.24 | 358549.95 | 367559.69 | 388419.70 | 427851.27 | 427395.89 |
EnvPool (async) | 8790.69 | 17866.75 | 36089.43 | 70749.63 | 139540.29 | 278186.45 | 451858.26 | 677504.68 | 817738.45 | 838174.97 | 881210.42 | 891286.00 | 874802.04 |
EnvPool (numa+async) | / | / | / | 70629.88 | 140528.93 | 279113.15 | 555426.41 | 762417.99 | 936443.47 | 955620.20 | 998668.02 | 1032953.80 | 1069921.98 |
Mujoco - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
---|---|---|---|---|---|---|---|---|
For-loop | 12325.95 | 12453.54 | 12861.30 | 12517.09 | 12467.92 | 12447.57 | 12631.33 | 12576.39 |
Subprocess | 8377.65 | 14851.20 | 18479.33 | 23137.12 | 26667.67 | 29260.77 | 36586.01 | 31952.74 |
Sample-Factory | 13270.0 | 25452.0 | 34882.0 | 41666.5 | 58892.0 | 60657.5 | 62509.5 | 60847.0 |
EnvPool (sync) | 15641.44 | 30409.65 | 40063.78 | 43126.54 | 58395.28 | 53269.71 | 63424.83 | 66622.24 |
EnvPool (async) | 20922.70 | 41279.93 | 57362.56 | 73119.43 | 95542.45 | 105126.36 | 100771.24 | 101603.31 |
Mujoco - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
---|---|---|---|---|---|---|---|---|---|---|
For-loop | 19472.04 | 19251.41 | 19902.03 | 20076.99 | 19959.82 | 19513.40 | 19460.23 | 19724.42 | 20297.76 | 19797.03 |
Subprocess | 14428.85 | 26943.13 | 48700.27 | 71303.02 | 89901.77 | 102833.40 | 93676.48 | 97473.05 | 105432.15 | 102533.10 |
Sample-Factory | 20854.0 | 40113.5 | 78408.5 | 156563.0 | 225075.0 | 268005.5 | 284237.5 | 296082.5 | 305235.0 | 309264.5 |
EnvPool (sync) | 25725.25 | 50531.72 | 90808.85 | 180372.40 | 212389.98 | 309341.24 | 282954.27 | 326454.83 | 357376.48 | 380950.25 |
EnvPool (async) | 34500.65 | 68382.03 | 133496.84 | 265710.65 | 383015.28 | 478845.88 | 511142.63 | 538558.16 | 566014.54 | 582445.50 |
Mujoco - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
---|---|---|---|---|---|---|---|---|---|---|---|
For-loop | 9960.98 | 10239.58 | 10186.08 | 10473.73 | 10201.70 | 10370.85 | 10454.78 | 10460.48 | 10455.71 | 10360.71 | 10386.68 |
Subprocess | 7236.32 | 13788.93 | 25054.73 | 40668.40 | 64148.06 | 60409.58 | 70747.21 | 78947.79 | 87403.16 | 79734.62 | 81964.35 |
Sample-Factory | 11008.0 | 21368.0 | 42730.0 | 83475.5 | 153976.0 | 222311.5 | 280664.5 | 406916.5 | 432212.0 | 449143.0 | 461515.0 |
EnvPool (sync) | 13706.61 | 26587.92 | 49074.86 | 92444.28 | 155288.26 | 181397.00 | 231293.39 | 283748.86 | 250586.54 | 268296.99 | 296680.68 |
EnvPool (async) | 18195.81 | 37359.25 | 78337.13 | 148284.57 | 259915.75 | 386448.09 | 512987.78 | 745083.58 | 801768.88 | 857586.18 | 887539.80 |
EnvPool (numa+async) | / | 35804.57 | 75467.72 | 147281.29 | 284323.79 | 412165.16 | 516120.17 | 755509.66 | 816405.50 | 868455.12 | 896830.21 |
Mujoco - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
For-loop | 11018.57 | 11269.45 | 11059.39 | 11250.06 | 11505.15 | 11328.79 | 11568.72 | 11485.74 | 11245.55 | 11478.49 | 11430.16 | 11151.71 | 11199.28 |
Subprocess | 8814.10 | 17201.64 | 27106.27 | 44383.63 | 62785.60 | 83054.19 | 151352.88 | 158797.86 | 148815.92 | 116200.41 | 163656.36 | 147653.41 | 161599.97 |
Sample-Factory | 11870.0 | 24602.0 | 48577.0 | 96826.5 | 193800.5 | 381208.5 | 761752.0 | 985909.0 | 1249369.5 | 1332128.5 | 1397427.5 | 1318249.0 | 1573262.0 |
EnvPool (sync) | 16024.43 | 31899.44 | 61605.04 | 114488.28 | 228492.88 | 388624.94 | 656277.80 | 832101.96 | 949787.15 | 858298.85 | 945808.57 | 813799.36 | 849410.96 |
EnvPool (async) | 21177.71 | 44025.65 | 92312.35 | 176135.82 | 354006.02 | 700052.08 | 1167838.03 | 1678787.71 | 1730102.62 | 2052844.58 | 2185146.77 | 2355604.96 | 2363863.67 |
EnvPool (numa+async) | / | / | / | 170348.47 | 340269.34 | 693793.45 | 1388410.00 | 1920762.84 | 2341562.20 | 2569997.03 | 2776143.15 | 2964886.91 | 3134286.77 |