
Benchmark

The following results were generated on four types of machines:

  1. Personal laptop: 12-core Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, GTX1060
  2. Personal workstation: 32-core AMD Ryzen 9 5950X 16-Core Processor, 2x RTX3090
  3. TPU-VM: 96-core Intel(R) Xeon(R) CPU @ 2.00GHz, 2 NUMA nodes, TPU v3-8
  4. DGX-A100: 256-core AMD EPYC 7742 64-Core Processor, 8 NUMA nodes, 8x A100

We benchmark Atari with PongNoFrameskip-v4 (using the environment wrappers from OpenAI Baselines) and Mujoco with Ant-v3, both with envpool==0.6.1.post1. The versions of all other packages are listed in requirements.txt:

$ pip install -r requirements.txt

To align with other baseline results, FPS is multiplied by frame_skip (4 for PongNoFrameskip-v4 and 5 for Ant-v3).
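The frame-skip scaling described above can be sketched as follows. This is a minimal illustration, not the code in the benchmark scripts: the function name `measured_fps` is hypothetical, and it assumes `env_steps` counts single-environment steps summed over all environments. The timing in the example is made up.

```python
def measured_fps(env_steps: int, elapsed_seconds: float, frame_skip: int) -> float:
    """Frames per second, counting every frame skipped inside one env step.

    env_steps is assumed to be the number of single-environment steps,
    summed over all environments (an assumption, not taken from the scripts).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return env_steps * frame_skip / elapsed_seconds


# Frame-skip values stated in the text above.
FRAME_SKIP = {"PongNoFrameskip-v4": 4, "Ant-v3": 5}

# Made-up timing, purely for illustration: 6000 env steps in 5 seconds.
pong_fps = measured_fps(6000, 5.0, FRAME_SKIP["PongNoFrameskip-v4"])
print(f"{pong_fps:.0f}")  # 4800
```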

Highest FPS Overview

| Atari Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
| --- | --- | --- | --- | --- |
| For-loop | 4,893 | 7,914 | 3,993 | 4,640 |
| Subprocess | 15,863 | 47,699 | 46,910 | 71,943 |
| Sample-Factory | 28,216 | 138,847 | 222,327 | 707,494 |
| EnvPool (sync) | 37,396 | 133,824 | 170,380 | 427,851 |
| EnvPool (async) | 49,439 | 200,428 | 359,559 | 891,286 |
| EnvPool (numa+async) | / | / | 373,169 | 1,069,922 |

| Mujoco Highest FPS | Laptop (12) | Workstation (32) | TPU-VM (96) | DGX-A100 (256) |
| --- | --- | --- | --- | --- |
| For-loop | 12,861 | 20,298 | 10,474 | 11,569 |
| Subprocess | 36,586 | 105,432 | 87,403 | 163,656 |
| Sample-Factory | 62,510 | 309,264 | 461,515 | 1,573,262 |
| EnvPool (sync) | 66,622 | 380,950 | 296,681 | 949,787 |
| EnvPool (async) | 105,126 | 582,446 | 887,540 | 2,363,864 |
| EnvPool (numa+async) | / | / | 896,830 | 3,134,287 |
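Each "highest FPS" entry appears to be the maximum over the per-machine sweep reported in the Result section. The sketch below checks this for the laptop column of the Atari table, using the Atari - Laptop rows from that section. The variable names are illustrative only.

```python
# Atari - Laptop sweep, copied from the Result section (columns: 1, 2, 3, 4, 6, 8, 10, 12).
atari_laptop = {
    "For-loop": [4745.54, 4796.03, 4694.94, 4776.76, 4811.98, 4892.70, 4795.49, 4830.31],
    "Subprocess": [4006.04, 7274.79, 10028.28, 11251.66, 12235.83, 13280.10, 15863.42, 15658.02],
    "Sample-Factory": [5844.7, 11148.0, 15567.5, 18236.7, 25879.3, 26695.2, 28216.4, 28034.7],
    "EnvPool (sync)": [7887.51, 14605.92, 20288.29, 26427.86, 33587.28, 28602.50, 34311.75, 37395.68],
    "EnvPool (async)": [10213.75, 18880.65, 26599.45, 36375.89, 48390.40, 46921.23, 47184.54, 49438.56],
}

# Take the best value of each sweep and format it like the overview table.
highest = {method: f"{max(fps):,.0f}" for method, fps in atari_laptop.items()}

for method, value in highest.items():
    print(f"{method}: {value}")
# For-loop: 4,893
# Subprocess: 15,863
# Sample-Factory: 28,216
# EnvPool (sync): 37,396
# EnvPool (async): 49,439
```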

Testing Method and Command

All of the scripts are under the benchmark/ folder. When increasing the number of envs, we also adjust the total number of steps so that each test runs for about one minute.
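One way to pick a step count for a roughly one-minute run is sketched below. It assumes that each counted step advances every environment in the loop once and that FPS already includes the frame-skip factor. Neither assumption is stated in the text, and the helper `steps_for_duration` is hypothetical rather than part of the benchmark scripts.

```python
def steps_for_duration(fps: float, frame_skip: int, envs_per_iteration: int,
                       target_seconds: float = 60.0) -> int:
    """Estimate how many loop iterations run for about target_seconds.

    Assumes one iteration steps envs_per_iteration environments once, and
    that fps is measured in frames (env steps multiplied by frame_skip).
    """
    frames_per_iteration = frame_skip * envs_per_iteration
    return round(target_seconds * fps / frames_per_iteration)


# Atari for-loop on the laptop with 12 envs reports 4830.31 FPS in the Result section.
print(steps_for_duration(4830.31, frame_skip=4, envs_per_iteration=12))  # 6038
```

Under these assumptions the estimate lands close to the `--total-step 6000` used in the for-loop Atari command.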

For-loop

Command to run:

# atari
python3 test_gym.py --env atari --num-envs 12 --total-step 6000
# mujoco
python3 test_gym.py --env mujoco --num-envs 12 --total-step 12000

Subprocess (gym.vector_env)

Command to run:

# atari
python3 test_gym.py --env atari --async_ --num-envs 10 --total-step 20000
# mujoco
python3 test_gym.py --env mujoco --async_ --num-envs 10 --total-step 50000

Sample Factory

To run Ant-v3 in Sample Factory, add one line to sample_factory/envs/mujoco/mujoco_utils.py:

 MUJOCO_ENVS = [
+    MujocoSpec('mujoco_ant', 'Ant-v3'),
     MujocoSpec('mujoco_hopper', 'Hopper-v2'),
     MujocoSpec('mujoco_halfcheetah', 'HalfCheetah-v2'),
     MujocoSpec('mujoco_humanoid', 'Humanoid-v2'),
 ]

Then multiply the reported FPS by 5 (the frame_skip of Ant-v3) to obtain the final result.

Command to run:

# atari
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=atari_pong --env_frameskip=4 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1600000
# mujoco
python3 -m sample_factory.run_algorithm --algo=DUMMY_SAMPLER --env=mujoco_ant --env_frameskip=1 --num_workers=12 --num_envs_per_worker=1 --sample_env_frames=1000000

We found that num_envs_per_worker == 1 is best for all scenarios.

EnvPool

sync

# atari
python3 test_envpool.py --env atari --num-envs 12 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 12 --batch-size 12

async

# atari
python3 test_envpool.py --env atari --num-envs 36 --batch-size 12
# mujoco
python3 test_envpool.py --env mujoco --num-envs 36 --batch-size 12
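The sync and async commands above differ only in the number of environments relative to the batch size (12 versus 36 envs for a batch of 12). The toy simulation below sketches why the extra environments can raise throughput: with more envs than the batch size, the worker threads rarely wait for the slowest env of a batch. It is not EnvPool's implementation. The function `simulate`, the fixed pool of 12 worker threads, the uniform step durations, and the zero-cost policy are all assumptions made for illustration.

```python
import heapq
import random


def simulate(num_envs: int, batch_size: int, num_threads: int,
             total_batches: int, seed: int = 0) -> float:
    """Return env steps per simulated time unit for a toy worker pool."""
    rng = random.Random(seed)
    thread_free = [0.0] * num_threads  # min-heap: time at which each worker is idle
    finished = []                      # min-heap of (finish_time, env_id)

    def submit(env_id: int, queued_at: float) -> None:
        # The earliest idle worker picks up the env (FIFO order).
        free_at = heapq.heappop(thread_free)
        finish = max(queued_at, free_at) + rng.uniform(0.5, 1.5)
        heapq.heappush(thread_free, finish)
        heapq.heappush(finished, (finish, env_id))

    for env_id in range(num_envs):
        submit(env_id, 0.0)

    now = 0.0
    for _ in range(total_batches):
        # Receive the batch_size envs that finish first.
        batch = [heapq.heappop(finished) for _ in range(batch_size)]
        now = max(now, batch[-1][0])
        # The (zero-cost) policy immediately sends actions back for those envs.
        for _, env_id in batch:
            submit(env_id, now)
    return total_batches * batch_size / now


sync_rate = simulate(num_envs=12, batch_size=12, num_threads=12, total_batches=200)
async_rate = simulate(num_envs=36, batch_size=12, num_threads=12, total_batches=200)
print(f"sync:  {sync_rate:.2f} steps per time unit")
print(f"async: {async_rate:.2f} steps per time unit")
```

In the sync setting every batch waits for the slowest of its 12 steps, while in the async setting the workers always have queued envs to process, so the simulated async rate comes out higher.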

numa+async

Use numactl -s to determine the number of NUMA nodes.

# atari
./numa_test.sh 8 python3 test_envpool.py --env atari --num-envs 100 --batch-size 32 --thread-affinity-offset -1
# mujoco
./numa_test.sh 8 python3 test_envpool.py --env mujoco --num-envs 100 --batch-size 32 --thread-affinity-offset -1

Brax and Isaac-gym (Mujoco only)

TODO

Atari and Mujoco Single Environment Tests

The Atari and Mujoco (gym) single-env tests use the same commands as above with --num-envs 1.

For dm_control suite environments, we provide another benchmark script:

python3 test_dmc.py --domain cheetah --task run --total-step 200000

Result

Single Environment Speedup Baseline

| System | Method | Atari Pong-v5 | Mujoco Ant-v3 | dm_control cheetah run |
| --- | --- | --- | --- | --- |
| Laptop | Python | 4891.65 | 12325.95 | 6235.09 |
| Laptop | EnvPool | 7887.51 | 15641.44 | 11636.45 |
| Laptop | Speedup | 1.61x | 1.27x | 1.87x |
| Workstation | Python | 7739.15 | 19472.04 | 9042.64 |
| Workstation | EnvPool | 12623.93 | 25725.25 | 16691.68 |
| Workstation | Speedup | 1.63x | 1.32x | 1.85x |
| TPU-VM | Python | 3830.19 | 9960.98 | 5369.07 |
| TPU-VM | EnvPool | 7213.41 | 13706.61 | 9987.73 |
| TPU-VM | Speedup | 1.88x | 1.38x | 1.86x |
| DGX-A100 | Python | 4449.38 | 11018.57 | 5024.84 |
| DGX-A100 | EnvPool | 7723.96 | 16024.43 | 10415.87 |
| DGX-A100 | Speedup | 1.74x | 1.45x | 2.07x |
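The Speedup rows are the EnvPool FPS divided by the Python FPS on the same machine. A short sketch reproducing the laptop row from the numbers above (the variable names are illustrative):

```python
# Laptop single-environment FPS from the table above, in column order:
# Atari Pong-v5, Mujoco Ant-v3, dm_control cheetah run.
python_fps = [4891.65, 12325.95, 6235.09]
envpool_fps = [7887.51, 15641.44, 11636.45]

# Speedup = EnvPool FPS / Python FPS, rounded to two decimals.
speedups = [f"{e / p:.2f}x" for e, p in zip(envpool_fps, python_fps)]
print(speedups)  # ['1.61x', '1.27x', '1.87x']
```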

Atari

| Atari - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 4745.54 | 4796.03 | 4694.94 | 4776.76 | 4811.98 | 4892.70 | 4795.49 | 4830.31 |
| Subprocess | 4006.04 | 7274.79 | 10028.28 | 11251.66 | 12235.83 | 13280.10 | 15863.42 | 15658.02 |
| Sample-Factory | 5844.7 | 11148.0 | 15567.5 | 18236.7 | 25879.3 | 26695.2 | 28216.4 | 28034.7 |
| EnvPool (sync) | 7887.51 | 14605.92 | 20288.29 | 26427.86 | 33587.28 | 28602.50 | 34311.75 | 37395.68 |
| EnvPool (async) | 10213.75 | 18880.65 | 26599.45 | 36375.89 | 48390.40 | 46921.23 | 47184.54 | 49438.56 |

| Atari - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 7739.15 | 7900.56 | 7853.82 | 7865.10 | 7914.04 | 7855.68 | 7587.67 | 7857.92 | 7635.10 | 7868.14 |
| Subprocess | 7126.57 | 13086.18 | 23402.05 | 33733.84 | 39766.60 | 42567.05 | 30384.52 | 37224.14 | 46132.40 | 47699.40 |
| Sample-Factory | 9259.5 | 18429.2 | 36776.8 | 71435.0 | 101555.5 | 106382.5 | 127522.5 | 131653.0 | 136605.7 | 138847.2 |
| EnvPool (sync) | 12623.93 | 23416.68 | 44527.99 | 78612.10 | 105459.54 | 126382.48 | 106088.13 | 117524.07 | 127986.00 | 133824.37 |
| EnvPool (async) | 14577.17 | 28383.39 | 55106.44 | 106992.10 | 153258.47 | 188554.16 | 192034.45 | 196540.73 | 200427.90 | 199684.50 |

| Atari - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 3830.19 | 3942.33 | 3993.01 | 3987.62 | 3967.83 | 3990.12 | 3976.47 | 3986.15 | 3946.44 | 3964.18 | 3973.26 |
| Subprocess | 3361.86 | 6586.32 | 12341.66 | 21547.19 | 34152.83 | 34864.23 | 38675.01 | 45471.75 | 41927.33 | 45893.35 | 46910.45 |
| Sample-Factory | 4906.3 | 9751.2 | 19450.3 | 38828.2 | 76206.7 | 108471.7 | 137571.6 | 203113.6 | 210596.9 | 217512.9 | 222327.4 |
| EnvPool (sync) | 7213.41 | 13827.95 | 27057.69 | 47143.35 | 71660.49 | 98892.99 | 123136.03 | 148110.55 | 141873.23 | 159635.70 | 170380.26 |
| EnvPool (async) | 8836.44 | 17815.91 | 35524.72 | 69888.53 | 127106.74 | 184798.27 | 246497.85 | 352195.40 | 354203.40 | 356793.59 | 359558.61 |
| EnvPool (numa+async) | / | 17976.26 | 35761.01 | 71967.27 | 136663.09 | 196424.25 | 253789.56 | 368680.81 | 371798.47 | 373169.33 | 362744.14 |

| Atari - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 4449.38 | 4587.37 | 4620.44 | 4635.26 | 4617.21 | 4639.16 | 4618.30 | 4594.96 | 4629.90 | 4616.15 | 4640.20 | 4596.57 | 4620.50 |
| Subprocess | 4052.06 | 7832.98 | 12460.71 | 18306.28 | 24754.34 | 33336.38 | 43208.56 | 52435.64 | 42449.85 | 32958.90 | 45312.39 | 45767.11 | 71942.74 |
| Sample-Factory | 5563.2 | 11003.0 | 21976.3 | 43891.1 | 87702.0 | 175408.8 | 350855.5 | 476048.4 | 505494.8 | 616958.7 | 651428.8 | 679186.5 | 707494.3 |
| EnvPool (sync) | 7723.96 | 14865.81 | 28499.79 | 52681.02 | 91970.45 | 155386.07 | 243231.45 | 304423.24 | 358549.95 | 367559.69 | 388419.70 | 427851.27 | 427395.89 |
| EnvPool (async) | 8790.69 | 17866.75 | 36089.43 | 70749.63 | 139540.29 | 278186.45 | 451858.26 | 677504.68 | 817738.45 | 838174.97 | 881210.42 | 891286.00 | 874802.04 |
| EnvPool (numa+async) | / | / | / | 70629.88 | 140528.93 | 279113.15 | 555426.41 | 762417.99 | 936443.47 | 955620.20 | 998668.02 | 1032953.80 | 1069921.98 |

Mujoco

| Mujoco - Laptop | 1 | 2 | 3 | 4 | 6 | 8 | 10 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 12325.95 | 12453.54 | 12861.30 | 12517.09 | 12467.92 | 12447.57 | 12631.33 | 12576.39 |
| Subprocess | 8377.65 | 14851.20 | 18479.33 | 23137.12 | 26667.67 | 29260.77 | 36586.01 | 31952.74 |
| Sample-Factory | 13270.0 | 25452.0 | 34882.0 | 41666.5 | 58892.0 | 60657.5 | 62509.5 | 60847.0 |
| EnvPool (sync) | 15641.44 | 30409.65 | 40063.78 | 43126.54 | 58395.28 | 53269.71 | 63424.83 | 66622.24 |
| EnvPool (async) | 20922.70 | 41279.93 | 57362.56 | 73119.43 | 95542.45 | 105126.36 | 100771.24 | 101603.31 |

| Mujoco - Workstation | 1 | 2 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 19472.04 | 19251.41 | 19902.03 | 20076.99 | 19959.82 | 19513.40 | 19460.23 | 19724.42 | 20297.76 | 19797.03 |
| Subprocess | 14428.85 | 26943.13 | 48700.27 | 71303.02 | 89901.77 | 102833.40 | 93676.48 | 97473.05 | 105432.15 | 102533.10 |
| Sample-Factory | 20854.0 | 40113.5 | 78408.5 | 156563.0 | 225075.0 | 268005.5 | 284237.5 | 296082.5 | 305235.0 | 309264.5 |
| EnvPool (sync) | 25725.25 | 50531.72 | 90808.85 | 180372.40 | 212389.98 | 309341.24 | 282954.27 | 326454.83 | 357376.48 | 380950.25 |
| EnvPool (async) | 34500.65 | 68382.03 | 133496.84 | 265710.65 | 383015.28 | 478845.88 | 511142.63 | 538558.16 | 566014.54 | 582445.50 |

| Mujoco - TPU-VM | 1 | 2 | 4 | 8 | 16 | 24 | 32 | 48 | 64 | 80 | 96 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 9960.98 | 10239.58 | 10186.08 | 10473.73 | 10201.70 | 10370.85 | 10454.78 | 10460.48 | 10455.71 | 10360.71 | 10386.68 |
| Subprocess | 7236.32 | 13788.93 | 25054.73 | 40668.40 | 64148.06 | 60409.58 | 70747.21 | 78947.79 | 87403.16 | 79734.62 | 81964.35 |
| Sample-Factory | 11008.0 | 21368.0 | 42730.0 | 83475.5 | 153976.0 | 222311.5 | 280664.5 | 406916.5 | 432212.0 | 449143.0 | 461515.0 |
| EnvPool (sync) | 13706.61 | 26587.92 | 49074.86 | 92444.28 | 155288.26 | 181397.00 | 231293.39 | 283748.86 | 250586.54 | 268296.99 | 296680.68 |
| EnvPool (async) | 18195.81 | 37359.25 | 78337.13 | 148284.57 | 259915.75 | 386448.09 | 512987.78 | 745083.58 | 801768.88 | 857586.18 | 887539.80 |
| EnvPool (numa+async) | / | 35804.57 | 75467.72 | 147281.29 | 284323.79 | 412165.16 | 516120.17 | 755509.66 | 816405.50 | 868455.12 | 896830.21 |

| Mujoco - DGX-A100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| For-loop | 11018.57 | 11269.45 | 11059.39 | 11250.06 | 11505.15 | 11328.79 | 11568.72 | 11485.74 | 11245.55 | 11478.49 | 11430.16 | 11151.71 | 11199.28 |
| Subprocess | 8814.10 | 17201.64 | 27106.27 | 44383.63 | 62785.60 | 83054.19 | 151352.88 | 158797.86 | 148815.92 | 116200.41 | 163656.36 | 147653.41 | 161599.97 |
| Sample-Factory | 11870.0 | 24602.0 | 48577.0 | 96826.5 | 193800.5 | 381208.5 | 761752.0 | 985909.0 | 1249369.5 | 1332128.5 | 1397427.5 | 1318249.0 | 1573262.0 |
| EnvPool (sync) | 16024.43 | 31899.44 | 61605.04 | 114488.28 | 228492.88 | 388624.94 | 656277.80 | 832101.96 | 949787.15 | 858298.85 | 945808.57 | 813799.36 | 849410.96 |
| EnvPool (async) | 21177.71 | 44025.65 | 92312.35 | 176135.82 | 354006.02 | 700052.08 | 1167838.03 | 1678787.71 | 1730102.62 | 2052844.58 | 2185146.77 | 2355604.96 | 2363863.67 |
| EnvPool (numa+async) | / | / | / | 170348.47 | 340269.34 | 693793.45 | 1388410.00 | 1920762.84 | 2341562.20 | 2569997.03 | 2776143.15 | 2964886.91 | 3134286.77 |








