Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
Updated Nov 20, 2024 - Python
An agent benchmark with tasks in a simulated software company.
GTA (Guess The Algorithm) Benchmark - A tool for testing AI reasoning capabilities