ai-benchmark

Here are 4 public repositories matching this topic...

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking multi-modal AI agents.

windows ai computer ai-research ai-agent agentic ai-benchmark desktop-agent computer-use

An agent benchmark with tasks in a simulated software company.

agent benchmark ai ai-research llm ai-benchmark

GTA (Guess The Algorithm) Benchmark - A tool for testing AI reasoning capabilities

GTA (Guess The Algorithm) Benchmark - A tool for testing AI reasoning capabilities
