Documentation page: https://docs.judgmentlabs.ai/documentation/evaluation/unit-testing
Context:
The unit testing documentation walks through the basics of writing structured tests against evaluation datasets, but it does not cover how to handle scorer failures such as API timeouts, unexpected inputs, or internal errors. In real-world pipelines, scorers may partially fail or return invalid results, especially when they integrate with external APIs or custom logic.
Suggestion:
Add a section titled “Handling Scorer Failures in Unit Tests” with examples that show how to:
- Catch exceptions raised by scorers (e.g., TimeoutError, KeyError)
- Handle examples with missing or malformed fields (e.g., no tool_calls)
- Fall back to a default ScoreResult and log an error instead of halting the evaluation
- Use pytest.raises to explicitly test for expected failures
- Demonstrate cleanup behavior for scorers with internal state
This would help users write robust and fault-tolerant evaluation pipelines, rather than tests that silently break or skip evaluation under real-world conditions.
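To make the request concrete, here is a minimal sketch of the fallback pattern the first three bullets describe. The `ScoreResult` stand-in and the `safe_score` helper are assumptions for illustration, not the documented judgeval API; the real class and scorer interface may differ.

```python
# Sketch only: ScoreResult's fields and the scorer.score(example) interface
# are assumed here, not taken from the judgeval documentation.
import logging

logger = logging.getLogger(__name__)

class ScoreResult:
    """Minimal stand-in for the library's ScoreResult (assumed shape)."""
    def __init__(self, score, success, reason=""):
        self.score = score
        self.success = success
        self.reason = reason

def safe_score(scorer, example):
    """Run a scorer, falling back to a default ScoreResult on failure."""
    # Handle examples with missing or malformed fields up front,
    # e.g. an example with no tool_calls.
    if not getattr(example, "tool_calls", None):
        logger.error("Example %r has no tool_calls; returning default", example)
        return ScoreResult(score=0.0, success=False, reason="missing tool_calls")
    try:
        return scorer.score(example)
    except (TimeoutError, KeyError) as exc:
        # Log and fall back instead of halting the whole evaluation run.
        logger.error("Scorer %s failed: %s", type(scorer).__name__, exc)
        return ScoreResult(score=0.0, success=False, reason=str(exc))
```

A documented version of this pattern would let users keep a full evaluation run alive when one scorer or one example misbehaves, while still surfacing the failure in the logs and in the result's `success`/`reason` fields.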
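The last two bullets (expected failures and cleanup of scorer state) could be illustrated along these lines. `StatefulScorer` is a hypothetical scorer invented for this sketch; only `pytest.raises` and the fixture teardown mechanism are standard pytest.

```python
import pytest

class StatefulScorer:
    """Hypothetical scorer holding internal state (e.g. an API client)."""
    def __init__(self):
        self.closed = False

    def score(self, example):
        # Simulate an external-API failure for the sake of the example.
        raise TimeoutError("upstream API timed out")

    def close(self):
        self.closed = True

@pytest.fixture
def scorer():
    s = StatefulScorer()
    yield s
    s.close()  # teardown runs even when the test body raises

def test_timeout_is_expected(scorer):
    # pytest.raises makes the expected failure explicit instead of
    # letting it crash the test run as an unhandled error.
    with pytest.raises(TimeoutError):
        scorer.score(example={})
```

Pairing `pytest.raises` with a fixture keeps the failure assertion and the cleanup guarantee in one place, which is exactly the robustness the docs section would demonstrate.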