Tags: ShishirPatil/gorilla
Tags
[BFCL Chore] Ensure Correct Input Format for Eval Checker (#860) In some cases, the model handler’s decode_ast method returns successfully but produces output in an unexpected format, causing issues in downstream evaluations that do not perform argument format validation. This problem is especially common when the model does not output any function calls, resulting in a human-readable string instead of the expected structure. This PR refines the `is_function_calling_format_output` function to enforce that outputs must be a list of dictionaries in the following format before calling the checker function: ``` [ {func1: {param1: val1, param2: val2, ...}}, {func2: {param1: val1, param2: val2, ...}}, ... ] ``` Note: This PR will not affect the leaderboard score.
[BFCL] Relocate Formatting Instructions and Function Documentation to… … System Prompt (#593) Previously, formatting instructions and function documentation were included in the user prompt when interacting with models in prompting mode. However, these details are better suited for the system prompt, where they can more effectively guide the model's behaviour. This PR updates the model prompting process by moving the formatting instructions and function documentation to the system prompt, ensuring they are appropriately positioned for optimal model performance. This **will affect** the leaderboard score. ---- Also in this PR: 1. Update the model handlers to record the processed prompt/message and tools (if FC mode) in the result file when inference. This helps to identify if there are any issues with the pre-processing phase. 2. Fix 6 dataset issues: `irrelevance_49, live_irrelevance_157-18-1, live_simple_79-40-0, live_parallel_0-0-0, live_parallel_4-1-0, live_parallel_5-2-0`
[BFCL Chore] Fix Functionary Medium 3.1 model name & add readme paral… …lel inference (#577) Changes: - Fix Functionary Medium 3.1 model version name in `eval_runner_helper.py` - add readme parallel inference --------- Co-authored-by: Huanzhi (Hans) Mao <huanzhimao@gmail.com>
Fix breaking changes due to updated Anthropic SDK (#452) Anthropic just moved their tool use from beta to main so we have to change the import `from anthropic.types.beta.tools import ToolUseBlock` to `from anthropic.types import ToolUseBlock`. You cannot run the eval without this change as things break. Also, my IDE automatically sorted the imported packages and removed some extra spaces -- this explains all the other changes.