Description
Hello,
I downloaded the file 'Gemini-Pro-1.5 (May)/Scenario.codegeneration_10_0.2_eval_all.json' from the associated repository in HuggingFace.
When I calculate the metrics based on the results already computed in the file I obtain:
Overall pass@1 average: 0.382 over 400 problems
Easy : 0.787 over 142 problems
Medium : 0.221 over 168 problems
Hard : 0.046 over 90 problems
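For reference, this is roughly how I aggregate the per-problem results into the averages above. The field names (`pass@1`, `difficulty`) are my assumptions about the JSON schema of the `eval_all` file, so they may need adjusting:

```python
import json
from collections import defaultdict

def summarize(path):
    """Average per-problem pass@1 overall and per difficulty bucket.

    Assumes the file is a JSON list of problem records, each carrying
    a "pass@1" score and a "difficulty" label (assumed keys).
    """
    with open(path) as f:
        results = json.load(f)
    buckets = defaultdict(list)
    for problem in results:
        score = problem["pass@1"]                     # assumed per-problem key
        buckets["Overall"].append(score)
        buckets[problem["difficulty"]].append(score)  # assumed difficulty key
    return {
        name: (sum(scores) / len(scores), len(scores))
        for name, scores in buckets.items()
    }

# Printed in the same shape as the numbers above:
# for name, (avg, n) in summarize("...eval_all.json").items():
#     print(f"{name}: {avg:.3f} over {n} problems")
```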
However, if I re-run the evaluation myself with the custom evaluator, I obtain:
Overall pass@1 average: 0.406 over 400 problems
Easy : 0.827 over 142 problems
Medium : 0.242 over 168 problems
Hard : 0.046 over 90 problems
And according to the paper, the reported results are:
Overall pass@1 average: 0.33 over 400 problems
Easy : 0.76 over 142 problems
Medium : 0.194 over 168 problems
Hard : 0.035 over 90 problems
Would it be possible to get some clarification about this discrepancy?
It would also help to have a reference file together with its expected results, so that users can double-check that their evaluators are correct.
Thank you for your time and effort.