Discrepancies in the scores of Gemini 1.5 Pro (May) #114

Hello,
I downloaded the file `Gemini-Pro-1.5 (May)/Scenario.codegeneration_10_0.2_eval_all.json` from the [associated repository on Hugging Face](https://huggingface.co/datasets/livecodebench/submissions/blob/main/Gemini-Pro-1.5%20(May)/Scenario.codegeneration_10_0.2_eval_all.json).
When I calculate the metrics from the results already stored in the file, I obtain:

Overall pass@1 average: 0.382 over 400 problems
Easy : 0.787 over 142 problems
Medium : 0.221 over 168 problems
Hard : 0.046 over 90 problems
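
For reference, here is a minimal sketch of how such numbers can be aggregated. It assumes the JSON file is a list of records, each carrying a `difficulty` string and a `graded_list` of per-sample booleans; these field names are my assumption about the file layout, and `load_records` and `summarize` are hypothetical helpers, not part of the repository.

```python
import json
from collections import defaultdict


def load_records(path):
    """Load the evaluation file; assumed to be a JSON list of per-problem dicts."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(records):
    """Return {group: (mean pass@1, problem count)}, plus an 'overall' group."""
    buckets = defaultdict(list)
    for rec in records:
        graded = rec["graded_list"]        # assumed: one bool per generated sample
        score = sum(graded) / len(graded)  # per-problem pass@1 = fraction of passing samples
        buckets[rec["difficulty"]].append(score)
        buckets["overall"].append(score)
    return {name: (sum(vals) / len(vals), len(vals)) for name, vals in buckets.items()}


# Toy records standing in for the real file contents.
toy = [
    {"difficulty": "easy", "graded_list": [True, True, False, True]},
    {"difficulty": "easy", "graded_list": [True, True, True, True]},
    {"difficulty": "hard", "graded_list": [False, False, False, False]},
]
summary = summarize(toy)
for name, (score, count) in summary.items():
    print(f"{name}: {score:.3f} over {count} problems")
```

On the real file this would be called as `summarize(load_records(path))`. Note that the overall figure here is the unweighted mean over problems, which equals the problem-count-weighted mean of the per-difficulty averages.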

However, if I calculate the results myself with the custom evaluator, I obtain:

Overall pass@1 average: 0.406 over 400 problems
Easy : 0.827 over 142 problems
Medium : 0.242 over 168 problems
Hard : 0.046 over 90 problems

And according to the paper, the results should look like:

Overall pass@1 average: 0.33 over 400 problems
Easy : 0.76 over 142 problems
Medium : 0.194 over 168 problems
Hard : 0.035 over 90 problems
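
As a quick sanity check, the overall score should equal the problem-count-weighted mean of the per-difficulty scores, provided all of them are computed over the same problem set. The sketch below applies that check to the three sets of numbers above; `weighted_overall` is a hypothetical helper, and applying the 142/168/90 split to the paper's figures is an assumption, since the paper may have used a different set of problems.

```python
def weighted_overall(per_difficulty):
    """Problem-count-weighted mean of (score, count) pairs."""
    total = sum(count for _, count in per_difficulty.values())
    return sum(score * count for score, count in per_difficulty.values()) / total


# Per-difficulty (score, problem count) pairs copied from the results above.
from_file = {"easy": (0.787, 142), "medium": (0.221, 168), "hard": (0.046, 90)}
from_custom = {"easy": (0.827, 142), "medium": (0.242, 168), "hard": (0.046, 90)}
from_paper = {"easy": (0.76, 142), "medium": (0.194, 168), "hard": (0.035, 90)}

overall_file = weighted_overall(from_file)
overall_custom = weighted_overall(from_custom)
overall_paper = weighted_overall(from_paper)

for label, value in [("file", overall_file), ("custom", overall_custom), ("paper", overall_paper)]:
    print(f"{label}: {value:.4f}")
```

The first two agree with the reported overall scores up to rounding of the inputs. The paper's per-difficulty numbers, under the same split, give roughly 0.359 rather than 0.33, which may indicate that the paper's overall score was computed over a different set of problems.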

Would it be possible to get some clarification about this?
Furthermore, could you provide a sample file together with its expected results, so that we can double-check that the evaluators are correct?

Thank you for your time and effort.
