[BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py #506

Merged: 18 commits into ShishirPatil:main from the test-category branch on Jul 19, 2024

Conversation

@HuanzhiMao (Collaborator)

There are inconsistencies between the `test_category` argument that's used by `eval_checker/eval_runner.py` and `openfunctions_evaluation.py`.

This PR partially addresses #501 and #502.
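
For context, here is a minimal sketch of the kind of standardization the PR title describes: both entry points would validate `--test-category` against one shared definition instead of maintaining two divergent copies. The category names and `build_parser` helper below are hypothetical illustrations, not BFCL's actual identifiers or code.

```python
# Hypothetical sketch, not BFCL's actual code: both
# openfunctions_evaluation.py and eval_checker/eval_runner.py would
# build their --test-category argument from one shared definition.
import argparse

# Single source of truth for valid categories (illustrative values only).
TEST_CATEGORIES = ["simple", "multiple", "parallel", "relevance", "all"]

def build_parser(description: str) -> argparse.ArgumentParser:
    """Build the argument parser shared by both evaluation scripts."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "--test-category",
        nargs="+",
        choices=TEST_CATEGORIES,
        default=["all"],
        help="Test categories to run; identical semantics in both scripts.",
    )
    return parser

if __name__ == "__main__":
    args = build_parser("BFCL evaluation").parse_args()
    print(args.test_category)
```

Because both scripts would call the same helper, a category accepted by one script could never be rejected by the other, which is the inconsistency described above.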

@HuanzhiMao HuanzhiMao marked this pull request as ready for review July 7, 2024 00:07
@HuanzhiMao HuanzhiMao changed the title from "[BFCL] Standarize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py" to "[BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py" on Jul 7, 2024
@CharlieJCJ (Collaborator) left a comment


LGTM

Comment: improve the warning messages in the future; make them more informative so the user knows immediate action items when aggregating results for data.csv.

@HuanzhiMao (Collaborator, Author)

> LGTM
>
> Comment: improve the warning messages in the future; make them more informative so the user knows immediate action items when aggregating results for data.csv.

Good point. Will address this in a different PR.
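
As an illustration of the actionable warning the review asks for (later addressed in the follow-up PR referenced at the end of this thread), here is a hedged sketch; the `warn_missing_result` function, model and category names, and message wording are hypothetical, not the repository's actual code.

```python
# Illustrative sketch only: an aggregation-time warning that states the
# immediate action item instead of just reporting missing data. The
# function name and message are hypothetical, not the repository's code.
import warnings

def warn_missing_result(model: str, category: str) -> None:
    """Warn that a result is missing and tell the user how to fix it."""
    warnings.warn(
        f"No evaluation result found for model '{model}' in category "
        f"'{category}'; its row in data.csv will be incomplete. Re-run "
        f"generation and evaluation for this category, then aggregate again."
    )

if __name__ == "__main__":
    warn_missing_result("gpt-4", "parallel")
```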

@ShishirPatil (Owner)

Hey @HuanzhiMao and @CharlieJCJ, I don't think this is a great idea, for the following reason: we start by showing how to install the dependencies, then in the middle we go into a long digression on the different flags, and then we come back to a list of commands to run. That back-and-forth isn't helpful. Maybe we should retain the previous structure, where the options are shown after each command so users have the context of the command. Thoughts?

@HuanzhiMao (Collaborator, Author)

> Hey @HuanzhiMao and @CharlieJCJ, I don't think this is a great idea, for the following reason: we start by showing how to install the dependencies, then in the middle we go into a long digression on the different flags, and then we come back to a list of commands to run. That back-and-forth isn't helpful. Maybe we should retain the previous structure, where the options are shown after each command so users have the context of the command. Thoughts?

How about we move the flags section to the bottom of the README, just like how all the available model names are displayed at the end?

@devanshamin (Contributor)

I have solved this issue here.

@ShishirPatil (Owner)

> Hey @HuanzhiMao and @CharlieJCJ, I don't think this is a great idea, for the following reason: we start by showing how to install the dependencies, then in the middle we go into a long digression on the different flags, and then we come back to a list of commands to run. That back-and-forth isn't helpful. Maybe we should retain the previous structure, where the options are shown after each command so users have the context of the command. Thoughts?
>
> How about we move the flags section to the bottom of the README, just like how all the available model names are displayed at the end?

I don't think this solves it, right? We should present the flags where the user cares about the command, not far down the page. Folks read and execute code from READMEs linearly, and making them jump around isn't a good idea, imo.

@HuanzhiMao HuanzhiMao requested a review from ShishirPatil July 18, 2024 22:52
@ShishirPatil ShishirPatil merged commit a9dd435 into ShishirPatil:main Jul 19, 2024
@HuanzhiMao HuanzhiMao deleted the test-category branch July 21, 2024 05:01
ShishirPatil pushed a commit that referenced this pull request Aug 10, 2024
As mentioned in #506, this PR makes the warning messages more informative so that users know the action items when aggregating leaderboard results.

---------

Co-authored-by: CharlieJCJ <charliechengjieji@berkeley.edu>
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
[BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py (ShishirPatil#506)

There are inconsistencies between the `test_category` argument that's
used by `eval_checker/eval_runner.py` and `openfunctions_evaluation.py`.

This PR partially addresses ShishirPatil#501 and ShishirPatil#502.

---------

Co-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
…#517)

As mentioned in ShishirPatil#506, this PR makes the warning messages more informative so that users know the action items when aggregating leaderboard results.

---------

Co-authored-by: CharlieJCJ <charliechengjieji@berkeley.edu>