[BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py #506

Merged: 18 commits into ShishirPatil:main from the test-category branch on Jul 19, 2024

Conversation

@HuanzhiMao (Collaborator)

There are inconsistencies between the `test_category` argument that's used by `eval_checker/eval_runner.py` and `openfunctions_evaluation.py`.

This PR partially addresses #501 and #502.
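
For context, here is a minimal sketch of the kind of standardization the PR title describes: both entry points would validate `--test-category` against one shared definition instead of maintaining two divergent copies. The category names and `build_parser` helper below are hypothetical illustrations, not BFCL's actual identifiers or code.

```python
# Hypothetical sketch, not BFCL's actual code: both
# openfunctions_evaluation.py and eval_checker/eval_runner.py would
# build their --test-category argument from one shared definition.
import argparse

# Single source of truth for valid categories (illustrative values only).
TEST_CATEGORIES = ["simple", "multiple", "parallel", "relevance", "all"]

def build_parser(description: str) -> argparse.ArgumentParser:
    """Build the argument parser shared by both evaluation scripts."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "--test-category",
        nargs="+",
        choices=TEST_CATEGORIES,
        default=["all"],
        help="Test categories to run; identical semantics in both scripts.",
    )
    return parser

if __name__ == "__main__":
    args = build_parser("BFCL evaluation").parse_args()
    print(args.test_category)
```

Because both scripts would call the same helper, a category accepted by one script could never be rejected by the other, which is the inconsistency described above.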

@HuanzhiMao HuanzhiMao marked this pull request as ready for review July 7, 2024 00:07
@HuanzhiMao HuanzhiMao changed the title from "[BFCL] Standarize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py" to "[BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py" on Jul 7, 2024
@CharlieJCJ (Collaborator) left a comment


LGTM

Comment: improve the warning messages in the future; make them more informative so the user knows immediate action items when aggregating results for data.csv.

@HuanzhiMao (Collaborator, Author)

> LGTM
>
> Comment: improve the warning messages in the future; make them more informative so the user knows immediate action items when aggregating results for data.csv.

Good point. Will address this in a different PR.
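
As an illustration of the actionable warning the review asks for (later addressed in the follow-up PR referenced at the end of this thread), here is a hedged sketch; the `warn_missing_result` function, model and category names, and message wording are hypothetical, not the repository's actual code.

```python
# Illustrative sketch only: an aggregation-time warning that states the
# immediate action item instead of just reporting missing data. The
# function name and message are hypothetical, not the repository's code.
import warnings

def warn_missing_result(model: str, category: str) -> None:
    """Warn that a result is missing and tell the user how to fix it."""
    warnings.warn(
        f"No evaluation result found for model '{model}' in category "
        f"'{category}'; its row in data.csv will be incomplete. Re-run "
        f"generation and evaluation for this category, then aggregate again."
    )

if __name__ == "__main__":
    warn_missing_result("gpt-4", "parallel")
```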

@ShishirPatil (Owner)

Hey @HuanzhiMao and @CharlieJCJ, I don't think this is a great idea, for the following reason: we start by showing how to install the dependencies, then in the middle we go into a long digression on the different flags, and then we come back to a list of commands to run. That back-and-forth isn't helpful. Maybe we should retain the previous structure, where the options are shown after each command so users have the context of the command. Thoughts?

@HuanzhiMao (Collaborator, Author)

> Hey @HuanzhiMao and @CharlieJCJ, I don't think this is a great idea, for the following reason: we start by showing how to install the dependencies, then in the middle we go into a long digression on the different flags, and then we come back to a list of commands to run. That back-and-forth isn't helpful. Maybe we should retain the previous structure, where the options are shown after each command so users have the context of the command. Thoughts?

How about we move the flags section to the bottom of the README, just like how all the available model names are displayed at the end?

@devanshamin (Contributor)

I have solved this issue here.

@ShishirPatil (Owner)

> Hey @HuanzhiMao and @CharlieJCJ, I don't think this is a great idea, for the following reason: we start by showing how to install the dependencies, then in the middle we go into a long digression on the different flags, and then we come back to a list of commands to run. That back-and-forth isn't helpful. Maybe we should retain the previous structure, where the options are shown after each command so users have the context of the command. Thoughts?
>
> How about we move the flags section to the bottom of the README, just like how all the available model names are displayed at the end?

I don't think this solves it, right? We should present the flags where the user cares about the command, not far down the page. Folks read and execute code from READMEs linearly, and making them jump around isn't a good idea, imo.

@HuanzhiMao HuanzhiMao requested a review from ShishirPatil July 18, 2024 22:52
@ShishirPatil ShishirPatil merged commit a9dd435 into ShishirPatil:main Jul 19, 2024
@HuanzhiMao HuanzhiMao deleted the test-category branch July 21, 2024 05:01
ShishirPatil pushed a commit that referenced this pull request Aug 10, 2024
As mentioned in #506, this PR makes the warning messages more informative so that users know the action items when aggregating leaderboard results.

---------

Co-authored-by: CharlieJCJ <charliechengjieji@berkeley.edu>
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
[BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py (ShishirPatil#506)

There are inconsistencies between the `test_category` argument that's
used by `eval_checker/eval_runner.py` and `openfunctions_evaluation.py`.

This PR partially addresses ShishirPatil#501 and ShishirPatil#502.

---------

Co-authored-by: Shishir Patil <30296397+ShishirPatil@users.noreply.github.com>
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
…#517)

As mentioned in ShishirPatil#506, this PR makes the warning messages more informative so that users know the action items when aggregating leaderboard results.

---------

Co-authored-by: CharlieJCJ <charliechengjieji@berkeley.edu>