https://github.com/assafelovic/gpt-researcher/issues/936#issuecomment-2579364890

Azure Embedding Quota Limit · Issue #936 · assafelovic/gpt-researcher · GitHub
Closed
danieldekay opened this issue Oct 21, 2024 · 9 comments
Comments

@danieldekay
Contributor

Describe the bug
I am running a detailed report with Azure OpenAI and am hitting quota limits. Although my deployment has a rate limit of 500k tokens per minute, the run still throws an error and does not handle the throttling response well.

openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2024-02-15-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 86400 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}

The error message must be wrong: my rate limits are not per day, and waiting 24 hours is not an option.

Expected behavior

  • The log to the user (e.g. websocket) should indicate that there is throttling in place.
  • The embedding should resume after a while, or indicate what the user can do instead.
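A minimal sketch of the two expectations above (tell the user about throttling, then resume), assuming a retry wrapper with capped exponential backoff. `RateLimited`, `call_with_backoff`, and the `notify` callback are hypothetical names used only for illustration; in a real setup the caught exception would be the provider's 429 error (e.g. `openai.RateLimitError`), and `notify` could forward the message to a websocket.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's 429 error."""


def call_with_backoff(
    fn: Callable[[], T],
    notify: Callable[[str], None] = print,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts:
                notify(f"Giving up after {attempt} attempts: {exc}")
                raise
            # Double the wait on each attempt, cap it, then add up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, delay * 0.1)
            notify(
                f"Rate limited, retrying in {delay:.1f}s "
                f"(attempt {attempt}/{max_attempts})"
            )
            sleep(delay)
    # Only reached if max_attempts is smaller than 1.
    raise RuntimeError("max_attempts must be at least 1")
```

Capping the delay matters here because the "retry after 86400 seconds" hint in the error is not usable as a wait time; the wrapper ignores it and relies on its own schedule.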
@roninio
Contributor

roninio commented Oct 27, 2024

I have the same issue.

@ElishaKay
Collaborator

ElishaKay commented Oct 30, 2024

Fair point, we'll have to think about how to make this smoother.

A) @danieldekay, are you running reports on the same docs each time?

Have a look at this PR: "Documents, crawled urls, and website will be chunked and loaded to the inputted vector store if vector_store is not None."

#838

That means that if you run GPTR with the same LangChain vector store, it may cut down on the embedding calls.

B) The "cooling off" feature is also a good idea. Did you mention somewhere that there's a LangChain method we can leverage to get the required "cool off" period?

Once we have that, we can go about adding the websocket message. A good first step would be an exception handler block that publishes a websocket message to the frontend.
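The exception handler idea could look roughly like the sketch below. It is an assumption-heavy illustration, not GPTR's actual code: `RateLimited` stands in for the provider's 429 error, `send_text` stands in for whatever coroutine writes to the frontend websocket, and the JSON message shape is made up.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


class RateLimited(Exception):
    """Hypothetical stand-in for the provider's 429 error."""


async def run_with_notice(
    task: Callable[[], Awaitable[Any]],
    send_text: Callable[[str], Awaitable[None]],
) -> Any:
    """Run task; on RateLimited, publish a JSON notice and re-raise."""
    try:
        return await task()
    except RateLimited as exc:
        # Hypothetical message shape; adapt it to what the frontend expects.
        payload = {"type": "error", "content": "rate_limit", "output": str(exc)}
        await send_text(json.dumps(payload))
        raise
```

Re-raising keeps the failure visible to the caller, so a retry or "cool off" layer can still be added around this handler later.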

@danieldekay
Contributor Author

@ElishaKay - it's a standard web research report based on Bing.

LangChain has support for a rate limiter:
https://python.langchain.com/docs/how_to/chat_model_rate_limiting/

Maybe this is also an option:
https://www.perplexity.ai/search/when-i-am-embedding-documents-zjPsfHmgRk.KVOIf4xXaQQ#0
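The linked how-to is about rate limiting chat models. The same idea can be sketched client-side with the standard library only; the `TokenBucket` class below is a hypothetical illustration, not LangChain's API, and it assumes the caller invokes `acquire()` before each embedding request.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal client-side limiter: about `rate` requests per second on average."""

    def __init__(
        self,
        rate: float,
        capacity: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = self._clock()
            # Refill in proportion to the elapsed time, up to capacity.
            elapsed = now - self._last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self._last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep roughly long enough for the missing fraction to refill.
            self._sleep((1 - self.tokens) / self.rate)
```

For example, `TokenBucket(rate=2.0)` lets the first call through immediately and then spaces later calls about half a second apart.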

@ElishaKay
Collaborator

Awesome.

Adding to the resilience channel on Discord.

For anyone reading who hasn't joined the Discord: join here to access the above link.

@roninio
Contributor

roninio commented Oct 31, 2024

I solved the issue with:

_embeddings = AzureOpenAIEmbeddings(
    model=model,
    timeout=60,
    chunk_size=1000,
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    **embdding_kwargs,
)

and by changing the embedding model to azure_openai:text-embedding-3-large.

If you think this is the correct solution, I can open a pull request.
The suggested solution (chat_model_rate_limiting) does not work for AzureOpenAIEmbeddings in the current version of LangChain: https://python.langchain.com/docs/how_to/chat_model_rate_limiting/
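As far as I understand it, `chunk_size` caps how many texts go into a single embedding request. The effect can be illustrated with a small stand-alone helper; `batched` is a hypothetical name for this sketch and is not part of LangChain.

```python
from typing import Iterator, List, Sequence


def batched(texts: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size texts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield list(texts[start:start + chunk_size])
```

With 2,500 texts and `chunk_size=1000`, this yields three batches of 1000, 1000, and 500 texts, i.e. three requests instead of one oversized request.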

@Youbiquitous

Has someone solved this?

@ElishaKay
Collaborator

ElishaKay commented Nov 9, 2024

Sure @roninio,

Green light for the PR - maybe we should also set a default azure embedding model in the config?

There's a good chance this also causes problems with the OpenAI API, i.e. we should upgrade the embedding model there too.

Sounds like we should edit that file to:

match os.environ["EMBEDDING_PROVIDER"]:
    case "openai":
        self.embedding_model = "text-embedding-3-large"
    case "azure_openai":
        self.embedding_model = "text-embedding-3-large"
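One caveat with the snippet above: `os.environ["EMBEDDING_PROVIDER"]` raises `KeyError` when the variable is unset, and an unknown provider silently matches no case. A hedged alternative is a lookup table with explicit handling; `DEFAULT_EMBEDDING_MODELS`, `default_embedding_model`, and the fallback to `"openai"` are assumptions made for this sketch, not the project's actual config code.

```python
import os
from typing import Mapping

# Hypothetical defaults table; the model names are the ones proposed in this thread.
DEFAULT_EMBEDDING_MODELS = {
    "openai": "text-embedding-3-large",
    "azure_openai": "text-embedding-3-large",
}


def default_embedding_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the default embedding model for the configured provider."""
    # Assumption: fall back to "openai" when EMBEDDING_PROVIDER is not set.
    provider = env.get("EMBEDDING_PROVIDER", "openai")
    try:
        return DEFAULT_EMBEDDING_MODELS[provider]
    except KeyError:
        raise ValueError(
            f"No default embedding model for provider {provider!r}"
        ) from None
```

The table form also works on Python versions older than 3.10, where `match` is not available.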

@roninio
Contributor

roninio commented Nov 11, 2024

Hi, I created pull request #979.
In the pull request I also updated the documentation regarding Azure.

@us

us commented Jan 9, 2025

Still... I am getting the same error on the detailed report every time! Are there any params or something else we can adjust to avoid hitting the limit? What's the solution?
Currently I have 350k TPM and 2.1k RPM!
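A rough way to reason about these two quotas together: the number of requests that fit in a minute is bounded by both the request quota and the token quota divided by the tokens sent per request. The helper below is a hypothetical back-of-the-envelope sketch; the per-request token counts in the usage note are made-up illustrations, not measurements from GPTR.

```python
def max_requests_per_minute(tpm: int, rpm: int, tokens_per_request: int) -> int:
    """Requests per minute that fit under both the token and the request quota."""
    if tokens_per_request < 1:
        raise ValueError("tokens_per_request must be at least 1")
    return min(rpm, tpm // tokens_per_request)
```

With 350k TPM and 2.1k RPM, the two quotas balance at about 167 tokens per request (350000 / 2100). If a single embedding request carried, say, 1000 chunks of roughly 250 tokens each (250,000 tokens, an assumed figure), `max_requests_per_minute(350_000, 2_100, 250_000)` gives 1, so a second such request within the same minute would already exceed the token quota.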
