ChatGPT Support #56
Comments
The tool needs a |
Ah, I missed those comments. Thanks! Do you have any idea how I can connect my Cursor Pro to your app? I mean, how can I route those requests through Cursor? They do not have an API feature. |
I'm using LM Studio with the "deepseek-r1-distill-qwen-7b" model, with its Local Server enabled, and this code (written by me):
import os
import time
import requests

def call_llm(prompt: str, use_cache: bool = True) -> str:
    # LM Studio's Local Server exposes an OpenAI-compatible endpoint on localhost:1234 by default.
    endpoint = os.environ.get("LMSTUDIO_API_ENDPOINT", "http://localhost:1234/v1/chat/completions")
    payload = {
        "model": "deepseek-r1-distill-qwen-7b",  # LM Studio ignores this field, but we must send something
        "messages": [{"role": "user", "content": prompt}]
    }
    max_retries = 5
    for attempt in range(max_retries):
        try:
            response = requests.post(endpoint, json=payload, timeout=10)
            response.raise_for_status()
            r = response.json()
            return r["choices"][0]["message"]["content"]
        except requests.exceptions.ConnectionError:
            # Only connection failures are retried; any other error is re-raised below.
            if attempt < max_retries - 1:
                print(f"[call_llm] LM Studio not reachable, retrying ({attempt + 1}/{max_retries})...")
                time.sleep(2)
            else:
                raise RuntimeError(f"[call_llm] Error: LM Studio API server is not running at {endpoint}")
        except Exception as e:
            raise RuntimeError(f"[call_llm] Unexpected error: {e}")
but I got:

Error predicting: _0x27309f [Error]: Trying to keep the first 9627697 tokens when context the overflows. However, the model is loaded with context length of only 4096 tokens, which is not enough. Try to load the model with a larger context length, or provide a shorter input
at _0x41307e.LLMEngineWrapper.predictTokens (/Applications/LM Studio.app/Contents/Resources/app/.webpack/lib/llmworker.js:80:27505)
at async Object.predictTokens (/Applications/LM Studio.app/Contents/Resources/app/.webpack/lib/llmworker.js:96:12847)
at async Object.handleMessage (/Applications/LM Studio.app/Contents/Resources/app/.webpack/lib/llmworker.js:96:2442) {
cause: undefined,
suggestion: undefined,
errorData: undefined,
data: undefined,
displayData: undefined,
title: 'Trying to keep the first 9627697 tokens when context the overflows. However, the model is loaded with context length of only 4096 tokens, which is not enough. Try to load the model with a larger context length, or provide a shorter input'
} |
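(Not part of the thread's code, just an illustration.) The prompt here is millions of tokens, far past a 4096-token context. One generic way around that is to split the input into chunks that fit and call the model once per chunk; a rough sketch below, assuming tiktoken as an approximate tokenizer (the chunk size and helper name are made up for the example):

import tiktoken

# cl100k_base only approximates the local model's tokenizer, so leave headroom below 4096.
enc = tiktoken.get_encoding("cl100k_base")

def chunk_prompt(text: str, max_tokens: int = 3000) -> list[str]:
    # Split the encoded text into slices small enough to fit the context window,
    # leaving room for the chat template and the model's reply.
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]

# Usage sketch: summaries = [call_llm(chunk) for chunk in chunk_prompt(huge_prompt)]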
Even when I count tokens and trim the prompt, it still fails:

import tiktoken

# Approximate counting with the GPT-3.5 tokenizer (the local model has its own tokenizer).
tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")
MAX_TOKENS = 4000

def trim_prompt(prompt: str) -> str:
    tokens = tokenizer.encode(prompt)
    if len(tokens) > MAX_TOKENS:
        print(f"[call_llm] Warning: prompt too long ({len(tokens)} tokens), trimming to {MAX_TOKENS} tokens...")
        tokens = tokens[:MAX_TOKENS]
        prompt = tokenizer.decode(tokens)
    return prompt

def call_llm(prompt: str, use_cache: bool = True) -> str:
    endpoint = os.environ.get("LMSTUDIO_API_ENDPOINT", "http://localhost:1234/v1/chat/completions")
    # Trim the prompt if it's too large
    prompt = trim_prompt(prompt)
    ...

Error:

[call_llm] Warning: prompt too long (9518473 tokens), trimming to 4000 tokens...
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/connectionpool.py", line 534, in _make_request
response = conn.getresponse()
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/connection.py", line 516, in getresponse
httplib_response = super().getresponse()
File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/http/client.py", line 1430, in getresponse
response.begin()
~~~~~~~~~~~~~~^^
File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/http/client.py", line 331, in begin
version, status, reason = self._read_status()
~~~~~~~~~~~~~~~~~^^
File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/http/client.py", line 292, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/socket.py", line 719, in readinto
return self._sock.recv_into(b)
~~~~~~~~~~~~~~~~~~~~^^^
TimeoutError: timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.13/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
method=request.method,
...<9 lines>...
chunked=chunked,
)
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retries = retries.increment(
method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
)
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/util/util.py", line 39, in reraise
raise value
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
conn,
...<10 lines>...
**response_kw,
)
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/connectionpool.py", line 536, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/urllib3/connectionpool.py", line 367, in _raise_timeout
raise ReadTimeoutError(
self, url, f"Read timed out. (read timeout={timeout_value})"
) from err
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=1234): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Tutorial-Codebase-Knowledge/utils/call_llm.py", line 138, in call_llm
response = requests.post(endpoint, json=payload, timeout=10)
File "/opt/homebrew/lib/python3.13/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/opt/homebrew/lib/python3.13/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/opt/homebrew/lib/python3.13/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/opt/homebrew/lib/python3.13/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=1234): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Tutorial-Codebase-Knowledge/main.py", line 84, in <module>
main()
~~~~^^
File "Tutorial-Codebase-Knowledge/main.py", line 81, in main
tutorial_flow.run(shared)
~~~~~~~~~~~~~~~~~^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/pocketflow/__init__.py", line 16, in run
return self._run(shared)
~~~~~~~~~^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/pocketflow/__init__.py", line 50, in _run
def _run(self,shared): p=self.prep(shared); o=self._orch(shared); return self.post(shared,p,o)
~~~~~~~~~~^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/pocketflow/__init__.py", line 48, in _orch
while curr: curr.set_params(p); last_action=curr._run(shared); curr=copy.copy(self.get_next_node(curr,last_action))
~~~~~~~~~^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/pocketflow/__init__.py", line 13, in _run
def _run(self,shared): p=self.prep(shared); e=self._exec(p); return self.post(shared,p,e)
~~~~~~~~~~^^^
File "/opt/homebrew/lib/python3.13/site-packages/pocketflow/__init__.py", line 33, in _exec
if self.cur_retry==self.max_retries-1: return self.exec_fallback(prep_res,e)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/pocketflow/__init__.py", line 28, in exec_fallback
def exec_fallback(self,prep_res,exc): raise exc
^^^^^^^^^
File "/opt/homebrew/lib/python3.13/site-packages/pocketflow/__init__.py", line 31, in _exec
try: return self.exec(prep_res)
~~~~~~~~~^^^^^^^^^^
File "Tutorial-Codebase-Knowledge/nodes.py", line 149, in exec
response = call_llm(prompt)
File "/Tutorial-Codebase-Knowledge/utils/call_llm.py", line 149, in call_llm
raise RuntimeError(f"[call_llm] Unexpected error: {e}")
RuntimeError: [call_llm] Unexpected error: HTTPConnectionPool(host='localhost', port=1234): Read timed out. (read timeout=10) |
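For what it's worth, this traceback ends in requests.exceptions.ReadTimeout, which the retry loop above does not catch (it only catches ConnectionError), and a local model can easily need more than 10 seconds on a long prompt. A minimal sketch of a more forgiving variant, not the project's official fix (the 300-second timeout is an arbitrary choice):

import os
import time
import requests

def call_llm(prompt: str, use_cache: bool = True) -> str:
    endpoint = os.environ.get("LMSTUDIO_API_ENDPOINT", "http://localhost:1234/v1/chat/completions")
    payload = {
        "model": "deepseek-r1-distill-qwen-7b",
        "messages": [{"role": "user", "content": prompt}],
    }
    max_retries = 5
    for attempt in range(max_retries):
        try:
            # Local inference on long prompts can take minutes; 10 s is usually too short.
            response = requests.post(endpoint, json=payload, timeout=300)
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
            # Retry on timeouts as well as connection failures.
            if attempt == max_retries - 1:
                raise RuntimeError(f"[call_llm] LM Studio did not answer at {endpoint}")
            print(f"[call_llm] request failed, retrying ({attempt + 1}/{max_retries})...")
            time.sleep(2)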
Can you add support for ChatGPT, instead of Gemini?
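For reference, a minimal sketch of what a ChatGPT-backed call_llm could look like, assuming the official openai Python package (>=1.0); the model name and environment variable are placeholders, not the project's settings:

import os
from openai import OpenAI

def call_llm(prompt: str, use_cache: bool = True) -> str:
    # Reads the API key from the environment; OPENAI_API_KEY is the SDK's default variable.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content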