I was looking at how to add the git info to the index, something like:
**Source Repository:** [https://github.com/Le09/Tutorial-Codebase-Knowledge/](https://github.com/Le09/Tutorial-Codebase-Knowledge/)
**Commit Hash:** d66d97639092051cd7eb0df82a96bec5a5b6bec4
**Branch Name:** main
So that it could potentially be used as a reference doc.
Also, having git info could be extended to add links to functions, classes, etc.
However, there are three different cases:

1. local git repository
2. remote repository, via SSH
3. remote HTTPS repository
Case 1 is (mostly) easy; the only issue is that there might be a host alias.
Case 2 is problematic because the project is checked out in a temporary directory that is created within `crawl_github_files`.
Case 3 uses the API, so it may be less of a problem; there's less duplication of work.
Except for case 1, I think this is a flaw in the abstraction: `crawl_github_files` is an isolated function, but there may be more that you want to extract from git.
Why have this complexity at all, instead of always cloning the repository into `.cache`?
If it's to save on size, that can be done with `--depth 1`, although `git describe --tags` wouldn't work in that case.
But that's only relevant for large repositories, and the time spent cloning is dwarfed by the time spent calling the LLM, whatever the size may be.
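The "always clone into `.cache`" alternative is cheap to express. A sketch of building the clone command (the function name `shallow_clone_cmd` and the `.cache` layout are assumptions for illustration, not the project's actual API):

```python
from pathlib import Path


def shallow_clone_cmd(repo_url, cache_dir=".cache"):
    """Build the argv for a depth-1 clone of repo_url into the cache directory.

    With --depth 1 only the tip commit is fetched, so history-dependent
    commands such as `git describe --tags` will generally fail afterwards.
    """
    name = repo_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    dest = Path(cache_dir) / name
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]
```

The returned list can be passed straight to `subprocess.run`; keeping command construction separate from execution also makes the clone step easy to test without network access.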
I've made a small commit for the local case: 01b7c28
Do you have an opinion on the matter to make it into a real PR?